The patch titled
Subject: hugetlbfs: dirty pages as they are added to pagecache
has been added to the -mm tree. Its filename is
hugetlbfs-dirty-pages-as-they-are-added-to-pagecache.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/hugetlbfs-dirty-pages-as-they-are-…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/hugetlbfs-dirty-pages-as-they-are-…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Mike Kravetz <mike.kravetz(a)oracle.com>
Subject: hugetlbfs: dirty pages as they are added to pagecache
Some test systems were experiencing negative huge page reserve counts and
incorrect file block counts. This was traced to /proc/sys/vm/drop_caches
removing clean pages from hugetlbfs file pagecaches. When code outside of
hugetlbfs explicitly removes the pages, the appropriate accounting is not
performed.
This can be recreated as follows:
fallocate -l 2M /dev/hugepages/foo
echo 1 > /proc/sys/vm/drop_caches
fallocate -l 2M /dev/hugepages/foo
grep -i huge /proc/meminfo
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
HugePages_Total: 2048
HugePages_Free: 2047
HugePages_Rsvd: 18446744073709551615
HugePages_Surp: 0
Hugepagesize: 2048 kB
Hugetlb: 4194304 kB
ls -lsh /dev/hugepages/foo
4.0M -rw-r--r--. 1 root root 2.0M Oct 17 20:05 /dev/hugepages/foo
To address this issue, mark pages dirty as they are added to the pagecache.
The problem can easily be reproduced with fallocate as shown above. Read-faulted
pages will eventually end up being marked dirty, but there is a window where
they are clean and could be affected by code such as drop_caches. So, just
dirty them all as they are added to the pagecache.
In addition, it makes little sense to even try to drop hugetlbfs pagecache
pages, so make the drop_caches code skip hugetlbfs superblocks entirely.
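With the fix applied, rerunning the reproduction sequence above should leave the
reserve count and the file block count consistent. A minimal sanity check (a
sketch only; the expected results are inferred from the description above, not
output captured from a patched kernel) could be:

  fallocate -l 2M /dev/hugepages/foo
  echo 1 > /proc/sys/vm/drop_caches
  fallocate -l 2M /dev/hugepages/foo
  grep HugePages_Rsvd /proc/meminfo    # should no longer underflow to 18446744073709551615
  ls -lsh /dev/hugepages/foo           # allocated size should match the 2.0M file size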
Link: http://lkml.kernel.org/r/20181018041022.4529-1-mike.kravetz@oracle.com
Fixes: 70c3547e36f5 ("hugetlbfs: add hugetlbfs_fallocate()")
Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Naoya Horiguchi <n-horiguchi(a)ah.jp.nec.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar(a)linux.vnet.ibm.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov(a)linux.intel.com>
Cc: Davidlohr Bueso <dave(a)stgolabs.net>
Cc: Alexander Viro <viro(a)zeniv.linux.org.uk>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
--- a/fs/drop_caches.c~hugetlbfs-dirty-pages-as-they-are-added-to-pagecache
+++ a/fs/drop_caches.c
@@ -9,6 +9,7 @@
#include <linux/writeback.h>
#include <linux/sysctl.h>
#include <linux/gfp.h>
+#include <linux/magic.h>
#include "internal.h"
/* A global variable is a bit ugly, but it keeps the code simple */
@@ -18,6 +19,12 @@ static void drop_pagecache_sb(struct sup
{
struct inode *inode, *toput_inode = NULL;
+ /*
+ * It makes no sense to try and drop hugetlbfs page cache pages.
+ */
+ if (sb->s_magic == HUGETLBFS_MAGIC)
+ return;
+
spin_lock(&sb->s_inode_list_lock);
list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
spin_lock(&inode->i_lock);
--- a/mm/hugetlb.c~hugetlbfs-dirty-pages-as-they-are-added-to-pagecache
+++ a/mm/hugetlb.c
@@ -3690,6 +3690,12 @@ int huge_add_to_page_cache(struct page *
return err;
ClearPagePrivate(page);
+ /*
+ * set page dirty so that it will not be removed from cache/file
+ * by non-hugetlbfs specific code paths.
+ */
+ set_page_dirty(page);
+
spin_lock(&inode->i_lock);
inode->i_blocks += blocks_per_huge_page(h);
spin_unlock(&inode->i_lock);
_
Patches currently in -mm which might be from mike.kravetz(a)oracle.com are
hugetlbfs-dirty-pages-as-they-are-added-to-pagecache.patch
On Thu, 18 Oct 2018 11:23:12 PDT (-0700), merker(a)debian.org wrote:
> On Thu, Oct 18, 2018 at 11:13:02AM +0200, gregkh(a)linuxfoundation.org wrote:
>>
>> This is a note to let you know that I've just added the patch titled
>>
>> RISC-V: include linux/ftrace.h in asm-prototypes.h
>>
>> to the 4.4-stable tree which can be found at:
>> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>>
>> The filename of the patch is:
>> risc-v-include-linux-ftrace.h-in-asm-prototypes.h.patch
>> and it can be found in the queue-4.4 subdirectory.
>>
>> If you, or anyone else, feels it should not be added to the stable tree,
>> please let <stable(a)vger.kernel.org> know about it.
> [...]
>> From: James Cowgill <jcowgill(a)debian.org>
>> Date: Thu, 6 Sep 2018 22:57:56 +0100
>> Subject: RISC-V: include linux/ftrace.h in asm-prototypes.h
>>
>> From: James Cowgill <jcowgill(a)debian.org>
>>
>> [ Upstream commit 57a489786de9ec37d6e25ef1305dc337047f0236 ]
>
> I guess it doesn't make much sense to add this patch to the 4.4
> and 3.18 stable trees. The patch creates an arch-specific header
> (arch/riscv/include/asm/asm-prototypes.h), but the first mainline
> kernel with support for the RISC-V architecture was kernel
> 4.15.
I agree.
Recently Wang Jian reported some KVP issues on the v4.4 kernel:
https://github.com/LIS/lis-next/issues/593:
e.g. the /var/lib/hyperv/.kvp_pool_* files cannot be updated, and
sometimes, if the hv_kvp_daemon doesn't start in time, the host may not
be able to query the VM's IP address via KVP.
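For reference, one rough way to observe the first symptom from inside the guest
is to check whether the pool files' timestamps keep changing while the host
updates KVP data. This is only a sketch using the file paths mentioned above;
the exact daemon setup varies by distro:

  stat -c '%y %n' /var/lib/hyperv/.kvp_pool_*   # mtimes stay frozen when the bug hits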
I identified these 4 mainline patches to fix the issues. The patches
can be applied cleanly to the latest 4.4.y branch (currently it's
v4.4.161).
The first 3 are simply cherry-picked from the mainline, and the 4th
has to be reworked for the v4.4 kernel.
Wang Jian tested the 4 patches and confirmed that the issues are fixed.
I also did some tests and found no regressions.
Thanks!
-- Dexuan
K. Y. Srinivasan (2):
Drivers: hv: utils: Invoke the poll function after handshake
Drivers: hv: util: Pass the channel information during the init call
Long Li (1): -- Reworked by Dexuan
HV: properly delay KVP packets when negotiation is in progress
Vitaly Kuznetsov (1):
Drivers: hv: kvp: fix IP Failover
drivers/hv/hv_fcopy.c | 2 +-
drivers/hv/hv_kvp.c | 40 +++++++++++++++++++++++++++++++++++++---
drivers/hv/hv_snapshot.c | 4 ++--
drivers/hv/hv_util.c | 1 +
drivers/hv/hyperv_vmbus.h | 5 +++++
include/linux/hyperv.h | 1 +
6 files changed, 47 insertions(+), 6 deletions(-)
--
2.7.4
The host may send multiple negotiation packets
(due to timeout) before the KVP user-mode daemon
is connected. We need to defer processing those packets
until the daemon is negotiated and connected.
It's okay for the guest to respond
to all negotiation packets.
In addition, the host may send multiple staged
KVP requests as soon as negotiation is done.
We need to properly process those packets using one
tasklet to get exclusive access to the ring buffer.
This patch is based on the work of
Nick Meier <Nick.Meier(a)microsoft.com>.
Signed-off-by: Long Li <longli(a)microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys(a)microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
The above is the original changelog of
a3ade8cc474d ("HV: properly delay KVP packets when negotiation is in progress").
Here I re-worked the original patch because the mainline version
can't work for the linux-4.4.y branch, on which channel->callback_event
doesn't exist yet. In the mainline, channel->callback_event was added by
631e63a9f346 ("vmbus: change to per channel tasklet"). We don't want
to backport that commit to v4.4, as it requires extra supporting changes and
fixes, which are unnecessary for the KVP bug we're trying to resolve.
NOTE: before this patch is used, we should first cherry-pick the other
3 related patches from the mainline:
The background of this backport request: recently Wang Jian reported
some KVP issues (https://github.com/LIS/lis-next/issues/593),
e.g. the /var/lib/hyperv/.kvp_pool_* files cannot be updated, and sometimes,
if the hv_kvp_daemon doesn't start in time, the host may not be able to query
the VM's IP address via KVP.
Reported-by: Wang Jian <jianjian.wang1(a)gmail.com>
Tested-by: Wang Jian <jianjian.wang1(a)gmail.com>
Signed-off-by: Dexuan Cui <decui(a)microsoft.com>
---
This is re-worked by me from the mainline commit
a3ade8cc474d ("HV: properly delay KVP packets when negotiation is in progress").
I added my Signed-off-by as I identified and tested the patches.
If this is unnecessary, please feel free to remove it.
drivers/hv/hv_kvp.c | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)
diff --git a/drivers/hv/hv_kvp.c b/drivers/hv/hv_kvp.c
index ff0a426..1771a96 100644
--- a/drivers/hv/hv_kvp.c
+++ b/drivers/hv/hv_kvp.c
@@ -612,21 +612,22 @@ void hv_kvp_onchannelcallback(void *context)
NEGO_IN_PROGRESS,
NEGO_FINISHED} host_negotiatied = NEGO_NOT_STARTED;
- if (host_negotiatied == NEGO_NOT_STARTED &&
- kvp_transaction.state < HVUTIL_READY) {
+ if (kvp_transaction.state < HVUTIL_READY) {
/*
* If userspace daemon is not connected and host is asking
* us to negotiate we need to delay to not lose messages.
* This is important for Failover IP setting.
*/
- host_negotiatied = NEGO_IN_PROGRESS;
- schedule_delayed_work(&kvp_host_handshake_work,
+ if (host_negotiatied == NEGO_NOT_STARTED) {
+ host_negotiatied = NEGO_IN_PROGRESS;
+ schedule_delayed_work(&kvp_host_handshake_work,
HV_UTIL_NEGO_TIMEOUT * HZ);
+ }
return;
}
if (kvp_transaction.state > HVUTIL_READY)
return;
-
+recheck:
vmbus_recvpacket(channel, recv_buffer, PAGE_SIZE * 4, &recvlen,
&requestid);
@@ -703,6 +704,8 @@ void hv_kvp_onchannelcallback(void *context)
VM_PKT_DATA_INBAND, 0);
host_negotiatied = NEGO_FINISHED;
+
+ goto recheck;
}
}
--
2.7.4
Hi Greg,
This series fixes issues we've seen with softirq time accounting in 4.9:
- when ksoftirqd is running at 100% on a CPU, none of the values
reported by /proc/stat for that CPU will change, sometimes for
dozens of seconds,
- large deviations in the total number of ticks accumulated over a
fixed time for a CPU, probably because of the first issue hitting
for shorter periods.
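One rough way to observe the first symptom is to sample /proc/stat for the
affected CPU and check whether its counters keep moving. This is only a sketch,
with the softirq field position taken from proc(5) (7th value after the CPU
name):

  # print cpu0's softirq ticks once per second; on an affected kernel the value
  # can stay frozen for dozens of seconds while ksoftirqd spins at 100%
  while true; do awk '/^cpu0 /{print $8}' /proc/stat; sleep 1; done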
We found out that something pretty similar had been reported 9 months
ago, see the reference link below. In that discussion, Rabin Vincent had
made a 4.9 specific patch which fixes our first issue, but we were still
seeing some deviation from the total number of ticks (up to 1.7% from
expected, where we had only 0.2% on older kernels), and you had also
asked for a direct backport from the mainline series, if possible.
As mentioned in that thread, a lot of changes (probably 50+) went into
4.11 to remove cputime, but we could get something working with only the
4 attached patches to fix these two issues. Three of these patches apply
without change, and the second one in the series ("sched/cputime:
Convert kcpustat to nsecs") needed a minor change as a cast had been
added in 527b0a76f41d ("sched/cpuacct: Avoid %lld seq_printf warning")
to fix a build warning on s390. I guess we could also include that patch
in this series, let me know if this is the preferred way to handle this.
We ran our tests on 3.18, 4.4 and 4.9 and confirmed that only 4.9 would
need this series, and that this series indeed restores the behavior we
were seeing on those older kernels.
Thanks!
Reference: http://lkml.kernel.org/r/%3C1513159876-5125-1-git-send-email-rabin.vincent@…
v2: - drop "time: Introduce jiffies64_to_nsecs()" as it has already been
merged into v4.9.132,
- include backport of commit 564b733c899f ("macintosh/rack-meter:
Convert cputime64_t use to u64") to avoid introducing a build
failure on powerpc.
Frederic Weisbecker (4):
sched/cputime: Convert kcpustat to nsecs
macintosh/rack-meter: Convert cputime64_t use to u64
sched/cputime: Increment kcpustat directly on irqtime account
sched/cputime: Fix ksoftirqd cputime accounting regression
arch/s390/appldata/appldata_os.c | 16 +++----
drivers/cpufreq/cpufreq.c | 6 +--
drivers/cpufreq/cpufreq_governor.c | 2 +-
drivers/cpufreq/cpufreq_stats.c | 1 -
drivers/macintosh/rack-meter.c | 28 +++++------
fs/proc/stat.c | 68 +++++++++++++--------------
fs/proc/uptime.c | 7 +--
kernel/sched/cpuacct.c | 2 +-
kernel/sched/cputime.c | 75 +++++++++++++-----------------
kernel/sched/sched.h | 12 +++--
10 files changed, 104 insertions(+), 113 deletions(-)
--
2.19.1