linaro-kernel June 2013

linaro-kernel@lists.linaro.org

89 participants
86 discussions

[PATCH 00/16] Support for per policy instance of Interactive governor

by Viresh Kumar

Hi Todd and others,

If we have a multi-package system, where we have multiple instances of struct policy (one per package), we currently can't have multiple instances of the same governor, i.e. we can't have multiple instances of the Interactive governor for multiple packages. This is a bottleneck for multicluster systems, where we want different packages to use the Interactive governor, but with different tunables.

---------x------------x---------

Recently, I upstreamed this support in 3.10-rc1 for the cpufreq core and the Ondemand and Conservative governors. This is an attempt to do the same for the Interactive governor.

I didn't have any clue about which kernel to rebase my patches on, as I couldn't find a 3.10-rc based branch in your tree, so I based them on experimental/android-3.9.

So, this is what this patchset does:
- Backports some important patches from v3.10-rc1/2 to v3.9: first 8 patches
- Adds a few more supporting patches which might go into rc3: next 4 patches
- Finally updates the Interactive governor: last 4 patches

So, review is probably required only for the last 4 patches. The last patch is a bit long; it is mostly a rearrangement of the code rather than a major update. It is based on the patchset which I wrote for the Ondemand/Conservative governors.

This has been tested on an ARM big.LITTLE platform which has multiple packages requiring separate tunables.
Nathan Zimmer (1):
  cpufreq: Convert the cpufreq_driver_lock to a rwlock

Stratos Karafotis (1):
  cpufreq: governors: Calculate iowait time only when necessary

Viresh Kumar (14):
  cpufreq: Add per policy governor-init/exit infrastructure
  cpufreq: governor: Implement per policy instances of governors
  cpufreq: Call __cpufreq_governor() with correct policy->cpus mask
  cpufreq: Don't call __cpufreq_governor() for drivers without target()
  cpufreq: governors: Fix CPUFREQ_GOV_POLICY_{INIT|EXIT} notifiers
  cpufreq: Issue CPUFREQ_GOV_POLICY_EXIT notifier before dropping policy refcount
  cpufreq: Add EXPORT_SYMBOL_GPL for have_governor_per_policy
  cpufreq: governors: Move get_governor_parent_kobj() to cpufreq.c
  cpufreq: Drop rwsem lock around CPUFREQ_GOV_POLICY_EXIT
  cpufreq: Move get_cpu_idle_time() to cpufreq.c
  cpufreq: interactive: Use generic get_cpu_idle_time() from cpufreq.c
  cpufreq: interactive: Remove unnecessary cpu_online() check
  cpufreq: interactive: Move definition of cpufreq_gov_interactive downwards
  cpufreq: Interactive: Implement per policy instances of governor

 drivers/cpufreq/cpufreq.c              | 157 ++++++--
 drivers/cpufreq/cpufreq_conservative.c | 195 ++++++----
 drivers/cpufreq/cpufreq_governor.c     | 273 +++++++-------
 drivers/cpufreq/cpufreq_governor.h     | 120 +++++-
 drivers/cpufreq/cpufreq_interactive.c  | 663 +++++++++++++++++++--------------
 drivers/cpufreq/cpufreq_ondemand.c     | 274 ++++++++------
 include/linux/cpufreq.h                |  19 +-
 7 files changed, 1043 insertions(+), 658 deletions(-)

--
1.7.12.rc2.18.g61b472e

12 years

[PATCH 0/3] sched: Sched Domains: Fixups - Part 3

by Viresh Kumar

Hi Peter/Ingo,

This set contains a few more minor fixes that I could find for the code responsible for creating sched domains. They are rebased on top of my earlier fixes:

Part 1: https://lkml.org/lkml/2013/6/4/253
Part 2: https://lkml.org/lkml/2013/6/10/141

They should be applied in this order to avoid conflicts.

My study of "How scheduling domains are created" is almost over now, so this is probably my last patchset of fixes related to scheduling domains.

Sorry for the three separate sets; I sent them as soon as I had a few of them sitting in my tree.

Viresh Kumar (3):
  sched: Use cached value of span instead of calling sched_domain_span()
  sched: don't call get_group() for covered cpus
  sched: remove WARN_ON(!sd) from init_sched_groups_power()

 kernel/sched/core.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

--
1.7.12.rc2.18.g61b472e

12 years

[PATCH] arm64: Define PTE_TYPE_HUGEPAGE

by Christoffer Dall

PTE_TYPE_HUGEPAGE is referenced by pte_huge, but because nobody uses this macro it doesn't fail yet. KVM will be using this when 32-bit support for huge pages is added so add it.

Signed-off-by: Christoffer Dall <christoffer.dall(a)linaro.org>
---
 arch/arm64/include/asm/pgtable-hwdef.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h
index 66367c6..0219523 100644
--- a/arch/arm64/include/asm/pgtable-hwdef.h
+++ b/arch/arm64/include/asm/pgtable-hwdef.h
@@ -54,6 +54,7 @@
 #define PTE_TYPE_MASK		(_AT(pteval_t, 3) << 0)
 #define PTE_TYPE_FAULT		(_AT(pteval_t, 0) << 0)
 #define PTE_TYPE_PAGE		(_AT(pteval_t, 3) << 0)
+#define PTE_TYPE_HUGEPAGE	(_AT(pteval_t, 1) << 0)
 #define PTE_USER		(_AT(pteval_t, 1) << 6)		/* AP[1] */
 #define PTE_RDONLY		(_AT(pteval_t, 1) << 7)		/* AP[2] */
 #define PTE_SHARED		(_AT(pteval_t, 3) << 8)		/* SH[1:0], inner shareable */
--
1.8.1.2
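As an illustration of what the macro added by this patch encodes: the low two bits of a descriptor select its type, and the value 1 marks a block (huge page) mapping, as opposed to 3 for a page or table entry. The following is a minimal userspace sketch, not kernel code. It assumes `pteval_t` is a 64-bit integer, drops the `_AT()` wrapper, and introduces `enum pte_kind` and `classify_pte()` as hypothetical names that exist only for this example.

```c
#include <stdint.h>

typedef uint64_t pteval_t;	/* assumption: 64-bit descriptors */

/* Values copied from the patch, written without the _AT() wrapper. */
#define PTE_TYPE_MASK		((pteval_t)3 << 0)
#define PTE_TYPE_FAULT		((pteval_t)0 << 0)
#define PTE_TYPE_PAGE		((pteval_t)3 << 0)
#define PTE_TYPE_HUGEPAGE	((pteval_t)1 << 0)

/* Hypothetical classification, used only in this sketch. */
enum pte_kind {
	PTE_KIND_FAULT,
	PTE_KIND_HUGEPAGE,
	PTE_KIND_PAGE,
	PTE_KIND_OTHER,		/* bit pattern not named by the macros above */
};

/* Decode the type field held in the low two bits of a descriptor. */
static enum pte_kind classify_pte(pteval_t val)
{
	switch (val & PTE_TYPE_MASK) {
	case PTE_TYPE_FAULT:
		return PTE_KIND_FAULT;
	case PTE_TYPE_HUGEPAGE:
		return PTE_KIND_HUGEPAGE;
	case PTE_TYPE_PAGE:
		return PTE_KIND_PAGE;
	default:
		return PTE_KIND_OTHER;
	}
}
```

For example, `classify_pte(0x40000001)` yields `PTE_KIND_HUGEPAGE`, while the same made-up address with the low bits set to 3 yields `PTE_KIND_PAGE`.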

12 years

Fwd: Re: Update on LP1097213

by Mathieu Poirier

Good day Jon,

Please include the enclosed patch in your tree. It is a fix for [1].

Thanks,
Mathieu.

[1]. https://bugs.launchpad.net/linaro-big-little-system/+bug/1097213

-------- Original Message --------
Subject: Re: Update on LP1097213
Date: Mon, 17 Jun 2013 16:31:47 +0100
From: Morten Rasmussen <morten.rasmussen(a)arm.com>
To: Mathieu Poirier <mathieu.poirier(a)linaro.org>
CC: Vincent Guittot <vincent.guittot(a)linaro.org>, Serge Broslavsky <serge.broslavsky(a)linaro.org>, Amit Kucheria <amit.kucheria(a)linaro.org>, Nicolas Pitre <nicolas.pitre(a)linaro.org>, Naresh Kamboju <naresh.kamboju(a)linaro.org>

Hi Mathieu,

I had a quick look at the hmp_next_{up,down}_delay() stuff. It is all introduced in the patch: "sched: SCHED_HMP multi-domain task migration control". Reverting it requires some manual conflict fixing and you will also need to remove the extra hmp_next_down_delay() added by a later patch.

I've attached a revert patch for debugging purposes that should do it all. I'm not sure if this will just remove the symptom or if the sched_clock accesses are the true cause of the problem.

I hope it helps,
Morten

On 17/06/13 14:26, Vincent Guittot wrote:
> Mathieu,
>
> Please find below the mail we have discussed during the call
>
> Vincent
>
> On 14 June 2013 15:21, Vincent Guittot <vincent.guittot(a)linaro.org> wrote:
>> On 14 June 2013 15:14, Vincent Guittot <vincent.guittot(a)linaro.org> wrote:
>>> On 14 June 2013 14:39, Mathieu Poirier <mathieu.poirier(a)linaro.org> wrote:
>>>> Anything on this ?!? Morten, Vincent ?
>>>
>>> Hi Mathieu,
>>>
>>> I hadn't noticed that the problem can be reproduced on a snowball,
>>> the 1st time I read your email.
>>> It means that the hmp specific functions are also called on smp systems ?
>>>
>>> I'm going to look more deeply in the code
>>>
>>
>> for_each_online_cpu is used in hmp_force_up_migration but it's not
>> protected against hotplug so it can use a cpu that is going to be
>> unplugged
>>
>> We should probably protect the sequence with get/put_online_cpus
>>
>> Vincent
>>
>>> Vincent
>>>
>>>>
>>>> On 13-06-12 03:13 PM, Mathieu Poirier wrote:
>>>>> Good day gents,
>>>>>
>>>>> I have been working on [1] for a while now, on and off as time
>>>>> permitted. The problem has always been very elusive but definitely
>>>>> present. As some of the notes in the bug report indicate TC2 wasn't the
>>>>> only ARM system I could reproduce this on - snowball suffered from the
>>>>> exact same problem.
>>>>>
>>>>> I started looking at this again for 3.10 and I have good and bad news.
>>>>>
>>>>> The good news is that I can't reproduce the problem anymore if
>>>>> CONFIG_SCHED_HMP is not enabled. I ran the attached script for more
>>>>> than 16 hours without even the hint of a problem. Normally one would
>>>>> get a crash [2] in less than a minute. I won't go so far as claiming
>>>>> that upstream solved the problem. Maybe we are lucky and timing in 3.10
>>>>> simply doesn't allow for the fault to occur. In any case, all we can do
>>>>> is continue monitoring the situation in upcoming versions.
>>>>>
>>>>> On the flip side we have a definite problem with hotplug when
>>>>> CONFIG_SCHED_HMP is defined. The crash in [2] is consistent and can be
>>>>> reproduced at will. Looking at the trace the problem happens in
>>>>> 'select_task_rq_fair' where calls to 'hmp_next_up_delay' and
>>>>> 'hmp_next_down_delay' end up referencing 'cfs_rq_clock_task' where
>>>>> cfs-rq->rq point to a bogus address.
>>>>>
>>>>> Have a look at line 9 in [2] - this is a little bit of instrumentation I
>>>>> started working on. It basically outputs the new and previous CPUs in
>>>>> 'hmp_[up,down]_migration' conditional statements along with the
>>>>> direction of the migration [3]. In every instance the system was going
>>>>> from the A15 to the A7 cluster. I haven't found a single instance where
>>>>> the opposite was true.
>>>>>
>>>>> Since this is directly related to our efforts to make the scheduler
>>>>> power aware and based on Ingo's latest rebuttal, I am not sure that it
>>>>> is wise for me to continue working on this - specifically if we end up
>>>>> scrapping that portion of the code. I'm eager to hear your opinion.
>>>>>
>>>>> On the flip side it highlights (once again) that we need to invest
>>>>> massively in the hotplug subsystem, more specifically in its relation to
>>>>> the scheduler and the RCU subsystem.
>>>>>
>>>>> Mathieu.
>>>>>
>>>>> PS. I have purposely kept the audience to a minimum - forward as you
>>>>> see fit.
>>>>>
>>>>> [1]. https://bugs.launchpad.net/linaro-big-little-system/+bug/1188778
>>>>> [2]. https://pastebin.linaro.org/view/0751c84b
>>>>> [3]. https://pastebin.linaro.org/view/4491ee27
>>>>>
>>>>
>

--
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
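The suggestion in the quoted thread, wrapping the for_each_online_cpu walk in get/put_online_cpus, can be illustrated with a toy model. In kernels of that era, get_online_cpus() takes a reference that makes a concurrent CPU unplug wait until the matching put_online_cpus(). The sketch below is a single-threaded userspace model under that assumption, not the actual HMP fix: every `model_*` name, `walk_online_cpus()`, and the choice to return -EBUSY where the real code would sleep are hypothetical.

```c
#include <errno.h>

#define NR_CPUS 4

/* Toy state standing in for the kernel's hotplug bookkeeping. */
static unsigned int online_mask = (1u << NR_CPUS) - 1;	/* all CPUs online */
static int hotplug_readers;	/* holders of the "online CPUs" reference */

static void model_get_online_cpus(void)
{
	hotplug_readers++;
}

static void model_put_online_cpus(void)
{
	hotplug_readers--;
}

/*
 * Toy unplug: the real path waits until all readers are gone. This
 * single-threaded model reports -EBUSY instead of sleeping.
 */
static int model_cpu_down(int cpu)
{
	if (hotplug_readers > 0)
		return -EBUSY;
	online_mask &= ~(1u << cpu);
	return 0;
}

/*
 * Walk the online CPUs while holding the reference. Midway through, an
 * unplug of unplug_cpu is attempted; its result is stored in *unplug_ret.
 * Returns the number of CPUs visited.
 */
static int walk_online_cpus(int unplug_cpu, int *unplug_ret)
{
	int cpu, visited = 0;

	model_get_online_cpus();
	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		if (!(online_mask & (1u << cpu)))
			continue;
		if (cpu == 1)
			*unplug_ret = model_cpu_down(unplug_cpu);
		visited++;
	}
	model_put_online_cpus();
	return visited;
}
```

While the walk holds the reference, the unplug attempt is refused and every CPU that was online at the start is still visited. Once the reference is dropped, `model_cpu_down()` succeeds.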

12 years

[PATCH] cpufreq: userspace: Simplify governor

by Viresh Kumar

The userspace governor has more code than it needs to function. Let's simplify it. This also fixes issues in cpufreq_governor_userspace(), which isn't doing the right things in START/STOP: it currently works per-cpu, whereas it is only required to manage policy->cpu.

Reported-by: Xiaoguang Chen <chenxg.marvell(a)gmail.com>
Signed-off-by: Viresh Kumar <viresh.kumar(a)linaro.org>
---
@Rafael: I don't know why this code was initially added. Please let me know if I am doing something stupid. Also, please apply it as a fix for 3.10, as it was broken recently in 3.9.

 drivers/cpufreq/cpufreq_userspace.c | 108 ++++--------------------------------
 1 file changed, 12 insertions(+), 96 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_userspace.c b/drivers/cpufreq/cpufreq_userspace.c
index bbeb9c0..5dc77b7 100644
--- a/drivers/cpufreq/cpufreq_userspace.c
+++ b/drivers/cpufreq/cpufreq_userspace.c
@@ -13,55 +13,13 @@
 
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
-#include <linux/kernel.h>
-#include <linux/module.h>
-#include <linux/smp.h>
-#include <linux/init.h>
-#include <linux/spinlock.h>
-#include <linux/interrupt.h>
 #include <linux/cpufreq.h>
-#include <linux/cpu.h>
-#include <linux/types.h>
-#include <linux/fs.h>
-#include <linux/sysfs.h>
+#include <linux/init.h>
+#include <linux/module.h>
 #include <linux/mutex.h>
 
-/**
- * A few values needed by the userspace governor
- */
-static DEFINE_PER_CPU(unsigned int, cpu_max_freq);
-static DEFINE_PER_CPU(unsigned int, cpu_min_freq);
-static DEFINE_PER_CPU(unsigned int, cpu_cur_freq); /* current CPU freq */
-static DEFINE_PER_CPU(unsigned int, cpu_set_freq); /* CPU freq desired by
-							userspace */
 static DEFINE_PER_CPU(unsigned int, cpu_is_managed);
-
 static DEFINE_MUTEX(userspace_mutex);
-static int cpus_using_userspace_governor;
-
-/* keep track of frequency transitions */
-static int
-userspace_cpufreq_notifier(struct notifier_block *nb, unsigned long val,
-	void *data)
-{
-	struct cpufreq_freqs *freq = data;
-
-	if (!per_cpu(cpu_is_managed, freq->cpu))
-		return 0;
-
-	if (val == CPUFREQ_POSTCHANGE) {
-		pr_debug("saving cpu_cur_freq of cpu %u to be %u kHz\n",
-				freq->cpu, freq->new);
-		per_cpu(cpu_cur_freq, freq->cpu) = freq->new;
-	}
-
-	return 0;
-}
-
-static struct notifier_block userspace_cpufreq_notifier_block = {
-	.notifier_call = userspace_cpufreq_notifier
-};
-
 
 /**
  * cpufreq_set - set the CPU frequency
@@ -80,13 +38,6 @@ static int cpufreq_set(struct cpufreq_policy *policy, unsigned int freq)
 	if (!per_cpu(cpu_is_managed, policy->cpu))
 		goto err;
 
-	per_cpu(cpu_set_freq, policy->cpu) = freq;
-
-	if (freq < per_cpu(cpu_min_freq, policy->cpu))
-		freq = per_cpu(cpu_min_freq, policy->cpu);
-	if (freq > per_cpu(cpu_max_freq, policy->cpu))
-		freq = per_cpu(cpu_max_freq, policy->cpu);
-
 	/*
 	 * We're safe from concurrent calls to ->target() here
 	 * as we hold the userspace_mutex lock. If we were calling
@@ -107,7 +58,7 @@ static int cpufreq_set(struct cpufreq_policy *policy, unsigned int freq)
 
 static ssize_t show_speed(struct cpufreq_policy *policy, char *buf)
 {
-	return sprintf(buf, "%u\n", per_cpu(cpu_cur_freq, policy->cpu));
+	return sprintf(buf, "%u\n", policy->cur);
 }
 
 static int cpufreq_governor_userspace(struct cpufreq_policy *policy,
@@ -119,66 +70,31 @@ static int cpufreq_governor_userspace(struct cpufreq_policy *policy,
 	switch (event) {
 	case CPUFREQ_GOV_START:
 		BUG_ON(!policy->cur);
-		mutex_lock(&userspace_mutex);
-
-		if (cpus_using_userspace_governor == 0) {
-			cpufreq_register_notifier(
-					&userspace_cpufreq_notifier_block,
-					CPUFREQ_TRANSITION_NOTIFIER);
-		}
-		cpus_using_userspace_governor++;
+		pr_debug("started managing cpu %u\n", cpu);
 
+		mutex_lock(&userspace_mutex);
 		per_cpu(cpu_is_managed, cpu) = 1;
-		per_cpu(cpu_min_freq, cpu) = policy->min;
-		per_cpu(cpu_max_freq, cpu) = policy->max;
-		per_cpu(cpu_cur_freq, cpu) = policy->cur;
-		per_cpu(cpu_set_freq, cpu) = policy->cur;
-		pr_debug("managing cpu %u started "
-			"(%u - %u kHz, currently %u kHz)\n",
-				cpu,
-				per_cpu(cpu_min_freq, cpu),
-				per_cpu(cpu_max_freq, cpu),
-				per_cpu(cpu_cur_freq, cpu));
-
 		mutex_unlock(&userspace_mutex);
 		break;
 	case CPUFREQ_GOV_STOP:
-		mutex_lock(&userspace_mutex);
-		cpus_using_userspace_governor--;
-		if (cpus_using_userspace_governor == 0) {
-			cpufreq_unregister_notifier(
-					&userspace_cpufreq_notifier_block,
-					CPUFREQ_TRANSITION_NOTIFIER);
-		}
+		pr_debug("managing cpu %u stopped\n", cpu);
 
+		mutex_lock(&userspace_mutex);
 		per_cpu(cpu_is_managed, cpu) = 0;
-		per_cpu(cpu_min_freq, cpu) = 0;
-		per_cpu(cpu_max_freq, cpu) = 0;
-		per_cpu(cpu_set_freq, cpu) = 0;
-		pr_debug("managing cpu %u stopped\n", cpu);
 		mutex_unlock(&userspace_mutex);
 		break;
 	case CPUFREQ_GOV_LIMITS:
 		mutex_lock(&userspace_mutex);
-		pr_debug("limit event for cpu %u: %u - %u kHz, "
-			"currently %u kHz, last set to %u kHz\n",
+		pr_debug("limit event for cpu %u: %u - %u kHz, currently %u kHz\n",
 			cpu, policy->min, policy->max,
-			per_cpu(cpu_cur_freq, cpu),
-			per_cpu(cpu_set_freq, cpu));
-		if (policy->max < per_cpu(cpu_set_freq, cpu)) {
+			policy->cur);
+
+		if (policy->max < policy->cur)
 			__cpufreq_driver_target(policy, policy->max,
 						CPUFREQ_RELATION_H);
-		} else if (policy->min > per_cpu(cpu_set_freq, cpu)) {
+		else if (policy->min > policy->cur)
 			__cpufreq_driver_target(policy, policy->min,
 						CPUFREQ_RELATION_L);
-		} else {
-			__cpufreq_driver_target(policy,
-						per_cpu(cpu_set_freq, cpu),
-						CPUFREQ_RELATION_L);
-		}
-		per_cpu(cpu_min_freq, cpu) = policy->min;
-		per_cpu(cpu_max_freq, cpu) = policy->max;
-		per_cpu(cpu_cur_freq, cpu) = policy->cur;
 		mutex_unlock(&userspace_mutex);
 		break;
 	}
--
1.7.12.rc2.18.g61b472e
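The decision made by the patched CPUFREQ_GOV_LIMITS branch above can be isolated as a small function. This is a userspace sketch for illustration only: `struct policy_model` is a hypothetical stand-in for the three `struct cpufreq_policy` fields the branch reads, and `limits_target()` returns the frequency that would be requested instead of calling `__cpufreq_driver_target()`.

```c
#include <assert.h>

/* Hypothetical stand-in for the policy fields used by the LIMITS branch. */
struct policy_model {
	unsigned int min;	/* lower limit, kHz */
	unsigned int max;	/* upper limit, kHz */
	unsigned int cur;	/* current frequency, kHz */
};

/*
 * Mirror of the patched branch: return the frequency that would be
 * requested from the driver, or 0 when cur already lies inside
 * [min, max] and no request is made.
 */
static unsigned int limits_target(const struct policy_model *p)
{
	assert(p->min <= p->max);

	if (p->max < p->cur)
		return p->max;		/* above the new ceiling: clamp down */
	else if (p->min > p->cur)
		return p->min;		/* below the new floor: clamp up */

	return 0;			/* within limits: leave frequency alone */
}
```

Compared with the removed code, nothing is re-applied when the current frequency is already within the limits, because the separately tracked cpu_set_freq value no longer exists.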

12 years

[RFC PATCH 1/2] KVM: ARM: Transparent huge pages and hugetlbfs support

by Christoffer Dall

From: Christoffer Dall <cdall(a)cs.columbia.edu>

Support transparent huge pages in 32-bit KVM/ARM. The whole transparent_hugepage_adjust stuff is far from pretty, but this is how it's solved on x86 so we duplicate their logic. This should be shared across architectures if possible (like many other things), but can always be changed down the road.

The pud_huge checking on the unmap path may feel a bit silly as the pud_huge check is always defined to false, but the compiler should be smart about this.

Signed-off-by: Christoffer Dall <christoffer.dall(a)linaro.org>
---
 arch/arm/include/asm/kvm_host.h |   7 +-
 arch/arm/include/asm/kvm_mmu.h  |   6 +-
 arch/arm/kvm/mmu.c              | 158 +++++++++++++++++++++++++++++++++-------
 3 files changed, 137 insertions(+), 34 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 1f3cee2..45a165e 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -33,10 +33,9 @@
 
 #define KVM_VCPU_MAX_FEATURES 1
 
-/* We don't currently support large pages. */
-#define KVM_HPAGE_GFN_SHIFT(x)	0
-#define KVM_NR_PAGE_SIZES	1
-#define KVM_PAGES_PER_HPAGE(x)	(1UL<<31)
+#define KVM_HPAGE_GFN_SHIFT(_level)	(((_level) - 1) * 21)
+#define KVM_HPAGE_SIZE			(1UL << KVM_HPAGE_GFN_SHIFT(2))
+#define KVM_PAGES_PER_HPAGE		(KVM_HPAGE_SIZE / PAGE_SIZE)
 
 #include <kvm/arm_vgic.h>
 
diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 472ac70..9ef71b1 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -105,7 +105,8 @@ static inline void kvm_set_s2pte_writable(pte_t *pte)
 
 struct kvm;
 
-static inline void coherent_icache_guest_page(struct kvm *kvm, gfn_t gfn)
+static inline void coherent_icache_guest_page(struct kvm *kvm, hva_t hva,
+					      unsigned long size)
 {
 	/*
 	 * If we are going to insert an instruction page and the icache is
@@ -120,8 +121,7 @@ static inline void coherent_icache_guest_page(struct kvm *kvm, gfn_t gfn)
 	 * need any kind of flushing (DDI 0406C.b - Page B3-1392).
 	 */
 	if (icache_is_pipt()) {
-		unsigned long hva = gfn_to_hva(kvm, gfn);
-		__cpuc_coherent_user_range(hva, hva + PAGE_SIZE);
+		__cpuc_coherent_user_range(hva, hva + size);
 	} else if (!icache_is_vivt_asid_tagged()) {
 		/* any kind of VIPT cache */
 		__flush_icache_all();
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index ca6bea4..9170c98 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -19,6 +19,7 @@
 #include <linux/mman.h>
 #include <linux/kvm_host.h>
 #include <linux/io.h>
+#include <linux/hugetlb.h>
 #include <trace/events/kvm.h>
 #include <asm/pgalloc.h>
 #include <asm/cacheflush.h>
@@ -87,19 +88,27 @@ static void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc)
 
 static void clear_pud_entry(struct kvm *kvm, pud_t *pud, phys_addr_t addr)
 {
-	pmd_t *pmd_table = pmd_offset(pud, 0);
-	pud_clear(pud);
-	kvm_tlb_flush_vmid_ipa(kvm, addr);
-	pmd_free(NULL, pmd_table);
+	if (pud_huge(*pud)) {
+		pud_clear(pud);
+	} else {
+		pmd_t *pmd_table = pmd_offset(pud, 0);
+		pud_clear(pud);
+		kvm_tlb_flush_vmid_ipa(kvm, addr);
+		pmd_free(NULL, pmd_table);
+	}
 	put_page(virt_to_page(pud));
 }
 
 static void clear_pmd_entry(struct kvm *kvm, pmd_t *pmd, phys_addr_t addr)
 {
-	pte_t *pte_table = pte_offset_kernel(pmd, 0);
-	pmd_clear(pmd);
-	kvm_tlb_flush_vmid_ipa(kvm, addr);
-	pte_free_kernel(NULL, pte_table);
+	if (pmd_huge(*pmd)) {
+		pmd_clear(pmd);
+	} else {
+		pte_t *pte_table = pte_offset_kernel(pmd, 0);
+		pmd_clear(pmd);
+		kvm_tlb_flush_vmid_ipa(kvm, addr);
+		pte_free_kernel(NULL, pte_table);
+	}
 	put_page(virt_to_page(pmd));
 }
 
@@ -142,12 +151,34 @@ static void unmap_range(struct kvm *kvm, pgd_t *pgdp,
 			continue;
 		}
 
+		if (pud_huge(*pud)) {
+			/*
+			 * If we are dealing with a huge pud, just clear it and
+			 * move on.
+			 */
+			clear_pud_entry(kvm, pud, addr);
+			addr += PUD_SIZE;
+			continue;
+		}
+
 		pmd = pmd_offset(pud, addr);
 		if (pmd_none(*pmd)) {
 			addr += PMD_SIZE;
 			continue;
 		}
 
+		if (pmd_huge(*pmd)) {
+			/*
+			 * If we are dealing with a huge pmd, just clear it and
+			 * walk back up the ladder.
+			 */
+			clear_pmd_entry(kvm, pmd, addr);
+			if (pmd_empty(pmd))
+				clear_pud_entry(kvm, pud, addr);
+			addr += PMD_SIZE;
+			continue;
+		}
+
 		pte = pte_offset_kernel(pmd, addr);
 		clear_pte_entry(kvm, pte, addr);
 		range = PAGE_SIZE;
@@ -432,7 +463,7 @@ static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
 {
 	pgd_t *pgd;
 	pud_t *pud;
-	pmd_t *pmd;
+	pmd_t *pmd, old_pmd;
 	pte_t *pte, old_pte;
 
 	/* Create 2nd stage page table mapping - Level 1 */
@@ -448,7 +479,22 @@ static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
 
 	pmd = pmd_offset(pud, addr);
 
-	/* Create 2nd stage page table mapping - Level 2 */
+	/* Create 2nd stage section mappings (huge tlb pages) - Level 2 */
+	if (pte_huge(*new_pte) || pmd_huge(*pmd)) {
+		pte_t *huge_pte = (pte_t *)pmd;
+		VM_BUG_ON(pmd_present(*pmd) && !pmd_huge(*pmd));
+
+		old_pmd = *pmd;
+		kvm_set_pte(huge_pte, *new_pte); /* new_pte really new_pmd */
+		if (pmd_present(old_pmd))
+			kvm_tlb_flush_vmid_ipa(kvm, addr);
+		else
+			get_page(virt_to_page(pmd));
+		return 0;
+	}
+
+	/* Create 2nd stage page mappings - Level 2 */
+	BUG_ON(pmd_present(*pmd) && pmd_huge(*pmd));
 	if (pmd_none(*pmd)) {
 		if (!cache)
 			return 0; /* ignore calls from kvm_set_spte_hva */
@@ -514,16 +560,55 @@ out:
 	return ret;
 }
 
+static bool transparent_hugepage_adjust(struct kvm *kvm, pfn_t *pfnp,
+					phys_addr_t *ipap)
+{
+	pfn_t pfn = *pfnp;
+	gfn_t gfn = *ipap >> PAGE_SHIFT;
+
+	if (PageTransCompound(pfn_to_page(pfn))) {
+		unsigned long mask;
+		/*
+		 * mmu_notifier_retry was successful and we hold the
+		 * mmu_lock here, so the pmd can't become splitting
+		 * from under us, and in turn
+		 * __split_huge_page_refcount() can't run from under
+		 * us and we can safely transfer the refcount from
+		 * PG_tail to PG_head as we switch the pfn from tail to
+		 * head.
+		 */
+		mask = KVM_PAGES_PER_HPAGE - 1;
+		VM_BUG_ON((gfn & mask) != (pfn & mask));
+		if (pfn & mask) {
+			gfn &= ~mask;
+			*ipap &= ~(KVM_HPAGE_SIZE - 1);
+			kvm_release_pfn_clean(pfn);
+			pfn &= ~mask;
+			kvm_get_pfn(pfn);
+			*pfnp = pfn;
+		}
+
+		return true;
+	}
+
+	return false;
+}
+
 static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
-			  gfn_t gfn, struct kvm_memory_slot *memslot,
+			  struct kvm_memory_slot *memslot,
 			  unsigned long fault_status)
 {
-	pte_t new_pte;
-	pfn_t pfn;
 	int ret;
-	bool write_fault, writable;
+	bool write_fault, writable, hugetlb = false, force_pte = false;
 	unsigned long mmu_seq;
+	gfn_t gfn = fault_ipa >> PAGE_SHIFT;
+	unsigned long hva = gfn_to_hva(vcpu->kvm, gfn);
+	struct kvm *kvm = vcpu->kvm;
 	struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
+	struct vm_area_struct *vma;
+	pfn_t pfn;
+	pte_t new_pte;
+	unsigned long psize;
 
 	write_fault = kvm_is_write_fault(kvm_vcpu_get_hsr(vcpu));
 	if (fault_status == FSC_PERM && !write_fault) {
@@ -531,6 +616,27 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		return -EFAULT;
 	}
 
+	/* Let's check if we will get back a huge page */
+	down_read(&current->mm->mmap_sem);
+	vma = find_vma_intersection(current->mm, hva, hva + 1);
+	if (is_vm_hugetlb_page(vma)) {
+		hugetlb = true;
+		hva &= PMD_MASK;
+		gfn = (fault_ipa & PMD_MASK) >> PAGE_SHIFT;
+		psize = PMD_SIZE;
+	} else {
+		psize = PAGE_SIZE;
+		if (vma->vm_start & ~PMD_MASK)
+			force_pte = true;
+	}
+	up_read(&current->mm->mmap_sem);
+
+	pfn = gfn_to_pfn_prot(kvm, gfn, write_fault, &writable);
+	if (is_error_pfn(pfn))
+		return -EFAULT;
+
+	coherent_icache_guest_page(kvm, hva, psize);
+
 	/* We need minimum second+third level pages */
 	ret = mmu_topup_memory_cache(memcache, 2, KVM_NR_MEM_OBJS);
 	if (ret)
@@ -548,26 +654,24 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 */
 	smp_rmb();
 
-	pfn = gfn_to_pfn_prot(vcpu->kvm, gfn, write_fault, &writable);
-	if (is_error_pfn(pfn))
-		return -EFAULT;
-
-	new_pte = pfn_pte(pfn, PAGE_S2);
-	coherent_icache_guest_page(vcpu->kvm, gfn);
-
-	spin_lock(&vcpu->kvm->mmu_lock);
-	if (mmu_notifier_retry(vcpu->kvm, mmu_seq))
+	spin_lock(&kvm->mmu_lock);
+	if (mmu_notifier_retry(kvm, mmu_seq))
 		goto out_unlock;
+	if (!hugetlb && !force_pte)
+		hugetlb = transparent_hugepage_adjust(kvm, &pfn, &fault_ipa);
+	new_pte = pfn_pte(pfn, PAGE_S2);
+	if (hugetlb)
+		new_pte = pte_mkhuge(new_pte);
 	if (writable) {
 		kvm_set_s2pte_writable(&new_pte);
 		kvm_set_pfn_dirty(pfn);
 	}
-	stage2_set_pte(vcpu->kvm, memcache, fault_ipa, &new_pte, false);
+	ret = stage2_set_pte(kvm, memcache, fault_ipa, &new_pte, false);
 
 out_unlock:
-	spin_unlock(&vcpu->kvm->mmu_lock);
+	spin_unlock(&kvm->mmu_lock);
 	kvm_release_pfn_clean(pfn);
-	return 0;
+	return ret;
 }
 
 /**
@@ -636,7 +740,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
 
 	memslot = gfn_to_memslot(vcpu->kvm, gfn);
 
-	ret = user_mem_abort(vcpu, fault_ipa, gfn, memslot, fault_status);
+	ret = user_mem_abort(vcpu, fault_ipa, memslot, fault_status);
 	if (ret == 0)
 		ret = 1;
 out_unlock:
--
1.8.1.2
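The address arithmetic in transparent_hugepage_adjust() above can be worked through in userspace. This sketch copies the three KVM_HPAGE macros from the kvm_host.h hunk and reproduces only the masking step; it leaves out the page refcount transfer and all locking. It assumes 4 KiB pages (`PAGE_SHIFT` of 12). `struct fault_addr` and `align_to_hpage()` are hypothetical names used only here.

```c
#include <stdint.h>

#define PAGE_SHIFT	12			/* assumption: 4 KiB pages */
#define PAGE_SIZE	(1UL << PAGE_SHIFT)

/* Copied from the kvm_host.h hunk of the patch. */
#define KVM_HPAGE_GFN_SHIFT(_level)	(((_level) - 1) * 21)
#define KVM_HPAGE_SIZE			(1UL << KVM_HPAGE_GFN_SHIFT(2))
#define KVM_PAGES_PER_HPAGE		(KVM_HPAGE_SIZE / PAGE_SIZE)

/* Hypothetical pair: faulting guest address and backing host frame. */
struct fault_addr {
	uint64_t ipa;	/* intermediate physical address */
	uint64_t pfn;	/* host page frame number */
};

/*
 * Round both values down to the start of the enclosing huge page, as the
 * "if (pfn & mask)" branch of the patch does for a tail page.
 */
static struct fault_addr align_to_hpage(uint64_t ipa, uint64_t pfn)
{
	uint64_t mask = KVM_PAGES_PER_HPAGE - 1;
	struct fault_addr out = { ipa, pfn };

	if (pfn & mask) {
		out.ipa &= ~(uint64_t)(KVM_HPAGE_SIZE - 1);
		out.pfn &= ~mask;
	}
	return out;
}
```

With these definitions a huge page is 2 MiB, or 512 base pages. A fault at IPA 0x80345000 backed by pfn 0x12345 is rounded down to IPA 0x80200000 and pfn 0x12200, so the mapping is installed at the head of the huge page.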

12 years

[PATCH 0/3] sched: Sched Domains: Fixups

by Viresh Kumar

Peter/Ingo,

These are minor fixes that I could find for the code responsible for creating sched domains. They are rebased on top of my earlier fixes:

https://lkml.org/lkml/2013/6/4/253

I couldn't find those in linux-next or tip/master, so I am giving this link.

Viresh Kumar (3):
  sched: don't initialize alloc_state in build_sched_domains
  sched: don't sd->child to NULL when it is already NULL
  sched: Create for_each_sd_topology()

 kernel/sched/core.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

--
1.7.12.rc2.18.g61b472e

12 years

[PATCH 00/11] CPUFreq Kconfig fixes

by Viresh Kumar

Hi Rafael,

Recently Arnd sent a few fixes for drivers that were using APIs from freq_table.c but hadn't selected CPU_FREQ_TABLE. Based on that, I cross-checked all the places where it should be selected and where it shouldn't be. These are fixes around that.

I have applied these in my cpufreq-kconfig-fixes branch. I will send you a pull request separately once I get some Acks (I will wait for a few days).

Viresh Kumar (11):
  cpufreq: blackfin: enable driver for CONFIG_BFIN_CPU_FREQ
  cpufreq: cris: select CPU_FREQ_TABLE
  cpufreq: davinci: select CPU_FREQ_TABLE
  cpufreq: exynos: select CPU_FREQ_TABLE
  cpufreq: highbank: remove select CPU_FREQ_TABLE
  cpufreq: imx: select CPU_FREQ_TABLE
  cpufreq: powerpc: CBE_RAS: select CPU_FREQ_TABLE
  cpufreq: pxa: select CPU_FREQ_TABLE
  cpufreq: S3C2416/S3C64XX: select CPU_FREQ_TABLE
  cpufreq: tegra: select CPU_FREQ_TABLE for ARCH_TEGRA
  cpufreq: X86_AMD_FREQ_SENSITIVITY: select CPU_FREQ_TABLE

 arch/arm/mach-davinci/Kconfig   | 1 +
 arch/arm/mach-pxa/Kconfig       | 3 +++
 arch/arm/mach-tegra/Kconfig     | 4 +---
 arch/cris/Kconfig               | 2 ++
 drivers/cpufreq/Kconfig.arm     | 6 +++++-
 drivers/cpufreq/Kconfig.powerpc | 1 +
 drivers/cpufreq/Kconfig.x86     | 1 +
 drivers/cpufreq/Makefile        | 2 +-
 8 files changed, 15 insertions(+), 5 deletions(-)

--
1.7.12.rc2.18.g61b472e

12 years

[ACTIVITY] (Linus Walleij) 2013-06-08 - 2013-06-15

by Linus Walleij

== Linus Walleij linusw ==

=== Highlights ===

* Merged the runtime PM pinctrl states device core container patch into the pinctrl tree. Now discussing the OMAP "active" state with affected maintainers.
* Olof J pulled all 5 ux500 branches for v3.11.
* Iterated the Integrator/AP pull request after it was discovered that it broke on ATAG builds. Mea culpa. Hopefully the fixed version gets pulled.
* Sent a pull request for the Integrator PCI DT patch series to ARM SoC.
* Sent fixes on top of the U300 Device Tree and multiplatform branch to address the last review comments by utilizing regmap/syscon and attempting to move board power into the regulator driver. If we can sort this out I can line up a pull request.
* Merged pinctrl patches for sparser GPIO ranges, i.e. where pinctrl GPIO ranges are not entirely linear. Christian Ruppert needed this and it enables us to proceed with the Intel Bay Trail as a pinctrl driver.
* Reviewed lots of pinctrl code. Queued some pinctrl patches.
* Advised LKML newbies on how pinctrl works.
* Involved in some Allwinner reviews.

=== Plans ===

* I have a ux500-defconfig branch, to be submitted later, turning on this and some more new stuff that will hit the v3.11 merge window. Maybe this needs to come after v3.11-rc1.
* Finalize the U300 DT+multiplatform patch set and send a pull request for it.
* Start to delete Integrator board files and convert to multiplatform once the PCI DT patches land in ARM SoC.
* Convert the Nomadik pinctrl driver to register GPIO ranges from the gpiochip side.
* Test the PL08x patches on the Ericsson Research PB11MPCore and submit platform data for using pl08x DMA on that platform.

=== Issues ===

* Subsystem maintainers in the kernel community are acting like Judge Dredd on DT review and commit issues, as noted last week.
* Some impediments from internal turmoil @ST-Ericsson.

Thanks,
Linus Walleij

12 years

[ACTIVITY] (John Stultz) June 10-14

by John Stultz

=== Highlights ===

* Cleaned/fixed up and sent out volatile ranges (v8) patchset to lkml
* Sent re-factored ION patchset to Rebecca, Arnd, Jesse and Serban
* Updated linaro.android kernel to the AOSP 3.10-rc5 base branch
* Discussed Android's adoption of memcg pressure notifications w/ AntonV and Android devs.
* Thomas merged my current 3.11 queue into -tip
* Worked with Zoran on his mmc wakeup_source patch
* Tried to sort out vfat ioctl issues w/ Android devs, so we can get something upstream.
* Discussed & reviewed a number of community time/rtc patches on lkml
* Implemented a new alarmtimer test for my timekeeping testsuite
* Worked out some details on LCE Android Graphics Upstreaming session
* More work on Plumbers Android MiniConf (& got another yes from an Android dev!)
* Reviewed blueprints and sent out weekly status mail
* Attended LSK android patch discussion
* Attended Linaro internal patch review discussion

=== Plans ===

* Try to get Anton's ulmkd updated to use upstreamed memcg mempressure notifier
* Re-integrate noswap purging into vrange patchset
* Update refactored ion patches to include changes from the AOSP 3.10-rc5 branch
* Sort out the rest of my 3.11 queue and send to Thomas
* Still have to do some blueprint breaking up for Jakub

=== Issues ===

* N/A

12 years
