Currently MIN_LATENCY_MULTIPLIER is defined as 100, so on a system with a
transition latency of 1 ms the minimum sampling time comes out to around 100 ms.
That is quite large if you want to get better performance out of your system.
Redefine MIN_LATENCY_MULTIPLIER to 20 so that we can support a 20 ms sampling
rate on such platforms.
Signed-off-by: Viresh Kumar <viresh.kumar(a)linaro.org>
---
Hi Guys,
I really don't know how this figure (100) was chosen initially, but we really
need 20 ms support for my platform: ARM TC2.
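For reference, here is roughly how the sampling-rate floor follows from
MIN_LATENCY_MULTIPLIER. This is only a simplified sketch of the existing
governor logic (not verbatim from cpufreq_governor.c), just to show the
arithmetic:

	/* transition latency reported by the cpufreq driver, in nanoseconds */
	unsigned int transition_latency = 1000000;		/* 1 ms */
	unsigned int latency = transition_latency / 1000;	/* 1000 us */

	/*
	 * Minimum sampling rate in microseconds:
	 *   MIN_LATENCY_MULTIPLIER = 100 -> 100 * 1000 us = 100 ms
	 *   MIN_LATENCY_MULTIPLIER =  20 ->  20 * 1000 us =  20 ms
	 */
	unsigned int min_sampling_rate = MIN_LATENCY_MULTIPLIER * latency;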
Pushed here:
http://git.linaro.org/gitweb?p=people/vireshk/linux.git;a=shortlog;h=refs/h…
drivers/cpufreq/cpufreq_governor.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h
index d2ac911..adb8e30 100644
--- a/drivers/cpufreq/cpufreq_governor.h
+++ b/drivers/cpufreq/cpufreq_governor.h
@@ -34,7 +34,7 @@
*/
#define MIN_SAMPLING_RATE_RATIO (2)
#define LATENCY_MULTIPLIER (1000)
-#define MIN_LATENCY_MULTIPLIER (100)
+#define MIN_LATENCY_MULTIPLIER (20)
#define TRANSITION_LATENCY_LIMIT (10 * 1000 * 1000)
/* Ondemand Sampling types */
--
1.7.12.rc2.18.g61b472e
=== Highlights ===
* Flew to SF and presented at ABS, then flew back home on Monday.
Slides are here:
http://events.linuxfoundation.org/images/stories/slides/abs2013_stultz.pdf
* Spurred by discussion at ABS, worked out how to get ADB running on
vanilla Linux:
https://plus.google.com/u/0/111524780435806926688/posts/AaEccFjKNHE
* My discussion proposal for lsf/mm-minisummit on volatile ranges was
accepted and I was formally invited to attend
* Discussed Serban's ashmem compat_ioctl patches with Arve and Serban.
* Sent out late android upstreaming subteam mail
* Synced up with Jakub on Android Upstreaming session at connect
* Got my 3.9 timekeeping queue merged upstream, and reviewed and queued
a number of timekeeping patches for 3.10
* Discussed some timekeeping changes with tglx, and reviewed some of his
patches.
* Implemented a first pass at using valid-cycle-ranges with vdso-based
gettime calls to avoid potential race windows with virtualized kernels.
This will allow for reduced lock hold times in the future.
=== Plans ===
* Review Serban's binder patches
* Look into Android's support of large files with 32-bit applications
* Send out sync driver for staging (as I've not heard back from Erik or
other folks at Google)
* Prep for Connect
=== Issues ===
* NA
== David Long ==
=== Travel/Time Off ===
* Monday February 18th (U.S. Washington's Birthday, aka President's Day)
=== Highlights ===
* I'm dealing with problems getting the uprobes patch to
correctly process the breakpoint. I see the breakpoint being placed,
but when it is hit the resulting context appears to be corrupted.
* I received email back from Rabin Vincent saying he had no plans to
work on this any more and he is happy if I want to take it over. He
has volunteered to supply his tests, which I hope to see shortly.
=== Plans ===
* Debug the problems I'm experiencing with the patch, then move on to
addressing the upstream concerns about its integration.
=== Issues ===
-dl
On 64-bit platforms, reads/writes of the various cpustat fields are
atomic due to native 64-bit loads/stores. However, on non-64-bit
platforms, reads/writes of the cpustat fields are not atomic and can
lead to inconsistent statistics.
This problem was originally reported by Frederic Weisbecker as a
64-bit limitation of the nsec-granularity cputime accounting for
full dynticks, but we then realized that it is a problem that has been
around for a while and is not specific to the new cputime accounting.
This series fixes the problem by first converting all accesses to the cpustat
fields to use accessor functions, and then converting the accessor
functions to use the atomic64 functions.
Implemented based on idea proposed by Frederic Weisbecker.
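To illustrate the direction (the helper names below are only illustrative,
not necessarily the ones used in the series): all direct accesses to the
cpustat array are first funneled through small helpers, and those helpers
can then be switched to atomic64 operations so that 32-bit platforms also
get atomic 64-bit loads/stores:

	/* Sketch only: assumes the cpustat fields become atomic64_t. */
	static inline u64 kcpustat_read(struct kernel_cpustat *kstat, int idx)
	{
		return atomic64_read(&kstat->cpustat[idx]);
	}

	static inline void kcpustat_add(struct kernel_cpustat *kstat, int idx,
					u64 val)
	{
		atomic64_add(val, &kstat->cpustat[idx]);
	}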
Kevin Hilman (2):
cpustat: use accessor functions for get/set/add
cpustat: convert to atomic operations
arch/s390/appldata/appldata_os.c | 16 +++++++--------
drivers/cpufreq/cpufreq_governor.c | 18 ++++++++---------
drivers/cpufreq/cpufreq_ondemand.c | 2 +-
drivers/macintosh/rack-meter.c | 6 +++---
fs/proc/stat.c | 40 +++++++++++++++++++-------------------
fs/proc/uptime.c | 2 +-
include/linux/kernel_stat.h | 11 ++++++++++-
kernel/sched/core.c | 12 +++++-------
kernel/sched/cputime.c | 29 +++++++++++++--------------
9 files changed, 70 insertions(+), 66 deletions(-)
--
1.8.1.2
== Linus Walleij linusw ==
=== Highlights ===
* Finalized a GPIO+pinctrl presentation for the Embedded Linux
Conference, and presented on the first day of the conference.
Slides will be posted.
* Finalized the pinctrl tree before traveling, sent a pull request to
Torvalds as soon as the merge window opened and he pulled it
in.
* The AB8500 GPIO patches and all other cleanups have been merged
up to the pinctrl and ARM SoC trees and pulled in by Torvalds.
MFD is pending, but Sam has sent a pull request for this part as
well.
* Other queued fixes for mach-ux500, as well as the PCI regression
fix, have propagated upstream.
* Reviewed misc GPIO, pinctrl and other patches, updated
blueprints...
=== Plans ===
* Fix regressions popping up in the merge window.
There are always such...
* Attack the remaining headers in arch/arm/mach-ux500
so we can move forward with multiplatform for v3.9.
* Convert the Nomadik to multiplatform.
* Convert Nomadik pinctrl driver to register GPIO ranges
from the gpiochip side.
* Test the PL08x patches on the Ericsson Research
PB11MPCore and submit platform data for using
pl08x DMA on that platform.
* Look into other Ux500 stuff in need of mainlining...
using an internal tracking sheet for this.
* Get hands dirty with regmap.
=== Issues ===
* N/A
Thanks,
Linus Walleij
Since ARMv6, new atomic instructions have been introduced:
ldrex/strex. Several implementations are possible, based on (1) global
and local exclusive monitors or (2) a local exclusive monitor and a snoop
unit.
In the case of the 2nd option, an exclusive store operation to an uncached
region may be faulty.
Check for availability of the global monitor to provide some hint about
possible issues.
Signed-off-by: Vladimir Murzin <murzin.v(a)gmail.com>
---
Changes since
v1:
- Using L_PTE_MT_BUFFERABLE instead of L_PTE_MT_UNCACHED
  Thanks to Russell for pointing out this silly error
- added comment about how checking is done
arch/arm/include/asm/bugs.h | 14 +++++++++--
arch/arm/mm/fault-armv.c | 55 +++++++++++++++++++++++++++++++++++++++++++
2 files changed, 67 insertions(+), 2 deletions(-)
diff --git a/arch/arm/include/asm/bugs.h b/arch/arm/include/asm/bugs.h
index a97f1ea..29d73cd 100644
--- a/arch/arm/include/asm/bugs.h
+++ b/arch/arm/include/asm/bugs.h
@@ -13,9 +13,19 @@
#ifdef CONFIG_MMU
extern void check_writebuffer_bugs(void);
-#define check_bugs() check_writebuffer_bugs()
+#if __LINUX_ARM_ARCH__ < 6
+static inline void check_gmonitor_bugs(void) { }
#else
-#define check_bugs() do { } while (0)
+extern void check_gmonitor_bugs(void);
+#endif
+
+static inline void check_bugs(void)
+{
+ check_writebuffer_bugs();
+ check_gmonitor_bugs();
+}
+#else
+static inline void check_bugs(void) { }
#endif
#endif
diff --git a/arch/arm/mm/fault-armv.c b/arch/arm/mm/fault-armv.c
index 2a5907b..6a1a07e 100644
--- a/arch/arm/mm/fault-armv.c
+++ b/arch/arm/mm/fault-armv.c
@@ -205,6 +205,61 @@ void update_mmu_cache(struct vm_area_struct *vma, unsigned long addr,
__flush_icache_all();
}
}
+#else
+/*
+ * Check for the global exclusive monitor. The global monitor is an external
+ * transaction monitoring block for tracking exclusive accesses to shareable
+ * memory regions. LDREX/STREX rely on this monitor when accessing uncached
+ * shared memory.
+ * If the global monitor is not implemented, a STREX operation on an uncached
+ * shared memory region always fails, returning 0 in the destination register.
+ * We rely on this property to check whether the global monitor is implemented
+ * or not.
+ * NB: The name L_PTE_MT_BUFFERABLE does not refer to the B bit, but to the
+ * Normal non-cacheable memory type (XXCB = 0001).
+ */
+void __init check_gmonitor_bugs(void)
+{
+ struct page *page;
+ const char *reason;
+ unsigned long res = 1;
+
+ printk(KERN_INFO "CPU: Testing for global monitor: ");
+
+ page = alloc_page(GFP_KERNEL);
+ if (page) {
+ unsigned long *p;
+ pgprot_t prot = __pgprot_modify(PAGE_KERNEL,
+ L_PTE_MT_MASK, L_PTE_MT_BUFFERABLE);
+
+ p = vmap(&page, 1, VM_IOREMAP, prot);
+
+ if (p) {
+			int temp;
+
+ __asm__ __volatile__(
+ "ldrex %1, [%2]\n"
+ "strex %0, %1, [%2]"
+ : "=&r" (res), "=&r" (temp)
+ : "r" (p)
+ : "cc", "memory");
+
+ reason = "n/a (atomic ops may be faulty)";
+ } else {
+ reason = "unable to map memory\n";
+ }
+
+ vunmap(p);
+ put_page(page);
+ } else {
+ reason = "unable to grab page\n";
+ }
+
+ if (res)
+ printk("failed, %s\n", reason);
+ else
+ printk("ok\n");
+}
#endif /* __LINUX_ARM_ARCH__ < 6 */
/*
--
1.7.10.4
Thanks for the review, Russell!
On Mon, Feb 18, 2013 at 04:44:20PM +0000, Russell King - ARM Linux wrote:
> On Mon, Feb 18, 2013 at 08:26:50PM +0400, Vladimir Murzin wrote:
> > Since ARMv6 new atomic instructions have been introduced:
> > ldrex/strex. Several implementation are possible based on (1) global
> > and local exclusive monitors and (2) local exclusive monitor and snoop
> > unit.
> >
> > In case of the 2nd option exclusive store operation on uncached
> > region may be faulty.
> >
> > Check for availability of the global monitor to provide some hint about
> > possible issues.
>
> How does this code actually do that?
According to DHT0008A_arm_synchronization_primitives.pdf, the global
monitor is introduced to track exclusive accesses to shareable memory
regions. The document also describes some system-wide implications
which should be taken into account:
(1) for systems with coherency management
(2) for systems without coherency management
The first case relies on the SCU, the L1 data cache and the local monitor.
The second one requires an implementation of the global monitor if memory
regions cannot be cached.
It also specifies the behaviour of store-exclusive operations when the
global monitor is not available: these operations always fail.
Taking all of this into account, we can guess at the availability of the
global monitor by using a store-exclusive operation on an uncached memory
region.
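In code, the probe amounts to something like the following (a simplified
sketch, not the exact patch; 'p' is assumed to already point at a mapping
with the non-cacheable attribute, and the inline asm is written the way you
suggest below, without the stray backslashes or the __ keyword variants):

	unsigned long failed, tmp;

	asm volatile(
		"ldrex	%1, [%2]\n"
		"strex	%0, %1, [%2]"
		: "=&r" (failed), "=&r" (tmp)
		: "r" (p)
		: "cc", "memory");

	if (failed)
		pr_info("CPU: no global monitor for uncached memory\n");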
>
> > +void __init check_gmonitor_bugs(void)
> > +{
> > + struct page *page;
> > + const char *reason;
> > + unsigned long res = 1;
> > +
> > + printk(KERN_INFO "CPU: Testing for global monitor: ");
> > +
> > + page = alloc_page(GFP_KERNEL);
> > + if (page) {
> > + unsigned long *p;
> > + pgprot_t prot = __pgprot_modify(PAGE_KERNEL,
> > + L_PTE_MT_MASK, L_PTE_MT_UNCACHED);
> > +
> > + p = vmap(&page, 1, VM_IOREMAP, prot);
>
> This is bad practise. Remapping a page of already mapped kernel memory
> using different attributes (in this case, strongly ordered) is _definitely_
> a violation of the architecture requirements. The behaviour you will see
> from this are in no way guaranteed.
DDI0406C_arm_architecture_reference_manual.pdf (A3-131) says:
A memory location can be marked as having different cacheability
attributes, for example when using aliases in a
virtual to physical address mapping:
* if the attributes differ only in the cache allocation hint this does
not affect the behavior of accesses to that location
* for other cases see Mismatched memory attributes on page A3-136.
Isn't L_PTE_MT_UNCACHED about cache allocation hint?
>
> If you want to do this, it must either come from highmem, or not already
> be mapped.
>
> Moreover, this is absolutely silly - the ARM ARM says:
>
> "LDREX and STREX operations *must* be performed only on memory with the
> Normal memory attribute."
DDI0406C_arm_architecture_reference_manual.pdf (A3-121) says:
It is IMPLEMENTATION DEFINED whether LDREX and STREX operations can be
performed to a memory region with the Device or Strongly-ordered
memory attribute. Unless the implementation documentation explicitly
states that LDREX and STREX operations to a memory region with the
Device or Strongly-ordered attribute are permitted, the effect of such
operations is UNPREDICTABLE.
At least it allows performing operations on a memory region with the
Strongly-ordered attribute... but the effect is still unpredictable.
>
> L_PTE_MT_UNCACHED doesn't get you that. As I say above, that gets you
> strongly ordered memory, not "normal memory" as required by the
> architecture for use with exclusive types.
>
> > +
> > + if (p) {
> > + int temp;
> > +
> > + __asm__ __volatile__( \
> > + "ldrex %1, [%2]\n" \
> > + "strex %0, %1, [%2]" \
> > + : "=&r" (res), "=&r" (temp) \
> > + : "r" (p) \
>
> \ character not required for any of the above. Neither is the __ version
> of "asm" and "volatile".
Thanks.
>
> > + : "cc", "memory");
> > +
> > + reason = "n\\a (atomic ops may be faulty)";
>
> "n\\a" ?
"not detected"?
> So... at the moment this has me wondering - you're testing atomic
> operations with a strongly ordered memory region, which ARM already
> define this to be outside of the architecture spec. The behaviour you
> see is not defined architecturally.
>
> And if you're trying to use LDREX/STREX to a strongly ordered or device
> memory region, then you're quite right that it'll be unreliable. It's
> not defined to even work. That's not because they're faulty, it's because
> you're abusing them.
However, in practice it is not hard to run into this undefined difference. At
least I'm able to see it on Tegra2 Harmony and Pandaboard. Moreover,
requiring the Normal memory attribute breaks the ability to turn the caches
off; in that case we are not able to boot the system at all (seen on
Tegra2 Harmony). This patch is aimed at highlighting the difference in
implementation, which is why it is somewhat soft in its guess about being
faulty. Might it be worth warning about the unpredictable effect instead?
Best wishes
Vladimir