We've had a discussion in the Linaro storage team (Saugata, Venkat and me,
with Luca joining in on the discussion) about swapping to flash based media
such as eMMC. This is a summary of what we found and what we think should
be done. If people agree that this is a good idea, we can start working
on it.
The basic problem is that Linux without swap is sort of crippled: some
things either don't work at all (hibernate) or are not as efficient as they
should be (e.g. tmpfs). At the same time, the swap code seems poorly
matched to the algorithms used in most flash media today, causing
system performance to suffer drastically and wearing out the flash hardware
much faster than necessary. To change that, we would implement the
following changes:
1) Try to swap out multiple pages at once, in a single write request. My
reading of the current code is that we always send pages one by one to
the swap device, while most flash devices have an optimum write size of
32 or 64 KB and some require an alignment of more than a page. Ideally
we would try to write an aligned 64 KB block all the time. Writing aligned
64 KB chunks often gives us ten times the throughput of linear 4 KB writes,
and going beyond 64 KB usually does not give any better performance.
2) Make variable sized swap clusters. Right now, the swap space is
organized in clusters of 256 pages (1MB), which is less than the typical
erase block size of 4 or 8 MB. We should try to make the swap cluster
aligned to erase blocks and have the size match to avoid garbage collection
in the drive. The cluster size would typically be set by mkswap as a new
option and interpreted at swapon time.
3) As Luca points out, some eMMC media would benefit significantly from
having discard requests issued for every page that gets freed from
the swap cache, rather than at the time just before we reuse a swap
cluster. This would probably have to become a configurable option
as well, to avoid the overhead of sending the discard requests on
media that don't benefit from this.
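As a rough illustration of points 1 and 2, the sketch below groups swap page offsets into aligned 64 KB write requests and sizes a swap cluster to one erase block. It is a user-space model in Python, not kernel code: the 4 KB page size and the names `group_aligned_writes`, `full_chunks` and `cluster_pages_for` are assumptions made for this example only.

```python
PAGE_SIZE = 4 * 1024                        # assumed page size
WRITE_SIZE = 64 * 1024                      # write size suggested in point 1
PAGES_PER_WRITE = WRITE_SIZE // PAGE_SIZE   # 16 pages per aligned write


def group_aligned_writes(page_offsets):
    """Group swap page offsets into aligned 64 KB chunks.

    Returns a dict mapping each chunk's first page offset to the sorted
    list of requested page offsets that fall inside that chunk.
    """
    chunks = {}
    for off in sorted(set(page_offsets)):
        start = off - (off % PAGES_PER_WRITE)   # round down to alignment
        chunks.setdefault(start, []).append(off)
    return chunks


def full_chunks(chunks):
    """Chunk starts that can be sent as one complete aligned write."""
    return [start for start, pages in chunks.items()
            if len(pages) == PAGES_PER_WRITE]


def cluster_pages_for(erase_block_bytes):
    """Pages per swap cluster when a cluster spans exactly one erase block."""
    if erase_block_bytes <= 0 or erase_block_bytes % PAGE_SIZE:
        raise ValueError("erase block must be a positive multiple of the page size")
    return erase_block_bytes // PAGE_SIZE
```

Under these assumptions a 4 MB erase block corresponds to a cluster of 1024 pages, compared with the 256 pages used today. Chunks that are not full would still need either padding or a smaller write, which this sketch leaves open.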
Does this all sound appropriate for the Linux memory management people?
Also, does this sound useful to the Android developers? Would you
start using swap if we make it perform well and not destroy the drives?
Finally, does this plan match up with the capabilities of the
various eMMC devices? I know more about SD and USB devices and
I'm quite convinced that it would help there, but eMMC can be
more like an SSD in some ways, and the current code should be fine
for real SSDs.
Arnd
Hi all,
In v3:
- Make traces versioned, as suggested by Steven, Tony and Colin. (The
version tag is stored in the PRZ signature, see the last patch for
the implementation details).
- Add Steven's Ack on the first patch.
In v2:
- Do not introduce a separate 'persistent' tracer, but introduce an
option to the existing 'function' tracer.
Rationale for this patch set:
With this support, the kernel can save the function call chain log into a
persistent RAM buffer that can be decoded and dumped after reboot
through the pstore filesystem. It can be used to determine which function
was last called before a hang or an unexpected reset (caused by, for
example, a buggy driver that abuses HW).
Here's a "nano howto", to get the idea:
# mount -t debugfs debugfs /sys/kernel/debug/
# cd /sys/kernel/debug/tracing
# echo function > current_tracer
# echo 1 > options/func_pstore
# reboot -f
[...]
# mount -t pstore pstore /mnt/
# tail /mnt/ftrace-ramoops
0 ffffffff8101ea64 ffffffff8101bcda native_apic_mem_read <- disconnect_bsp_APIC+0x6a/0xc0
0 ffffffff8101ea44 ffffffff8101bcf6 native_apic_mem_write <- disconnect_bsp_APIC+0x86/0xc0
0 ffffffff81020084 ffffffff8101a4b5 hpet_disable <- native_machine_shutdown+0x75/0x90
0 ffffffff81005f94 ffffffff8101a4bb iommu_shutdown_noop <- native_machine_shutdown+0x7b/0x90
0 ffffffff8101a6a1 ffffffff8101a437 native_machine_emergency_restart <- native_machine_restart+0x37/0x40
0 ffffffff811f9876 ffffffff8101a73a acpi_reboot <- native_machine_emergency_restart+0xaa/0x1e0
0 ffffffff8101a514 ffffffff8101a772 mach_reboot_fixups <- native_machine_emergency_restart+0xe2/0x1e0
0 ffffffff811d9c54 ffffffff8101a7a0 __const_udelay <- native_machine_emergency_restart+0x110/0x1e0
0 ffffffff811d9c34 ffffffff811d9c80 __delay <- __const_udelay+0x30/0x40
0 ffffffff811d9d14 ffffffff811d9c3f delay_tsc <- __delay+0xf/0x20
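To post-process such a dump, the lines can be parsed with a small script. The sketch below is only an illustration. It assumes the columns are CPU number, instruction address, parent address, function name and caller, as the sample output suggests, and that the last line of the file is the most recent call. `TraceEntry`, `parse_line` and `last_call` are names invented for this example.

```python
import re
from collections import namedtuple

# Field meanings are an assumption based on the sample output above.
TraceEntry = namedtuple("TraceEntry", "cpu ip parent_ip func caller")

LINE_RE = re.compile(
    r"^\s*(\d+)\s+([0-9a-fA-F]+)\s+([0-9a-fA-F]+)\s+(\S+)\s+<-\s+(\S+)\s*$"
)


def parse_line(line):
    """Parse one dump line; return None if it does not match the layout."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    cpu, ip, parent_ip, func, caller = m.groups()
    return TraceEntry(int(cpu), int(ip, 16), int(parent_ip, 16), func, caller)


def last_call(lines):
    """Return the last parseable entry, or None if there is none."""
    entries = [e for e in map(parse_line, lines) if e is not None]
    return entries[-1] if entries else None
```

For example, `last_call(open("/mnt/ftrace-ramoops"))` would return the final entry of the dump mounted in the howto above, with the addresses available as integers.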
Most of the code comes from the trace_persistent.c driver found in the
Android git tree, written by Colin Cross <ccross(a)android.com>
(according to sign-off history). I reworked the driver a little bit
and ported it to the pstore subsystem.
--
Documentation/ramoops.txt | 25 +++++++++
fs/pstore/Kconfig | 13 +++++
fs/pstore/Makefile | 1 +
fs/pstore/ftrace.c | 35 +++++++++++++
fs/pstore/inode.c | 111 ++++++++++++++++++++++++++++++++++++++--
fs/pstore/internal.h | 43 ++++++++++++++++
fs/pstore/platform.c | 12 ++++-
fs/pstore/ram.c | 65 +++++++++++++++++------
fs/pstore/ram_core.c | 12 +++--
include/linux/pstore.h | 13 +++++
include/linux/pstore_ram.h | 3 +-
kernel/trace/trace.c | 7 +--
kernel/trace/trace_functions.c | 25 +++++++--
13 files changed, 330 insertions(+), 35 deletions(-)
--
Anton Vorontsov
Email: cbouatmailru(a)gmail.com
=== Highlights ===
* Watched a number of Google IO talks. Most interested in the
systrace tool. From initial research, it seems Jelly Bean is actually
going to be based on an updated 3.0 kernel (same as ICS), meaning the
3.4 based kernels (including the merged wakelocks and other upstreamed
features) won't likely be used until the Oct phone release. This may
cause some pain for Linaro as changes in the 3.4 based kernel require
some new userland improvements, which likely won't be available till
after Oct.
* Sent out updated volatile range v5 iteration. Still not much feedback.
After some private discussion w/ Dave Hansen I'm looking into
implementing something closer to Minchan's suggestion of a separate LRU
for volatile pages. Got it sort of half working right now, and hope to
sort things out soon.
* LinusW reviewed the ETM patches. Had a number of suggestions for
changes, including increased documentation. I'll be looking at trying to
address those concerns, and resubmitting soon.
* Finished newsletter update
* Sent out weekly android-upstreaming email.
=== Plans ===
* Debug current LRU_VOLATILE work and hopefully get an early rough draft
sent to lkml
* Spend some time learning more about ETM functionality and try to make
use of it so I can have some context with which to write documentation
* Ping Google devs (likely at Google IO this week) about the mmc
wakelock changes
=== Issues ===
GAH! HOW DID IT GET TO BE JULY ALREADY!?
Hi,
I am trying to boot the tilt-tracking branch of the TI OMAP kernel git
repository. The silicon is OMAP4430 ES2.1 on the TI Blaze platform. I see a
kernel crash while booting. The logs also show that the HDMI GPIO
requests are not successful.
What I am looking for is a fairly recent kernel on which I can base the
rest of the userspace (media) application. Any help on which branch
to look at will be greatly appreciated.
Thanks
Ramakrishnan
[ 4.836944] Console: switching to colour frame buffer device 240x67
[ 4.892089] sdp4430_panel_enable_hdmi: Cannot request HDMI GPIOs
[ 4.898437] omapdss HDMI error: failed to enable GPIO's
[ 4.903991] omapdss error: failed to power on
[ 4.908599] omapfb omapfb: Failed to enable display 'hdmi'
[ 4.914398] omapfb omapfb: failed to initialize default display
[ 4.921325] omapfb omapfb: failed to setup omapfb
[ 4.926330] omapfb: probe of omapfb failed with error -16
[ 4.932586] VANA: incomplete constraints, leaving on
[ 4.938934] VDAC: incomplete constraints, leaving on
[ 4.945983] twl_rtc twl_rtc: setting system clock to 2000-01-01 00:00:03 UTC (946684803)
[ 4.954620] ALSA device list:
[ 4.957733] #0: SDP4430
[ 4.960540] #1: OMAPHDMI
[ 4.997589] Division by zero in kernel.
[ 4.997619] Backtrace:
[ 4.997650] [<c0012d4c>] (dump_backtrace+0x0/0x118) from [<c04ec94c>] (dump_stack+0x20/0x24)
[ 4.997650] r6:00000000 r5:f0800000 r4:ee21dd80 r3:00000000
[ 4.997680] [<c04ec92c>] (dump_stack+0x0/0x24) from [<c0012ea8>] (__div0+0x20/0x28)
[ 4.997711] [<c0012e88>] (__div0+0x0/0x28) from [<c0259b14>] (Ldiv0+0x8/0x10)
[ 4.997772] [<c0287348>] (cfb_imageblit+0x0/0x46c) from [<c0280828>] (soft_cursor+0x1b8/0x1c4)
[ 4.997772] [<c0280670>] (soft_cursor+0x0/0x1c4) from [<c02801b0>] (bit_cursor+0x43c/0x44c)
[ 4.997802] [<c027fd74>] (bit_cursor+0x0/0x44c) from [<c027a920>] (fb_flashcursor+0x108/0x124)
[ 4.997802] [<c027a818>] (fb_flashcursor+0x0/0x124) from [<c00562e4>] (process_one_work+0x274/0x420)
[ 4.997833] [<c0056070>] (process_one_work+0x0/0x420) from [<c005681c>] (worker_thread+0x1bc/0x2bc)
[ 4.997863] [<c0056660>] (worker_thread+0x0/0x2bc) from [<c005c93c>] (kthread+0x98/0xa4)
[ 4.997863] [<c005c8a4>] (kthread+0x0/0xa4) from [<c0041300>] (do_exit+0x0/0x790)
[ 4.997894] r7:00000013 r6:c0041300 r5:c005c8a4 r4:ee093ed8
[ 5.192901] Division by zero in kernel.
[ 5.192901] Backtrace:
Hi.
In tilt-3.3, the v4l2 display drivers depended on:
ARCH_OMAP2 || ARCH_OMAP3 || ARCH_OMAP4 || ARCH_OMAP5
In 3.4, this has changed to ARCH_OMAP2 || ARCH_OMAP3.
Is there a reason omap4 and omap5 have been removed? Has something
broken, or is it just something missed when merging from upstream?
I can't find any commit about that change; the only commit I can find changes
ARCH_OMAP24XX || ARCH_OMAP34XX to what it is now, which I guess happened
during the merge or something.
Best regards
Martin
== Highlights ==
* Got KDB running on the Versatile-PB board (in QEMU), plus got FIQs
somewhat working. Feature-wise, the only thing left to do is to
get KDB to work in the FIQ context, and then we can just clean up
the code here and there, and submit it;
* Early probing for pstore merged into -next;
* Half of pstore ECC improvements merged into -next, another half
is still pending;
* I implemented versioned tracing as a separate persistent ram zone
for the control data, but I didn't like it, it was ugly. I think
I can do it better: something like persistent_ram_reduce(), a new
call that would reduce the current buffer, leaving space for the
control data;
* vmevents discussion continues on the lkml. It seems that Minchan's
idea is not as easy as it seemed to be; there are many corner cases
and limitations. And I still don't actually follow how exactly
having a separate LRU would help with obtaining statistics.
== Plans ==
* Continue work on FIQ debugger.
* Resend pstore patches that didn't make it into -next. There are
just a few left.
--
Anton Vorontsov
Email: cbouatmailru(a)gmail.com
=== Highlights ===
* Continued to review Shawn's big patch series adding irqdomain and sparse
irq support for IMX platforms.
Shawn and I had different opinions on the implementation. I pointed
out some drawbacks of his implementation and sent out a patch to
describe my idea, but Shawn did not agree with it, although I'm still
not very convinced by his reasons. Currently we still use the original way.
* Sent two minor irq core fix and cleanup patches.
* Sent patch "of: reform prom_update_property function", which simplifies
its use and removes a lot of duplicated code; already acked by the
maintainer.
* Reviewed Shawn's "ARM: mxs: store mac address read from OTP in device
tree" patch.
Gave some comments and suggested that he rebase his patch on my
"of: reform prom_update_property" patch.
* Reviewed "ASoC: dmaengine-pcm: Add support for querying stream position
from DMA driver".
The current imx/mxs dma driver still does not support tx_status querying
for cyclic dma transfers; I will add the support later when I have time.
* Reviewed and sent a pinctrl-imx fix patch.
Asked Linus to merge it for the 3.5 kernel.
=== Highlights ===
* Reworked the mmc wakelocks patch to use wakeup sources and sent to
google developers for review prior to sending to lkml
* After getting negative feedback on my interval tree implementation,
I'm reworking my volatile ranges patch set to not introduce a new base
type and just use rbtrees internally.
* Submitted a number of changes in timekeeping for 3.6 to Thomas for -tip
* Wrote first draft of the android upstreaming update for member newsletter
* Resent out the ETM patches for inclusion
* Ran bi-weekly android-upstreaming subteam call (over irc since mumble
was having problems)
* Merged fix from Tushar into linaro-android-3.5-jstultz-rebase tree
=== Plans ===
* Continue reworking volatile range code & resubmit for comment
* Submit mmc wakeup source patch to lkml
* Ping Russell on ETM
* Finish member newsletter update
=== Issues ===
NA
====Activity Summary====
- Completed DT binding for ab8500-btemp
. Pending: Code Review/Rework and Documentation
. Added a macro to traverse phandle list in a property
. Unit/Driver level test succeeds in fetching and populating DT information
- Started DT bindings for the ab8500 Fuel Gauge driver, as the ab8500-btemp
driver expects "battery resistance" platform information from ab8500-fg,
without which the ab8500 probe cannot complete in the present sequence
- Support for Raj on mmap freezing issue:
- Observed that iterative vmalloc and vfree causes the kernel to freeze in
the fully loaded/Android case; however, the issue is not observed in the
LBP (non-Android/Ubuntu) build. Raj has created a ticket with the igloo
kernel community
- Looking into the L2 cache settings highlighted by Mathieu,
trying to find information on the use of offset 0x7CC
and the necessity of using cache lockdown (0x900/4)
Ref: cleanup_before_linux()
====Plan====
- Complete DT binding for ab8500-fg driver
- test ab8500-btemp
- Documentation
- Review and rework
====issues====
NA
Thanks,
Rajanikanth