We've had a discussion in the Linaro storage team (Saugata, Venkat and me,
with Luca joining in on the discussion) about swapping to flash based media
such as eMMC. This is a summary of what we found and what we think should
be done. If people agree that this is a good idea, we can start working
on it.
The basic problem is that Linux without swap is sort of crippled, and some
things either don't work at all (hibernate) or are not as efficient as they
should be (e.g. tmpfs). At the same time, the swap code seems to be rather
inappropriate for the algorithms used in most flash media today, causing
system performance to suffer drastically, and wearing out the flash hardware
much faster than necessary. In order to change that, we would be
implementing the following changes:
1) Try to swap out multiple pages at once, in a single write request. My
reading of the current code is that we always send pages one by one to
the swap device, while most flash devices have an optimum write size of
32 or 64 KB and some require an alignment of more than a page. Ideally
we would try to write an aligned 64 KB block all the time. Writing aligned
64 KB chunks often gives us ten times the throughput of linear 4 KB writes,
and going beyond 64 KB usually does not give any better performance.
2) Make variable sized swap clusters. Right now, the swap space is
organized in clusters of 256 pages (1MB), which is less than the typical
erase block size of 4 or 8 MB. We should try to make the swap cluster
aligned to erase blocks and make its size match the erase block size, to avoid
garbage collection in the drive. The cluster size would typically be set by mkswap as a new
option and interpreted at swapon time.
3) As Luca points out, some eMMC media would benefit significantly from
having discard requests issued for every page that gets freed from
the swap cache, rather than at the time just before we reuse a swap
cluster. This would probably have to become a configurable option
as well, to avoid the overhead of sending the discard requests on
media that don't benefit from this.
Does this all sound appropriate for the Linux memory management people?
Also, does this sound useful to the Android developers? Would you
start using swap if we make it perform well and not destroy the drives?
Finally, does this plan match up with the capabilities of the
various eMMC devices? I know more about SD and USB devices and
I'm quite convinced that it would help there, but eMMC can be
more like an SSD in some ways, and the current code should be fine
for real SSDs.
Arnd
Hi all,
And another respin, v5 this time:
- Split out fixes into a separate series;
- Added proper spinlock protection to the pstore/console interface
(the bug I found while adding the ftrace interface);
- And since I'm about to add ftrace support to pstore, to avoid touching
the same lines of code twice, I reworked 'Factor ramoops_get_dump_prz()
out of ramoops_pstore_read()' patch into 'Factor ramoops_get_next_prz()
out of ramoops_pstore_read()'. This is just a more generic interface
that will work for both console and ftrace przs.
Since the patch changed drastically, it lost Kees' ack, so it needs a
re-ack.
- The same happened with the 'Introduce ramoops_context.max_dump_count'
patch: it turned into 'Give proper names to dump-related variables' and
also needs a re-ack.
- If anyone is willing to try the patches, for convenience they are now
available in the git repository:
git://git.infradead.org/users/cbou/linux-pstore.git
or gitweb:
http://git.infradead.org/users/cbou/linux-pstore.git
In v4:
- Per Kees Cook's comments, the patches no longer remove the automatic
updates feature, but instead make it configurable, plus disable
it by default (in a separate patch);
- Fixed some bugs noticed by Colin Cross;
- Documented new continuous ramoops-console log behaviour (also
noticed by Colin Cross).
In v3:
- Rebased on top of current staging-next;
- The series is getting bigger. This is partly because we now support
different persistent zone sizes for oops records and console log,
per Colin Cross' request.
And I believe the code is now more manageable for further enhancements
(e.g. if we want to add other message types, such as tracing);
- Addressed Kees Cook's comments on the unlinking matters;
- Removed automatic updates support. Please see the last patch
description for rationale;
- A new fixup for pstore/inode, just getting rid of a sparse warning.
In v2:
- Updated documentation per Colin Cross' comments;
- Corrected return value in ramoops_pstore_write() (noticed by Kees Cook);
- Fixed large writes handling in pstore_console_write(), i.e. when
log_buf write is larger than pstore bufsize. Also noticed by Kees Cook.
And a boilerplate for the series:
Currently pstore doesn't support logging kernel messages at run time,
it only dumps dmesg when the kernel oopses/panics. This makes pstore
useless for debugging hangs caused by HW issues or improper use of HW
(e.g. weird device inserted -> driver tried to write reserved bits ->
SoC hanged). In that case we don't get any messages in the pstore.
This series adds a new message type for pstore, i.e. PSTORE_TYPE_CONSOLE,
plus makes pstore/ram.c handle the new messages.
The old ram_console driver is removed. This will probably cause
some pain for out-of-tree code, as it would need to be adjusted...
but "no pain, no gain"? :-) Though, if there's some serious resistance,
we can probably postpone the last two patches.
Thanks!
---
Documentation/ramoops.txt | 14 ++
drivers/staging/android/Kconfig | 5 -
drivers/staging/android/Makefile | 1 -
drivers/staging/android/ram_console.c | 179 ------------------------
fs/pstore/Kconfig | 7 +
fs/pstore/inode.c | 3 +
fs/pstore/platform.c | 54 +++++++-
fs/pstore/ram.c | 246 ++++++++++++++++++++++++---------
fs/pstore/ram_core.c | 81 +----------
include/linux/pstore.h | 1 +
include/linux/pstore_ram.h | 20 +--
11 files changed, 261 insertions(+), 350 deletions(-)
--
Anton Vorontsov
Email: cbouatmailru(a)gmail.com
Hi Greg,
In light of Linus' response (and I've already said this to Colin), I'll
just zap a prz at boot time for the pstore/console interface, which means
there should no longer be any objections to this bunch of fixes.
These are valid fixes for v3.5: they restore nuances of the old pstore
behavior that I changed accidentally.
Except for the last patch, which is just a fix I happened to make when
I got bored of the warning. :-) Not a regression fix, though.
Thanks,
---
fs/pstore/inode.c | 2 +-
fs/pstore/ram.c | 3 +++
fs/pstore/ram_core.c | 27 +++++++++++++++++----------
include/linux/pstore_ram.h | 2 ++
4 files changed, 23 insertions(+), 11 deletions(-)
--
Anton Vorontsov
Email: cbouatmailru(a)gmail.com
Hi all,
Accounting only free pages is very inaccurate for low memory handling,
so we have to be smarter here.
The patch set implements a new attribute, which is blended from various
memory statistics. Vmevent can't expose all the kernel internals to
userland, as that would tie the internal Linux MM representation to the
ABI. So the ABI itself was made very simple: just the number of pages before
we consider that we're low on memory, and the kernel takes care of the
rest.
Thanks,
--
Anton Vorontsov
Email: cbouatmailru(a)gmail.com
Hi all,
With this support, the kernel can save a function call chain log into a
persistent RAM buffer that can be decoded and dumped after reboot
through the pstore filesystem. It can be used to determine what function
was last called before a hang or an unexpected reset (caused by, for
example, a buggy driver that abuses HW).
Here's a "nano howto", to get the idea:
# mount -t debugfs debugfs /sys/kernel/debug/
# cd /sys/kernel/debug/tracing
# echo persistent > current_tracer
# reboot -f
[...]
# mount -t pstore pstore /mnt/
# tail /mnt/ftrace-ramoops
0 ffffffff8101ea64 ffffffff8101bcda native_apic_mem_read <- disconnect_bsp_APIC+0x6a/0xc0
0 ffffffff8101ea44 ffffffff8101bcf6 native_apic_mem_write <- disconnect_bsp_APIC+0x86/0xc0
0 ffffffff81020084 ffffffff8101a4b5 hpet_disable <- native_machine_shutdown+0x75/0x90
0 ffffffff81005f94 ffffffff8101a4bb iommu_shutdown_noop <- native_machine_shutdown+0x7b/0x90
0 ffffffff8101a6a1 ffffffff8101a437 native_machine_emergency_restart <- native_machine_restart+0x37/0x40
0 ffffffff811f9876 ffffffff8101a73a acpi_reboot <- native_machine_emergency_restart+0xaa/0x1e0
0 ffffffff8101a514 ffffffff8101a772 mach_reboot_fixups <- native_machine_emergency_restart+0xe2/0x1e0
0 ffffffff811d9c54 ffffffff8101a7a0 __const_udelay <- native_machine_emergency_restart+0x110/0x1e0
0 ffffffff811d9c34 ffffffff811d9c80 __delay <- __const_udelay+0x30/0x40
0 ffffffff811d9d14 ffffffff811d9c3f delay_tsc <- __delay+0xf/0x20
Most of the code comes from the trace_persistent.c driver found in the
Android git tree, written by Colin Cross <ccross(a)android.com>
(according to sign-off history). I reworked the driver a little bit
and ported it to the pstore subsystem.
The patches depend on a pile of other pstore-related work, so
anyone willing to try the patches would be better off grabbing
the whole git tree:
git://git.infradead.org/users/cbou/linux-pstore.git
or gitweb:
http://git.infradead.org/users/cbou/linux-pstore.git
Note that so far I've tried not to change the original idea, but if
we consider inclusion, there are some open questions:
1. Should we merge the persistent tracer with the normal function tracer,
i.e. add some flag that makes the function tracer duplicate the
events into pstore (via a callback to pstore)?
2. If we keep the two separate, should the code live in fs/pstore
or kernel/trace? I can see valid points for both approaches.
Thanks,
---
Documentation/ramoops.txt | 24 +++++++++
fs/pstore/Kconfig | 12 +++++
fs/pstore/Makefile | 6 +++
fs/pstore/ftrace.c | 122 ++++++++++++++++++++++++++++++++++++++++++++
fs/pstore/inode.c | 111 ++++++++++++++++++++++++++++++++++++++--
fs/pstore/internal.h | 49 ++++++++++++++++++
fs/pstore/platform.c | 13 ++++-
fs/pstore/ram.c | 54 +++++++++++++++-----
include/linux/pstore.h | 5 ++
include/linux/pstore_ram.h | 1 +
kernel/trace/trace.c | 7 +--
11 files changed, 384 insertions(+), 20 deletions(-)
--
Anton Vorontsov
Email: cbouatmailru(a)gmail.com
Hi all,
There are currently two competing debug facilities for storing kernel
messages in persistent storage: the generic pstore and Google's
persistent_ram. Not so long ago (https://lkml.org/lkml/2012/3/8/252),
it was decided that we should fix this situation.
Recently ramoops has switched to pstore, which basically means that
it became a RAM backend for the pstore framework.
persistent_ram+ram_console and ramoops+pstore have almost the same
features, except:
1. Ramoops doesn't support ECC. Having ECC is useful when a hardware
reset was used to bring the machine back to life (i.e. a watchdog
triggered). In such cases, RAM may be somewhat corrupt, but
usually it is restorable.
2. Pstore doesn't support logging kernel messages at run time, it only
dumps dmesg when the kernel oopses/panics. This makes pstore useless for
debugging hangs caused by HW issues or improper use of HW (e.g.
weird device inserted -> driver tried to write reserved bits ->
SoC hanged). In that case we don't get any messages in the pstore.
These patches solve the first issue, plus move things to their
proper places. Patches that will fix the second issue are pending.
Thanks,
---
drivers/char/Kconfig | 9 -
drivers/char/Makefile | 1 -
drivers/char/ramoops.c | 362 --------------------
drivers/staging/android/Kconfig | 10 +-
drivers/staging/android/persistent_ram.c | 473 --------------------------
drivers/staging/android/persistent_ram.h | 78 -----
drivers/staging/android/ram_console.c | 2 +-
fs/pstore/Kconfig | 12 +
fs/pstore/Makefile | 1 +
fs/pstore/ram.c | 384 ++++++++++++++++++++++
fs/pstore/ram_core.c | 530 ++++++++++++++++++++++++++++++
include/linux/pstore_ram.h | 99 ++++++
include/linux/ramoops.h | 17 -
13 files changed, 1028 insertions(+), 950 deletions(-)
--
Anton Vorontsov
Email: cbouatmailru(a)gmail.com
Hello,
It's been close to a month now that the kernel CI hwpacks built on
ci.linaro.org with the default omap2plus defconfig
(for the linux-arm-soc, linux-next, linux-linaro and linux trees) have been
failing to boot on the boards.
Here is an example hwpack built using the linux-linaro tree with omap2plus
defconfig on ci.linaro.org which is failing to boot, in case you want to
try to test it.
http://snapshots.linaro.org/kernel-hwpack/linux-linaro-tracking/linux-linaro-tracking_panda-omap2plus/hwpack_linaro-lt-panda_20120509-0641_b64_armel_supported.tar.gz
I was able to fix the above boot problem of the CI kernel hwpacks built
with the omap2plus defconfig by adding options similar to the ones in the
linux-linaro tree's default omap4 defconfig on top of the default
omap2plus defconfig options.
Here is the list of config options I added on top of the omap2plus
defconfig to make it work: http://paste.ubuntu.com/977642/
Can someone help me with the basic omap2plus defconfig options that are
required to boot the hwpack on the board?
Does anyone care about kernel builds for the linux-arm-soc, linux-next,
linux-linaro and linux trees using the omap2plus defconfig?
Apart from this, the linux-next builds have been failing for the last
20 days. https://ci.linaro.org/jenkins/view/Linux%20%28next%29/ lists the
failing build jobs.
For example, the following job fails:
https://ci.linaro.org/jenkins/view/Linux%20%28next%29/job/linux-next_panda-…
Can someone look at these linux-next build failures?
--
Thanks and Regards,
Deepti
Infrastructure Team Member, Linaro Platform Teams
Linaro.org | Open source software for ARM SoCs
Follow Linaro: http://www.facebook.com/pages/Linaro - http://twitter.com/#!/linaroorg - http://www.linaro.org/linaro-blog
Hi all,
This is another resend of several task->mm fixes, for bugs I found
during an LMK code audit. Architectures were traversing the tasklist
in an unsafe manner, plus there are a few cases of unsafe access to
task->mm in general.
There were no objections to the previous resend, and the final words
were somewhere along the lines of "the patches are fine".
In v3:
- Dropped a controversial 'Make find_lock_task_mm() sparse-aware' patch;
- Reworded arm and sh commit messages, per Oleg Nesterov's suggestions;
- Added an optimization trick in clear_tasks_mm_cpumask(): take only
the rcu read lock, no need for the whole tasklist_lock.
Suggested by Peter Zijlstra.
In v2:
- Introduced a small helper in cpu.c: most arches duplicate the
same [buggy] code snippet, so it's better to fix it and move the
logic into a common function.
Thanks,
--
Anton Vorontsov
Email: cbouatmailru(a)gmail.com
Hi Experts,
I am a newbie in Linaro development.
The kernel I built myself doesn't boot. Could you please help with it?
Thanks a lot in advance.
# Problem:
There is no response after the following log output:
Starting kernel ...
Uncompressing Linux... done, booting the kernel.
*Details follow:*
# My setting:
Development board - i.Mx53 QSB
Host PC - Ubuntu 10.04 x64
GCC used to build the Linaro kernel - version 4.6.2
I created a micro-SD card with Linaro Nano 12.02. I used
mx53loco-nano.img.gz<http://releases.linaro.org/12.02/ubuntu/oneiric-images/nano/mx53loco-nano.i…>,
to create the card; it works well and I can run my program on this
Linaro Nano system.
# What I want:
I want to build a kernel to replace the one installed with Linaro Nano
12.02.
# My steps:
1) Download the kernel source code:
I downloaded linux-linaro-lt-freescale
<http://launchpad.net/linaro-landing-team-freescale> (supplied by the
landing team) from the Linaro 12.02 download page
(http://www.linaro.org/downloads/1202).
2) Build the kernel:
I copied the configuration file from the micro-SD card,
/media/rootfs/config-3.1.0-1002-linaro-lt-mx5, and renamed it to .config.
$ make ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- uImage
$ make ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- modules
3) Install kernel to micro-SD card:
$ cp arch/arm/boot/uImage /media/boot
$ make ARCH=arm INSTALL_MOD_PATH=/media/rootfs modules_install
$ sudo umount /media/*
4) Boot with my own kernel:
Inserted the micro-SD card and pressed the power button.
# Result:
There is no response after the following log output:
Starting kernel ...
Uncompressing Linux... done, booting the kernel.
I have tried several other kernel versions and even updated U-Boot, but
that didn't fix it.
Could somebody give me some suggestions?
Best Regards,
Tim