(If you reply, reply to this one, not the previous message I sent; this
one fixes the linaro-dev email address.)
On Mon, Jul 11, 2011 at 4:35 AM, Zygmunt Krynicki
<zygmunt.krynicki(a)linaro.org> wrote:
> In short: ~zygaN is the thing we can increment. We should KEEP it and
> perhaps change the name to ~lava (but this has to be coordinated, as
> ~lava < ~zyga). There are three possible scenarios which this system
> correctly handles:
>
Right, but we can make that change anytime the upstream version gets
bumped. So if it has an upstream version component that gets bumped,
feel free to change over to the ~lava designation as soon as you make a
release that bumps the upstream version. Otherwise, if it's a component
that uses YYYY.MM only, we will wait until the 2011.07 release later
this month to switch.
-Paul Larson
Hi Paul & Zygmunt (& others),
I spent a while today fixing a couple of bugs in lava-tool and in the
packaging of lava-server, and was wondering what the process should be
for getting them into the ~linaro-validation PPA and onto v.l.o
(although there's no particular urgency in getting these precise fixes
deployed, there will be changes that are more urgent).
In some sense these are basic Debian packaging questions, I guess. But
my understanding of the process is that it should go like this:
If there are upstream changes, make a new release (update the version in
__init__.py, tag the branch, make an sdist and upload it to PyPI).
Then (whether there is an upstream change or not) it should be uploaded
to a PPA. The part here that I don't really get is how to use bzr
build-deb in practice, but I've just found
http://jameswestby.net/bzr/builddeb/user_manual/merge.html so I think I
should read that first :)
Another question I have is around version numbers. Currently we're
using version numbers like 0.2-0ubuntu0~zyga1. I don't really see why
the "zyga" is in there :) The ~ is the part that matters: dpkg sorts ~
before anything else, even the end of the string, so any ~suffix version
precedes the final version it is derived from. I think simply dropping
the zyga and using versions like 0.2-0ubuntu0~1 would be fine, or if we
want to know who uploaded a particular version we can use suffixes like
0.2-0ubuntu0~2mwhudson. (A sketch of the comparison rule follows below.)
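To make the ordering concrete, here is a minimal sketch of how dpkg
compares versions (modelled on dpkg's verrevcmp algorithm; epoch and
revision splitting are omitted):

    #include <ctype.h>
    #include <stdio.h>

    /* '~' sorts before everything, even the end of the string;
     * letters sort before other non-digit characters. */
    static int order(int c)
    {
        if (isdigit(c)) return 0;
        if (isalpha(c)) return c;
        if (c == '~')   return -1;
        return c ? c + 256 : 0;
    }

    /* Returns <0, 0 or >0, like strcmp. */
    static int verrevcmp(const char *a, const char *b)
    {
        while (*a || *b) {
            int first_diff = 0;

            /* Compare the non-digit prefixes character by character. */
            while ((*a && !isdigit((unsigned char)*a)) ||
                   (*b && !isdigit((unsigned char)*b))) {
                int ac = order((unsigned char)*a);
                int bc = order((unsigned char)*b);
                if (ac != bc)
                    return ac - bc;
                a++; b++;
            }
            /* Then compare the digit runs numerically. */
            while (*a == '0') a++;
            while (*b == '0') b++;
            while (isdigit((unsigned char)*a) && isdigit((unsigned char)*b)) {
                if (!first_diff)
                    first_diff = *a - *b;
                a++; b++;
            }
            if (isdigit((unsigned char)*a)) return 1;
            if (isdigit((unsigned char)*b)) return -1;
            if (first_diff) return first_diff;
        }
        return 0;
    }

    int main(void)
    {
        /* Both comparisons print a negative number. */
        printf("%d\n", verrevcmp("0.2-0ubuntu0~lava1", "0.2-0ubuntu0~zyga1"));
        printf("%d\n", verrevcmp("0.2-0ubuntu0~1", "0.2-0ubuntu0"));
        return 0;
    }

Both comparisons come out negative, which is exactly why switching from
~zyga to ~lava needs coordination: a ~lava upload would sort as older
than the ~zyga version already installed.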
Finally, I think that we should be trigger-happy about releases, so
going through the above process shouldn't take very long. I guess
lava-dev-tool can help here.
Cheers,
mwh
Documentation about the background and the design of mmc non-blocking
requests. Host driver guidelines to minimize request preparation overhead.
Signed-off-by: Per Forlin <per.forlin(a)linaro.org>
Acked-by: Randy Dunlap <rdunlap(a)xenotime.net>
---
ChangeLog:
v2: - Minor updates after proofreading comments from Chris
v3: - Minor updates after more comments from Chris
v4: - Minor updates after comments from Randy
v5: - Fixed one more comment and Acked-by from Randy
Documentation/mmc/00-INDEX | 2 +
Documentation/mmc/mmc-async-req.txt | 86 +++++++++++++++++++++++++++++++++++
2 files changed, 88 insertions(+), 0 deletions(-)
create mode 100644 Documentation/mmc/mmc-async-req.txt
diff --git a/Documentation/mmc/00-INDEX b/Documentation/mmc/00-INDEX
index 93dd7a7..a9ba672 100644
--- a/Documentation/mmc/00-INDEX
+++ b/Documentation/mmc/00-INDEX
@@ -4,3 +4,5 @@ mmc-dev-attrs.txt
- info on SD and MMC device attributes
mmc-dev-parts.txt
- info on SD and MMC device partitions
+mmc-async-req.txt
+ - info on mmc asynchronous requests
diff --git a/Documentation/mmc/mmc-async-req.txt b/Documentation/mmc/mmc-async-req.txt
new file mode 100644
index 0000000..b7a52ea
--- /dev/null
+++ b/Documentation/mmc/mmc-async-req.txt
@@ -0,0 +1,86 @@
+Rationale
+=========
+
+How significant is the cache maintenance overhead?
+It depends. Fast eMMC and multiple cache levels with speculative cache
+pre-fetch make the cache overhead relatively significant. If the DMA
+preparations for the next request are done in parallel with the current
+transfer, the DMA preparation overhead would not affect the MMC performance.
+The intention of non-blocking (asynchronous) MMC requests is to minimize the
+time between when an MMC request ends and another MMC request begins.
+With mmc_wait_for_req(), the MMC controller is idle while dma_map_sg and
+dma_unmap_sg run. Using non-blocking MMC requests makes it
+possible to prepare the caches for the next job in parallel with an active
+MMC request.
+
+MMC block driver
+================
+
+The issue_rw_rq() in the MMC block driver is made non-blocking.
+The increase in throughput is proportional to the time it takes to
+prepare a request (the major part of the preparation is dma_map_sg and
+dma_unmap_sg) and to how fast the memory is. The faster the MMC/SD is,
+the more significant the prepare request time becomes. Roughly, the expected
+performance gain is 5% for large writes and 10% for large reads on an
+L2-cache platform. In power save mode, when clocks run at a lower frequency,
+the DMA preparation may cost even more. As long as these slower preparations
+run in parallel with the transfer, performance won't be affected.
+
+Details on measurements from IOZone and mmc_test
+================================================
+
+https://wiki.linaro.org/WorkingGroups/Kernel/Specs/StoragePerfMMC-async-req
+
+MMC core API extension
+======================
+
+There is one new public function mmc_start_req().
+It starts a new MMC command request for a host. The function isn't
+truly non-blocking. If there is an ongoing async request, it waits
+for that request to complete, starts the new one, and returns; it
+doesn't wait for the new request to complete. If there is no ongoing
+request, it starts the new request and returns immediately.
+
+MMC host extensions
+===================
+
+There are two optional hooks -- pre_req() and post_req() -- that the host
+driver may implement in order to move work to before and after the actual
+mmc_request function is called. In the DMA case, pre_req() may do
+dma_map_sg() and prepare the DMA descriptor, and post_req() runs
+dma_unmap_sg().
+
+Optimize for the first request
+==============================
+
+The first request in a series of requests can't be prepared in parallel with
+the previous transfer, since there is no previous request.
+The argument is_first_req in pre_req() indicates that there is no previous
+request. The host driver may optimize for this scenario to minimize
+the performance loss. One way to optimize for this is to split the current
+request into two chunks, prepare the first chunk and start the request,
+and finally prepare the second chunk and start the transfer.
+
+Pseudocode to handle the is_first_req scenario with minimal prepare overhead:
+if (is_first_req && req->size > threshold) {
+    /* start MMC transfer for the complete transfer size */
+    mmc_start_command(MMC_CMD_TRANSFER_FULL_SIZE);
+
+    /*
+     * Begin to prepare DMA while cmd is being processed by MMC.
+     * The first chunk of the request should take the same time
+     * to prepare as the "MMC process command time".
+     * If the prepare time exceeds the MMC cmd time,
+     * the transfer is delayed; guesstimate max 4k as first chunk size.
+     */
+    prepare_1st_chunk_for_dma(req);
+    /* flush pending desc to the DMAC (dmaengine.h) */
+    dma_issue_pending(req->dma_desc);
+
+    prepare_2nd_chunk_for_dma(req);
+    /*
+     * The second issue_pending should be called before MMC runs out
+     * of the first chunk. If the MMC runs out of the first data chunk
+     * before this call, the transfer is delayed.
+     */
+    dma_issue_pending(req->dma_desc);
+}
--
1.7.4.1
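To make the pipelining concrete, here is a minimal sketch of a
double-buffered issue loop built on mmc_start_req(). The queue helpers
are hypothetical, and the exact signature and the NULL "drain" call are
assumptions based on the description above, not code from the patch:

    struct mmc_async_req *cur, *done;
    struct request *req;
    int status;

    while ((req = fetch_next_block_request(queue))) {  /* hypothetical */
        cur = prepare_mmc_request(req);                /* hypothetical */

        /*
         * Waits for the previous async request (if any), starts 'cur',
         * and returns the completed one without waiting for 'cur'
         * itself: the controller transfers 'cur' while we post-process
         * 'done' and map the next request.
         */
        done = mmc_start_req(host, cur, &status);
        if (done)
            complete_block_request(done, status);      /* hypothetical */
    }

    /* Drain: passing NULL waits for the last request without starting
     * a new one (assumed behaviour, per the description above). */
    done = mmc_start_req(host, NULL, &status);
    if (done)
        complete_block_request(done, status);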
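On the host driver side, wiring up the optional hooks might look like
this; the foo_* names are illustrative, and the hook signatures are
assumptions inferred from the description (an is_first_req flag in
pre_req(), an error code in post_req()):

    static void foo_pre_req(struct mmc_host *host, struct mmc_request *mrq,
                            bool is_first_req)
    {
        /* dma_map_sg() and build the DMA descriptor ahead of the transfer. */
        foo_prepare_dma(host, mrq, is_first_req);      /* hypothetical */
    }

    static void foo_post_req(struct mmc_host *host, struct mmc_request *mrq,
                             int err)
    {
        /* dma_unmap_sg() once the transfer has completed (or failed). */
        foo_teardown_dma(host, mrq, err);              /* hypothetical */
    }

    static const struct mmc_host_ops foo_ops = {
        .request  = foo_request,
        .pre_req  = foo_pre_req,
        .post_req = foo_post_req,
    };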
Hi All,
I've taken some time to freshen up the libjpeg-turbo hosted at
lp:libjpeg-turbo, refreshing it with the upstream 1.1.1 plus Mandeep's
previous work. I thought as part of this effort I'd do some
performance comparisons between this new cut of the code and the much
older, non-ARM-optimized libjpeg62.
For this test, I selected 3 PPMs and 3 JPEGs of the same images. The
images are all color and are of the sizes 1920x1024, 406x357 and
7227x3847. I placed the images in a ramdisk and ran the cjpeg and
djpeg applications under time. Both cjpeg and djpeg are statically
linked. I believe both the libjpeg62 and libjpeg-turbo code were built
with the current 4.5.2 version of gcc, using the 0706 panda hwpack and
LEB.
Note: while these numbers are interesting up to a point, I believe time
is the wrong tool for this measurement. Instead, the calls into the
jpeg code should be bracketed with timestamps before and after the
operation to get a better idea of the time actually used.
Numbers:

cjpeg (ppm to jpeg conversion, default parms; positive improvement is better):

image size          libjpeg62      libjpeg-turbo   improvement
1920x1024     real  0m0.522s       0m0.452s        +0.070s
              user  0m0.438s       0m0.352s        +0.086s
              sys   0m0.055s       0m0.070s        -0.015s
406x357       real  0m0.086s       0m0.085s        +0.001s
              user  0m0.047s       0m0.047s         0s
              sys   0m0.008s       0m0.008s         0s
7227x3847     real  0m3.812s       0m4.075s        -0.263s
              user  0m3.461s       0m3.711s        -0.250s
              sys   0m0.305s       0m0.328s        -0.023s

djpeg (jpeg to ppm conversion, default parms):

image size          libjpeg62      libjpeg-turbo   improvement
1920x1024     real  0m0.414s       0m0.294s        +0.120s
              user  0m0.273s       0m0.156s        +0.117s
              sys   0m0.109s       0m0.109s         0s
406x357       real  0m0.079s       0m0.072s        +0.007s
              user  0m0.023s       0m0.016s        +0.007s
              sys   0m0.023s       0m0.023s         0s
7227x3847     real  0m7.517s       0m10.099s       -2.582s
              user  0m2.047s       0m1.242s        +0.805s
              sys   0m0.813s       0m0.859s        -0.046s
Further work:
- There are some additional patches from Michael Edwards; those were not
  part of this measurement, though I will be picking his patches up.
- Better timestamp code instead of using time (a sketch follows below).
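A minimal sketch of that timestamp approach, where decompress_file() is
a hypothetical stand-in for the actual call into the libjpeg decode path:

    #include <stdio.h>
    #include <time.h>

    /* Hypothetical stand-in for the real call into libjpeg/libjpeg-turbo. */
    extern void decompress_file(const char *in, const char *out);

    static double now_sec(void)
    {
        struct timespec ts;

        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec / 1e9;
    }

    int main(int argc, char **argv)
    {
        double t0, t1;

        if (argc < 3)
            return 1;

        t0 = now_sec();
        decompress_file(argv[1], argv[2]);
        t1 = now_sec();
        /* Reports only the codec work, excluding the process startup and
         * teardown that 'time' folds into its 'real' figure. */
        fprintf(stderr, "decode: %.3f s\n", t1 - t0);
        return 0;
    }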
--
Regards,
Tom
"We want great men who, when fortune frowns will not be discouraged."
- Colonel Henry Knox
Linaro.org │ Open source software for ARM SoCs
w) tom.gall att linaro.org
w) tom_gall att vnet.ibm.com
h) tom_gall att mac.com
Last meeting minutes:
https://wiki.linaro.org/WorkingGroups/Middleware/Multimedia/Notes/2011-07-05
Current status report:
https://wiki.linaro.org/WorkingGroups/Middleware/Multimedia/WeeklyRe…
Highlights:
Issues/Changes: Mandeep Kumar is no longer contributing to Linaro work. A
suitable replacement is being sought with the member company.
As a plan B, Ilias has asked another member of the team to look after
jpeg-turbo for the 1107 release, providing the release and any necessary
highlights to release management. This plan should be confirmed with
Kurt once he is back from vacation. Info has already been shared with the
team member picking up jpeg-turbo.
As a result of this change, sending the public plan review presentation
to the TSC was postponed until Kurt has a new version of the plan.
Public plan review: this is still planned to take place on 13 or 14
July. The final date will be determined next week.
The blueprints for the work now look to be mostly in shape. Still to cover:
- Userspace UMM: robclark needs to clarify whether any userspace API
related to buffer sharing needs a separate blueprint, or whether it can
be covered as a set of work items in the existing memory management
blueprints.
- JPEG optimization related to codec optimization: due to the change
mentioned above, this will need to be revised to identify how to meet
the 1107 and beyond targets.
- As a result of the change, the team will double-check other
blueprints for any timeline adjustments needed.
--
Ilias Biris ilias.biris(a)linaro.org
Project Manager, Linaro
M: +358504839608, IRC: ibiris Skype: ilias_biris
Linaro.org │ Open source software for ARM SoCs
Enclosed please find the link to the Weekly Status report
for the kernel working group for the week ending 2011-07-08.
== Weekly Status Report ==
https://wiki.linaro.org/WorkingGroups/Kernel/Status/2011-07-07
== Summary ==
* Updated linaro+android tree to linaro 11.06 final kernel
* Worked on testing Android Alarm Timers integration with Posix Alarm
Timers.
 * Rebased mmc non-blocking patchset v8 on top of mmc-next for 3.1.
* Merged only one patch to linaro-2.6.39 fixing a bug in the MMC test
module.
 * Started an experimental merge of Catalin Marinas' LPAE (ARM Large
   Physical Address Extension) patches with linux-linaro-2.6.39. We have
   a booting kernel on the ARM VExpress-CA15x4 model making use of
   physical memory beyond 4GB.
Regards,
Mounir
--
Mounir Bsaibes
Project Manager
Follow Linaro.org:
facebook.com/pages/Linaro/155974581091106
http://twitter.com/#!/linaroorg
http://www.linaro.org/linaro-blog