How significant is the cache maintenance overhead? It depends: eMMC devices are much faster now than they were a few years ago, and cache maintenance costs more due to multiple cache levels and speculative cache pre-fetch. In relative terms, the cost of handling the caches has increased and is now a bottleneck when dealing with fast eMMC together with DMA.
The intention of introducing non-blocking mmc requests is to minimize the time between one mmc request ending and the next one starting. In the current implementation the MMC controller is idle while dma_map_sg and dma_unmap_sg are running. Introducing non-blocking mmc requests makes it possible to prepare the caches for the next job in parallel with an active mmc request.
This is done by making issue_rw_rq() non-blocking. The increase in throughput is proportional to the time it takes to prepare a request (the major part of the preparation being dma_map_sg and dma_unmap_sg) and to how fast the memory is. The faster the MMC/SD is, the more significant the request preparation time becomes. Measurements on U5500 and Panda with eMMC and SD show a significant performance gain for large reads when running in DMA mode. In the PIO case the performance is unchanged.
There are two optional hooks, pre_req() and post_req(), that the host driver may implement in order to move work to before and after the actual mmc_request function is called. In the DMA case pre_req() may run dma_map_sg() and prepare the dma descriptor, while post_req() runs dma_unmap_sg().
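As a rough illustration of what such hooks could look like in a host driver (a minimal sketch only, not code from any driver in this series; the foo_* names, the fixed cookie value and the direction handling are made up for the example):

#include <linux/dma-mapping.h>
#include <linux/mmc/core.h>
#include <linux/mmc/host.h>
#include <linux/scatterlist.h>

/* Hypothetical host driver hooks: map the sg list ahead of time and
 * mark the data as prepared via the host-private cookie the series adds. */
static void foo_pre_req(struct mmc_host *mmc, struct mmc_request *mrq,
			bool is_first_req)
{
	struct mmc_data *data = mrq->data;
	enum dma_data_direction dir;

	if (!data || data->host_cookie)
		return;

	dir = (data->flags & MMC_DATA_READ) ? DMA_FROM_DEVICE : DMA_TO_DEVICE;
	/* Runs in parallel with the transfer that is currently active */
	if (dma_map_sg(mmc_dev(mmc), data->sg, data->sg_len, dir) > 0)
		data->host_cookie = 1;	/* mark the data as prepared */
}

static void foo_post_req(struct mmc_host *mmc, struct mmc_request *mrq,
			 int err)
{
	struct mmc_data *data = mrq->data;
	enum dma_data_direction dir;

	if (!data || !data->host_cookie)
		return;

	dir = (data->flags & MMC_DATA_READ) ? DMA_FROM_DEVICE : DMA_TO_DEVICE;
	dma_unmap_sg(mmc_dev(mmc), data->sg, data->sg_len, dir);
	data->host_cookie = 0;
}

The core only calls the hooks if the host driver fills them in in its mmc_host_ops, so drivers that do not implement them behave exactly as before.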
Details on measurements from IOZone and mmc_test: https://wiki.linaro.org/WorkingGroups/Kernel/Specs/StoragePerfMMC-async-req
Changes since v5:
 * Fix spelling mistakes; replace "none blocking" with "non-blocking".
 * Exclude the patch "omap_hsmmc: use original sg_len..." since it is being merged separately.
Per Forlin (11):
  mmc: add non-blocking mmc request function
  omap_hsmmc: add support for pre_req and post_req
  mmci: implement pre_req() and post_req()
  mmc: mmc_test: add debugfs file to list all tests
  mmc: mmc_test: add test for non-blocking transfers
  mmc: add member in mmc queue struct to hold request data
  mmc: add a block request prepare function
  mmc: move error code in mmc_block_issue_rw_rq to a separate function.
  mmc: add a second mmc queue request member
  mmc: test: add random fault injection in core.c
  mmc: add handling for two parallel block requests in issue_rw_rq
 drivers/mmc/card/block.c      |  534 ++++++++++++++++++++++++-----------------
 drivers/mmc/card/mmc_test.c   |  361 +++++++++++++++++++++++++++-
 drivers/mmc/card/queue.c      |  184 +++++++++-----
 drivers/mmc/card/queue.h      |   33 ++-
 drivers/mmc/core/core.c       |  165 ++++++++++++-
 drivers/mmc/core/debugfs.c    |    5 +
 drivers/mmc/host/mmci.c       |  146 ++++++++++-
 drivers/mmc/host/mmci.h       |    8 +
 drivers/mmc/host/omap_hsmmc.c |   87 +++++++-
 include/linux/mmc/core.h      |    6 +-
 include/linux/mmc/host.h      |   24 ++
 lib/Kconfig.debug             |   11 +
 12 files changed, 1235 insertions(+), 329 deletions(-)
Previously there has only been one function, mmc_wait_for_req(), to start and wait for a request. This patch adds:
 * mmc_start_req() - starts a request without waiting. If there is an ongoing request, wait for completion of that request, start the new one, and return. Does not wait for the new request to complete.
This patch also adds new function members in struct mmc_host_ops, only called from core.c:
 * pre_req  - asks the host driver to prepare for the next job
 * post_req - asks the host driver to clean up after a completed job
The intention is to use pre_req() and post_req() to do cache maintenance while a request is active. pre_req() can be called while a request is active in order to minimize the latency before the next job starts. post_req() can be used after the next job has started to clean up the previous request. This minimizes the host driver's request-end latency. post_req() is typically used before ending the block request and handing the buffer over to the block layer.
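As a sketch of the call pattern the new API enables (illustrative only; prepare_blk_request() and complete_blk_request() are hypothetical stand-ins for the block layer glue added later in this series):

#include <linux/mmc/core.h>
#include <linux/mmc/host.h>

struct mmc_async_req *prepare_blk_request(void);		/* hypothetical */
void complete_blk_request(struct mmc_async_req *areq, int err);	/* hypothetical */

/* Pipeline requests: the next request is prepared while the current one
 * is transferring, and the previous one is post-processed after the next
 * has started. */
static void issue_requests(struct mmc_host *host)
{
	struct mmc_async_req *done, *next;
	int err = 0;

	while ((next = prepare_blk_request()) != NULL) {
		/* pre_req(next) runs here, overlapping the active transfer;
		 * the returned request, if any, has completed and has
		 * already been post_req()'d by the core. */
		done = mmc_start_req(host, next, &err);
		if (done)
			complete_blk_request(done, err);
		if (err)
			return;
	}

	/* No more new work: wait for the last outstanding request. */
	done = mmc_start_req(host, NULL, &err);
	if (done)
		complete_blk_request(done, err);
}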
Add a host-private member in mmc_data to be used by pre_req to mark the data. The host driver will then check this mark to see if the data is prepared or not.
Signed-off-by: Per Forlin <per.forlin@linaro.org>
---
 drivers/mmc/core/core.c  |  111 +++++++++++++++++++++++++++++++++++++++++----
 include/linux/mmc/core.h |    6 ++-
 include/linux/mmc/host.h |   21 +++++++++
 3 files changed, 127 insertions(+), 11 deletions(-)
diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c index 68091dd..331c69e 100644 --- a/drivers/mmc/core/core.c +++ b/drivers/mmc/core/core.c @@ -198,10 +198,108 @@ mmc_start_request(struct mmc_host *host, struct mmc_request *mrq)
static void mmc_wait_done(struct mmc_request *mrq) { - complete(mrq->done_data); + complete(&mrq->completion); +} + +static void __mmc_start_req(struct mmc_host *host, struct mmc_request *mrq) +{ + init_completion(&mrq->completion); + mrq->done = mmc_wait_done; + mmc_start_request(host, mrq); +} + +static void mmc_wait_for_req_done(struct mmc_host *host, + struct mmc_request *mrq) +{ + wait_for_completion(&mrq->completion); +} + +/** + * mmc_pre_req - Prepare for a new request + * @host: MMC host to prepare command + * @mrq: MMC request to prepare for + * @is_first_req: true if there is no previous started request + * that may run in parellel to this call, otherwise false + * + * mmc_pre_req() is called in prior to mmc_start_req() to let + * host prepare for the new request. Preparation of a request may be + * performed while another request is running on the host. + */ +static void mmc_pre_req(struct mmc_host *host, struct mmc_request *mrq, + bool is_first_req) +{ + if (host->ops->pre_req) + host->ops->pre_req(host, mrq, is_first_req); }
/** + * mmc_post_req - Post process a completed request + * @host: MMC host to post process command + * @mrq: MMC request to post process for + * @err: Error, if non zero, clean up any resources made in pre_req + * + * Let the host post process a completed request. Post processing of + * a request may be performed while another reuqest is running. + */ +static void mmc_post_req(struct mmc_host *host, struct mmc_request *mrq, + int err) +{ + if (host->ops->post_req) + host->ops->post_req(host, mrq, err); +} + +/** + * mmc_start_req - start a non-blocking request + * @host: MMC host to start command + * @areq: async request to start + * @error: non zero in case of error + * + * Start a new MMC custom command request for a host. + * If there is on ongoing async request wait for completion + * of that request and start the new one and return. + * Does not wait for the new request to complete. + * + * Returns the completed async request, NULL in case of none completed. + */ +struct mmc_async_req *mmc_start_req(struct mmc_host *host, + struct mmc_async_req *areq, int *error) +{ + int err = 0; + struct mmc_async_req *data = host->areq; + + /* Prepare a new request */ + if (areq) + mmc_pre_req(host, areq->mrq, !host->areq); + + if (host->areq) { + mmc_wait_for_req_done(host, host->areq->mrq); + err = host->areq->err_check(host->card, host->areq); + if (err) { + mmc_post_req(host, host->areq->mrq, 0); + if (areq) + mmc_post_req(host, areq->mrq, -EINVAL); + + host->areq = NULL; + if (error) + *error = err; + return data; + } + } + + if (areq) + __mmc_start_req(host, areq->mrq); + + if (host->areq) + mmc_post_req(host, host->areq->mrq, 0); + + host->areq = areq; + if (error) + *error = err; + return data; +} +EXPORT_SYMBOL(mmc_start_req); + +/** * mmc_wait_for_req - start a request and wait for completion * @host: MMC host to start command * @mrq: MMC request to start @@ -212,16 +310,9 @@ static void mmc_wait_done(struct mmc_request *mrq) */ void mmc_wait_for_req(struct mmc_host *host, struct mmc_request *mrq) { - DECLARE_COMPLETION_ONSTACK(complete); - - mrq->done_data = &complete; - mrq->done = mmc_wait_done; - - mmc_start_request(host, mrq); - - wait_for_completion(&complete); + __mmc_start_req(host, mrq); + mmc_wait_for_req_done(host, mrq); } - EXPORT_SYMBOL(mmc_wait_for_req);
/** diff --git a/include/linux/mmc/core.h b/include/linux/mmc/core.h index 791f060..57ee19e 100644 --- a/include/linux/mmc/core.h +++ b/include/linux/mmc/core.h @@ -117,6 +117,7 @@ struct mmc_data {
unsigned int sg_len; /* size of scatter list */ struct scatterlist *sg; /* I/O scatter list */ + s32 host_cookie; /* host private data */ };
struct mmc_request { @@ -125,13 +126,16 @@ struct mmc_request { struct mmc_data *data; struct mmc_command *stop;
- void *done_data; /* completion data */ + struct completion completion; void (*done)(struct mmc_request *);/* completion function */ };
struct mmc_host; struct mmc_card; +struct mmc_async_req;
+extern struct mmc_async_req *mmc_start_req(struct mmc_host *, + struct mmc_async_req *, int *); extern void mmc_wait_for_req(struct mmc_host *, struct mmc_request *); extern int mmc_wait_for_cmd(struct mmc_host *, struct mmc_command *, int); extern int mmc_app_cmd(struct mmc_host *, struct mmc_card *); diff --git a/include/linux/mmc/host.h b/include/linux/mmc/host.h index ac3fbac..59db6f2 100644 --- a/include/linux/mmc/host.h +++ b/include/linux/mmc/host.h @@ -106,6 +106,15 @@ struct mmc_host_ops { */ int (*enable)(struct mmc_host *host); int (*disable)(struct mmc_host *host, int lazy); + /* + * It is optional for the host to implement pre_req and post_req in + * order to support double buffering of requests (prepare one + * request while another request is active). + */ + void (*post_req)(struct mmc_host *host, struct mmc_request *req, + int err); + void (*pre_req)(struct mmc_host *host, struct mmc_request *req, + bool is_first_req); void (*request)(struct mmc_host *host, struct mmc_request *req); /* * Avoid calling these three functions too often or in a "fast path", @@ -144,6 +153,16 @@ struct mmc_host_ops { struct mmc_card; struct device;
+struct mmc_async_req { + /* active mmc request */ + struct mmc_request *mrq; + /* + * Check error status of completed mmc request. + * Returns 0 if success otherwise non zero. + */ + int (*err_check) (struct mmc_card *, struct mmc_async_req *); +}; + struct mmc_host { struct device *parent; struct device class_dev; @@ -280,6 +299,8 @@ struct mmc_host {
struct dentry *debugfs_root;
+ struct mmc_async_req *areq; /* active async req */ + unsigned long private[0] ____cacheline_aligned; };
pre_req() runs dma_map_sg(); post_req() runs dma_unmap_sg(). If pre_req() is not called before omap_hsmmc_request(), dma_map_sg will be issued before starting the transfer. Using pre_req() is optional, but if pre_req() is issued, post_req() must be called as well.
Signed-off-by: Per Forlin <per.forlin@linaro.org>
---
 drivers/mmc/host/omap_hsmmc.c |   87 +++++++++++++++++++++++++++++++++++++++--
 1 files changed, 83 insertions(+), 4 deletions(-)
diff --git a/drivers/mmc/host/omap_hsmmc.c b/drivers/mmc/host/omap_hsmmc.c index cd317af..b29ca90 100644 --- a/drivers/mmc/host/omap_hsmmc.c +++ b/drivers/mmc/host/omap_hsmmc.c @@ -141,6 +141,11 @@ #define OMAP_HSMMC_WRITE(base, reg, val) \ __raw_writel((val), (base) + OMAP_HSMMC_##reg)
+struct omap_hsmmc_next { + unsigned int dma_len; + s32 cookie; +}; + struct omap_hsmmc_host { struct device *dev; struct mmc_host *mmc; @@ -184,6 +189,7 @@ struct omap_hsmmc_host { int reqs_blocked; int use_reg; int req_in_progress; + struct omap_hsmmc_next next_data;
struct omap_mmc_platform_data *pdata; }; @@ -1343,8 +1349,9 @@ static void omap_hsmmc_dma_cb(int lch, u16 ch_status, void *cb_data) return; }
- dma_unmap_sg(mmc_dev(host->mmc), data->sg, data->sg_len, - omap_hsmmc_get_dma_dir(host, data)); + if (!data->host_cookie) + dma_unmap_sg(mmc_dev(host->mmc), data->sg, data->sg_len, + omap_hsmmc_get_dma_dir(host, data));
req_in_progress = host->req_in_progress; dma_ch = host->dma_ch; @@ -1362,6 +1369,45 @@ static void omap_hsmmc_dma_cb(int lch, u16 ch_status, void *cb_data) } }
+static int omap_hsmmc_pre_dma_transfer(struct omap_hsmmc_host *host, + struct mmc_data *data, + struct omap_hsmmc_next *next) +{ + int dma_len; + + if (!next && data->host_cookie && + data->host_cookie != host->next_data.cookie) { + printk(KERN_WARNING "[%s] invalid cookie: data->host_cookie %d" + " host->next_data.cookie %d\n", + __func__, data->host_cookie, host->next_data.cookie); + data->host_cookie = 0; + } + + /* Check if next job is already prepared */ + if (next || + (!next && data->host_cookie != host->next_data.cookie)) { + dma_len = dma_map_sg(mmc_dev(host->mmc), data->sg, + data->sg_len, + omap_hsmmc_get_dma_dir(host, data)); + + } else { + dma_len = host->next_data.dma_len; + host->next_data.dma_len = 0; + } + + + if (dma_len == 0) + return -EINVAL; + + if (next) { + next->dma_len = dma_len; + data->host_cookie = ++next->cookie < 0 ? 1 : next->cookie; + } else + host->dma_len = dma_len; + + return 0; +} + /* * Routine to configure and start DMA for the MMC card */ @@ -1395,9 +1441,10 @@ static int omap_hsmmc_start_dma_transfer(struct omap_hsmmc_host *host, mmc_hostname(host->mmc), ret); return ret; } + ret = omap_hsmmc_pre_dma_transfer(host, data, NULL); + if (ret) + return ret;
- host->dma_len = dma_map_sg(mmc_dev(host->mmc), data->sg, - data->sg_len, omap_hsmmc_get_dma_dir(host, data)); host->dma_ch = dma_ch; host->dma_sg_idx = 0;
@@ -1477,6 +1524,35 @@ omap_hsmmc_prepare_data(struct omap_hsmmc_host *host, struct mmc_request *req) return 0; }
+static void omap_hsmmc_post_req(struct mmc_host *mmc, struct mmc_request *mrq, + int err) +{ + struct omap_hsmmc_host *host = mmc_priv(mmc); + struct mmc_data *data = mrq->data; + + if (host->use_dma) { + dma_unmap_sg(mmc_dev(host->mmc), data->sg, data->sg_len, + omap_hsmmc_get_dma_dir(host, data)); + data->host_cookie = 0; + } +} + +static void omap_hsmmc_pre_req(struct mmc_host *mmc, struct mmc_request *mrq, + bool is_first_req) +{ + struct omap_hsmmc_host *host = mmc_priv(mmc); + + if (mrq->data->host_cookie) { + mrq->data->host_cookie = 0; + return ; + } + + if (host->use_dma) + if (omap_hsmmc_pre_dma_transfer(host, mrq->data, + &host->next_data)) + mrq->data->host_cookie = 0; +} + /* * Request function. for read/write operation */ @@ -1925,6 +2001,8 @@ static int omap_hsmmc_disable_fclk(struct mmc_host *mmc, int lazy) static const struct mmc_host_ops omap_hsmmc_ops = { .enable = omap_hsmmc_enable_fclk, .disable = omap_hsmmc_disable_fclk, + .post_req = omap_hsmmc_post_req, + .pre_req = omap_hsmmc_pre_req, .request = omap_hsmmc_request, .set_ios = omap_hsmmc_set_ios, .get_cd = omap_hsmmc_get_cd, @@ -2074,6 +2152,7 @@ static int __init omap_hsmmc_probe(struct platform_device *pdev) host->mapbase = res->start; host->base = ioremap(host->mapbase, SZ_4K); host->power_mode = MMC_POWER_OFF; + host->next_data.cookie = 1;
platform_set_drvdata(pdev, host); INIT_WORK(&host->mmc_carddetect_work, omap_hsmmc_detect);
<snip>
+static void omap_hsmmc_pre_req(struct mmc_host *mmc, struct mmc_request *mrq,
- bool is_first_req)
I don't see the usage of "is_first_req" below. Is it required?
+{
- struct omap_hsmmc_host *host = mmc_priv(mmc);
- if (mrq->data->host_cookie) {
- mrq->data->host_cookie = 0;
- return ;
- }
- if (host->use_dma)
- if (omap_hsmmc_pre_dma_transfer(host, mrq->data,
- &host->next_data))
- mrq->data->host_cookie = 0;
+}
/*
Regards, Kishore
On 21 June 2011 07:41, Kishore Kadiyala kishorek.kadiyala@gmail.com wrote:
<snip>
+static void omap_hsmmc_pre_req(struct mmc_host *mmc, struct mmc_request *mrq,
- bool is_first_req)
I don't see the usage of "is_first_req" below. Is it required?
It is not required. It is only an indication that this request is the first in a series of requests. The host driver may do various optimisations based on this information. The first request in a series of jobs can't be prepared in parallel with the previous job. The host driver can do the following to minimise latency for the first job:
 * Prepare the cache while the MMC read/write cmd is being processed. In this case pre_req could do nothing and the job is instead run in parallel with the read/write cmd being sent. If is_first_req is false, pre_req will run in parallel with an active transfer; in this case it is more efficient to prepare the request in pre_req.
 * Run PIO mode instead of DMA.
 * Maybe there can be power-related optimisations based on whether it is one single transfer or multiple ones.
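For example, a host driver could key off the flag roughly like this (purely illustrative; example_prepare_dma() is a made-up helper that would do the dma_map_sg() and cookie handling):

void example_prepare_dma(struct mmc_host *mmc, struct mmc_data *data);	/* hypothetical */

static void example_pre_req(struct mmc_host *mmc, struct mmc_request *mrq,
			    bool is_first_req)
{
	if (!mrq->data)
		return;

	if (is_first_req) {
		/* Nothing is running that this preparation could overlap
		 * with; leave the data unprepared and let the request path
		 * map it while the read/write cmd is being sent. */
		return;
	}

	/* A transfer is active: prepare now so the next one starts at once. */
	example_prepare_dma(mmc, mrq->data);
}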
+{
- struct omap_hsmmc_host *host = mmc_priv(mmc);
- if (mrq->data->host_cookie) {
- mrq->data->host_cookie = 0;
- return ;
- }
- if (host->use_dma)
- if (omap_hsmmc_pre_dma_transfer(host, mrq->data,
- &host->next_data))
- mrq->data->host_cookie = 0;
+}
/*
Thanks for your comments, Per
Hi Per,
On Tue, Jun 21, 2011 at 12:21 PM, Per Forlin per.forlin@linaro.org wrote:
On 21 June 2011 07:41, Kishore Kadiyala kishorek.kadiyala@gmail.com wrote:
<snip>
+static void omap_hsmmc_pre_req(struct mmc_host *mmc, struct mmc_request *mrq,
- bool is_first_req)
I don't see the usage of "is_first_req" below. Is it required?
It is not required. It is only an indication that this request is the first in a series of requests. The host driver may do various optimisations based on this information. The first request in a series of jobs can't be prepared in parallel with the previous job. The host driver can do the following to minimise latency for the first job:
 * Prepare the cache while the MMC read/write cmd is being processed. In this case pre_req could do nothing and the job is instead run in parallel with the read/write cmd being sent. If is_first_req is false, pre_req will run in parallel with an active transfer; in this case it is more efficient to prepare the request in pre_req.
 * Run PIO mode instead of DMA.
 * Maybe there can be power-related optimisations based on whether it is one single transfer or multiple ones.
Ok, thanks for making things clear.
<snip>
Regards, Kishore
On Tue, 21 Jun 2011, Per Forlin wrote:
On 21 June 2011 07:41, Kishore Kadiyala kishorek.kadiyala@gmail.com wrote:
<snip>
+static void omap_hsmmc_pre_req(struct mmc_host *mmc, struct mmc_request *mrq,
- bool is_first_req)
I don't see the usage of "is_first_req" below. Is it required?
It is not required. It is only an indication that this request is the first in a series of requests. The host driver may do various optimisations based on this information. The first request in a series of jobs can't be prepared in parallel with the previous job. The host driver can do the following to minimise latency for the first job:
- Preparing the cache while the MMC read/write cmd is being
processed. In this case pre_req could do nothing and the job is instead run in parallel with the read/write cmd being sent. If is_first_req is false, pre_req will run in parallel with an active transfer; in this case it is more efficient to prepare the request in pre_req.
- Run PIO mode instead of DMA
That is never going to be a good tradeoff. If the CPU is busy doing PIO, it won't have a chance to prepare a subsequent request in parallel to the first transfer.
Nicolas
On 21 June 2011 21:18, Nicolas Pitre nicolas.pitre@linaro.org wrote:
On Tue, 21 Jun 2011, Per Forlin wrote:
On 21 June 2011 07:41, Kishore Kadiyala kishorek.kadiyala@gmail.com wrote:
<snip>
+static void omap_hsmmc_pre_req(struct mmc_host *mmc, struct mmc_request *mrq,
- bool is_first_req)
I don't see the usage of "is_first_req" below. Is it required?
It is not required. It is only an indication that this request is the first in a series of requests. The host driver may do various optimisations based on this information. The first request in a series of jobs can't be prepared in parallel with the previous job. The host driver can do the following to minimise latency for the first job:
 * Prepare the cache while the MMC read/write cmd is being processed. In this case pre_req could do nothing and the job is instead run in parallel with the read/write cmd being sent. If is_first_req is false, pre_req will run in parallel with an active transfer; in this case it is more efficient to prepare the request in pre_req.
 * Run PIO mode instead of DMA.
That is never going to be a good tradeoff. If the CPU is busy doing PIO, it won't have a chance to prepare a subsequent request in parallel to the first transfer.
If you have two CPUs and the MMC interrupts are scheduled on CPU 1, CPU 0 can prepare the next one. I'm still in favor of preparing the cache in parallel with the cmd; I have run tests, and for small requests like 4k there is a good performance gain. Another option, if the mmc controller supports it, would be to start with PIO and switch to DMA on the fly when the cache is ready. Bottom line: it is up to the host driver to do something clever based on is_first_req.
/Per
On Tue, 21 Jun 2011, Per Forlin wrote:
On 21 June 2011 21:18, Nicolas Pitre nicolas.pitre@linaro.org wrote:
On Tue, 21 Jun 2011, Per Forlin wrote:
On 21 June 2011 07:41, Kishore Kadiyala kishorek.kadiyala@gmail.com wrote:
<snip>
+static void omap_hsmmc_pre_req(struct mmc_host *mmc, struct mmc_request *mrq,
- bool is_first_req)
I don't see the usage of "is_first_req" below. Is it required?
It is not required. It is only an indication that this request is the first in a series of requests. The host driver may do various optimisations based on this information. The first request in a series of jobs can't be prepared in parallel with the previous job. The host driver can do the following to minimise latency for the first job:
 * Prepare the cache while the MMC read/write cmd is being processed. In this case pre_req could do nothing and the job is instead run in parallel with the read/write cmd being sent. If is_first_req is false, pre_req will run in parallel with an active transfer; in this case it is more efficient to prepare the request in pre_req.
 * Run PIO mode instead of DMA.
That is never going to be a good tradeoff. If the CPU is busy doing PIO, it won't have a chance to prepare a subsequent request in parallel to the first transfer.
If you have two CPUs and the MMC interrupts are scheduled on CPU 1, CPU 0 can prepare the next one.
Well, it is true that in theory the PIO operation shouldn't take all the CPU anyway, so maybe there are some cycles left in between FIFO interrupts.
The danger here is of course being presented with a trickle of single requests. Doing them all with PIO is going to waste more power, or prevent other tasks from running by taking 100% of the CPU, which might impact the system latency more than the latency we're trying to avoid here.
In other words this is something that should be evaluated and not applied freely.
Nicolas
pre_req() runs dma_map_sg() and prepares the dma descriptor for the next mmc data transfer. post_req() runs dma_unmap_sg(). If pre_req() is not called before mmci_request(), mmci_request() will prepare the cache and dma just as it did before. Using pre_req() and post_req() is optional for mmci.
Signed-off-by: Per Forlin <per.forlin@linaro.org>
---
 drivers/mmc/host/mmci.c |  146 ++++++++++++++++++++++++++++++++++++++++++----
 drivers/mmc/host/mmci.h |    8 +++
 2 files changed, 141 insertions(+), 13 deletions(-)
diff --git a/drivers/mmc/host/mmci.c b/drivers/mmc/host/mmci.c index b4a7e4f..985c77d 100644 --- a/drivers/mmc/host/mmci.c +++ b/drivers/mmc/host/mmci.c @@ -320,7 +320,8 @@ static void mmci_dma_unmap(struct mmci_host *host, struct mmc_data *data) dir = DMA_FROM_DEVICE; }
- dma_unmap_sg(chan->device->dev, data->sg, data->sg_len, dir); + if (!data->host_cookie) + dma_unmap_sg(chan->device->dev, data->sg, data->sg_len, dir);
/* * Use of DMA with scatter-gather is impossible. @@ -338,7 +339,8 @@ static void mmci_dma_data_error(struct mmci_host *host) dmaengine_terminate_all(host->dma_current); }
-static int mmci_dma_start_data(struct mmci_host *host, unsigned int datactrl) +static int mmci_dma_prep_data(struct mmci_host *host, struct mmc_data *data, + struct mmci_host_next *next) { struct variant_data *variant = host->variant; struct dma_slave_config conf = { @@ -349,13 +351,20 @@ static int mmci_dma_start_data(struct mmci_host *host, unsigned int datactrl) .src_maxburst = variant->fifohalfsize >> 2, /* # of words */ .dst_maxburst = variant->fifohalfsize >> 2, /* # of words */ }; - struct mmc_data *data = host->data; struct dma_chan *chan; struct dma_device *device; struct dma_async_tx_descriptor *desc; int nr_sg;
- host->dma_current = NULL; + /* Check if next job is already prepared */ + if (data->host_cookie && !next && + host->dma_current && host->dma_desc_current) + return 0; + + if (!next) { + host->dma_current = NULL; + host->dma_desc_current = NULL; + }
if (data->flags & MMC_DATA_READ) { conf.direction = DMA_FROM_DEVICE; @@ -370,7 +379,7 @@ static int mmci_dma_start_data(struct mmci_host *host, unsigned int datactrl) return -EINVAL;
/* If less than or equal to the fifo size, don't bother with DMA */ - if (host->size <= variant->fifosize) + if (data->blksz * data->blocks <= variant->fifosize) return -EINVAL;
device = chan->device; @@ -384,14 +393,38 @@ static int mmci_dma_start_data(struct mmci_host *host, unsigned int datactrl) if (!desc) goto unmap_exit;
- /* Okay, go for it. */ - host->dma_current = chan; + if (next) { + next->dma_chan = chan; + next->dma_desc = desc; + } else { + host->dma_current = chan; + host->dma_desc_current = desc; + } + + return 0;
+ unmap_exit: + if (!next) + dmaengine_terminate_all(chan); + dma_unmap_sg(device->dev, data->sg, data->sg_len, conf.direction); + return -ENOMEM; +} + +static int mmci_dma_start_data(struct mmci_host *host, unsigned int datactrl) +{ + int ret; + struct mmc_data *data = host->data; + + ret = mmci_dma_prep_data(host, host->data, NULL); + if (ret) + return ret; + + /* Okay, go for it. */ dev_vdbg(mmc_dev(host->mmc), "Submit MMCI DMA job, sglen %d blksz %04x blks %04x flags %08x\n", data->sg_len, data->blksz, data->blocks, data->flags); - dmaengine_submit(desc); - dma_async_issue_pending(chan); + dmaengine_submit(host->dma_desc_current); + dma_async_issue_pending(host->dma_current);
datactrl |= MCI_DPSM_DMAENABLE;
@@ -406,14 +439,90 @@ static int mmci_dma_start_data(struct mmci_host *host, unsigned int datactrl) writel(readl(host->base + MMCIMASK0) | MCI_DATAENDMASK, host->base + MMCIMASK0); return 0; +}
-unmap_exit: - dmaengine_terminate_all(chan); - dma_unmap_sg(device->dev, data->sg, data->sg_len, conf.direction); - return -ENOMEM; +static void mmci_get_next_data(struct mmci_host *host, struct mmc_data *data) +{ + struct mmci_host_next *next = &host->next_data; + + if (data->host_cookie && data->host_cookie != next->cookie) { + printk(KERN_WARNING "[%s] invalid cookie: data->host_cookie %d" + " host->next_data.cookie %d\n", + __func__, data->host_cookie, host->next_data.cookie); + data->host_cookie = 0; + } + + if (!data->host_cookie) + return; + + host->dma_desc_current = next->dma_desc; + host->dma_current = next->dma_chan; + + next->dma_desc = NULL; + next->dma_chan = NULL; } + +static void mmci_pre_request(struct mmc_host *mmc, struct mmc_request *mrq, + bool is_first_req) +{ + struct mmci_host *host = mmc_priv(mmc); + struct mmc_data *data = mrq->data; + struct mmci_host_next *nd = &host->next_data; + + if (!data) + return; + + if (data->host_cookie) { + data->host_cookie = 0; + return; + } + + /* if config for dma */ + if (((data->flags & MMC_DATA_WRITE) && host->dma_tx_channel) || + ((data->flags & MMC_DATA_READ) && host->dma_rx_channel)) { + if (mmci_dma_prep_data(host, data, nd)) + data->host_cookie = 0; + else + data->host_cookie = ++nd->cookie < 0 ? 1 : nd->cookie; + } +} + +static void mmci_post_request(struct mmc_host *mmc, struct mmc_request *mrq, + int err) +{ + struct mmci_host *host = mmc_priv(mmc); + struct mmc_data *data = mrq->data; + struct dma_chan *chan; + enum dma_data_direction dir; + + if (!data) + return; + + if (data->flags & MMC_DATA_READ) { + dir = DMA_FROM_DEVICE; + chan = host->dma_rx_channel; + } else { + dir = DMA_TO_DEVICE; + chan = host->dma_tx_channel; + } + + + /* if config for dma */ + if (chan) { + if (err) + dmaengine_terminate_all(chan); + if (err || data->host_cookie) + dma_unmap_sg(mmc_dev(host->mmc), data->sg, + data->sg_len, dir); + mrq->data->host_cookie = 0; + } +} + #else /* Blank functions if the DMA engine is not available */ +static void mmci_get_next_data(struct mmci_host *host, struct mmc_data *data) +{ +} static inline void mmci_dma_setup(struct mmci_host *host) { } @@ -434,6 +543,10 @@ static inline int mmci_dma_start_data(struct mmci_host *host, unsigned int datac { return -ENOSYS; } + +#define mmci_pre_request NULL +#define mmci_post_request NULL + #endif
static void mmci_start_data(struct mmci_host *host, struct mmc_data *data) @@ -852,6 +965,9 @@ static void mmci_request(struct mmc_host *mmc, struct mmc_request *mrq)
host->mrq = mrq;
+ if (mrq->data) + mmci_get_next_data(host, mrq->data); + if (mrq->data && mrq->data->flags & MMC_DATA_READ) mmci_start_data(host, mrq->data);
@@ -966,6 +1082,8 @@ static irqreturn_t mmci_cd_irq(int irq, void *dev_id)
static const struct mmc_host_ops mmci_ops = { .request = mmci_request, + .pre_req = mmci_pre_request, + .post_req = mmci_post_request, .set_ios = mmci_set_ios, .get_ro = mmci_get_ro, .get_cd = mmci_get_cd, @@ -1003,6 +1121,8 @@ static int __devinit mmci_probe(struct amba_device *dev, host->gpio_cd = -ENOSYS; host->gpio_cd_irq = -1;
+ host->next_data.cookie = 1; + host->hw_designer = amba_manf(dev); host->hw_revision = amba_rev(dev); dev_dbg(mmc_dev(mmc), "designer ID = 0x%02x\n", host->hw_designer); diff --git a/drivers/mmc/host/mmci.h b/drivers/mmc/host/mmci.h index ec9a7bc6..e21d850 100644 --- a/drivers/mmc/host/mmci.h +++ b/drivers/mmc/host/mmci.h @@ -150,6 +150,12 @@ struct clk; struct variant_data; struct dma_chan;
+struct mmci_host_next { + struct dma_async_tx_descriptor *dma_desc; + struct dma_chan *dma_chan; + s32 cookie; +}; + struct mmci_host { phys_addr_t phybase; void __iomem *base; @@ -187,6 +193,8 @@ struct mmci_host { struct dma_chan *dma_current; struct dma_chan *dma_rx_channel; struct dma_chan *dma_tx_channel; + struct dma_async_tx_descriptor *dma_desc_current; + struct mmci_host_next next_data;
#define dma_inprogress(host) ((host)->dma_current) #else
Add a debugfs file "testlist" to print all available tests
Signed-off-by: Per Forlin <per.forlin@linaro.org>
---
 drivers/mmc/card/mmc_test.c |   39 ++++++++++++++++++++++++++++++++++++++-
 1 files changed, 38 insertions(+), 1 deletions(-)
diff --git a/drivers/mmc/card/mmc_test.c b/drivers/mmc/card/mmc_test.c index 233cdfa..e8508e9 100644 --- a/drivers/mmc/card/mmc_test.c +++ b/drivers/mmc/card/mmc_test.c @@ -2445,6 +2445,32 @@ static const struct file_operations mmc_test_fops_test = { .release = single_release, };
+static int mtf_testlist_show(struct seq_file *sf, void *data) +{ + int i; + + mutex_lock(&mmc_test_lock); + + for (i = 0; i < ARRAY_SIZE(mmc_test_cases); i++) + seq_printf(sf, "%d:\t%s\n", i+1, mmc_test_cases[i].name); + + mutex_unlock(&mmc_test_lock); + + return 0; +} + +static int mtf_testlist_open(struct inode *inode, struct file *file) +{ + return single_open(file, mtf_testlist_show, inode->i_private); +} + +static const struct file_operations mmc_test_fops_testlist = { + .open = mtf_testlist_open, + .read = seq_read, + .llseek = seq_lseek, + .release = single_release, +}; + static void mmc_test_free_file_test(struct mmc_card *card) { struct mmc_test_dbgfs_file *df, *dfs; @@ -2476,7 +2502,18 @@ static int mmc_test_register_file_test(struct mmc_card *card)
if (IS_ERR_OR_NULL(file)) { dev_err(&card->dev, - "Can't create file. Perhaps debugfs is disabled.\n"); + "Can't create test. Perhaps debugfs is disabled.\n"); + ret = -ENODEV; + goto err; + } + + if (card->debugfs_root) + file = debugfs_create_file("testlist", S_IRUGO, + card->debugfs_root, card, &mmc_test_fops_testlist); + + if (IS_ERR_OR_NULL(file)) { + dev_err(&card->dev, + "Can't create testlist. Perhaps debugfs is disabled.\n"); ret = -ENODEV; goto err; }
Add four tests for read and write performance with different transfer sizes, 4k to 4M:
 * Read using blocking mmc request
 * Read using non-blocking mmc request
 * Write using blocking mmc request
 * Write using non-blocking mmc request
The host driver must support pre_req() and post_req() in order to run the non-blocking test cases.
Signed-off-by: Per Forlin <per.forlin@linaro.org>
---
 drivers/mmc/card/mmc_test.c |  322 +++++++++++++++++++++++++++++++++++++++++--
 1 files changed, 313 insertions(+), 9 deletions(-)
diff --git a/drivers/mmc/card/mmc_test.c b/drivers/mmc/card/mmc_test.c index e8508e9..5325049 100644 --- a/drivers/mmc/card/mmc_test.c +++ b/drivers/mmc/card/mmc_test.c @@ -22,6 +22,7 @@ #include <linux/debugfs.h> #include <linux/uaccess.h> #include <linux/seq_file.h> +#include <linux/random.h>
#define RESULT_OK 0 #define RESULT_FAIL 1 @@ -51,10 +52,12 @@ struct mmc_test_pages { * struct mmc_test_mem - allocated memory. * @arr: array of allocations * @cnt: number of allocations + * @size_min_cmn: lowest common size in array of allocations */ struct mmc_test_mem { struct mmc_test_pages *arr; unsigned int cnt; + unsigned int size_min_cmn; };
/** @@ -148,6 +151,26 @@ struct mmc_test_card { struct mmc_test_general_result *gr; };
+enum mmc_test_prep_media { + MMC_TEST_PREP_NONE = 0, + MMC_TEST_PREP_WRITE_FULL = 1 << 0, + MMC_TEST_PREP_ERASE = 1 << 1, +}; + +struct mmc_test_multiple_rw { + unsigned int *bs; + unsigned int len; + unsigned int size; + bool do_write; + bool do_nonblock_req; + enum mmc_test_prep_media prepare; +}; + +struct mmc_test_async_req { + struct mmc_async_req areq; + struct mmc_test_card *test; +}; + /*******************************************************************/ /* General helper functions */ /*******************************************************************/ @@ -302,6 +325,7 @@ static struct mmc_test_mem *mmc_test_alloc_mem(unsigned long min_sz, unsigned long max_seg_page_cnt = DIV_ROUND_UP(max_seg_sz, PAGE_SIZE); unsigned long page_cnt = 0; unsigned long limit = nr_free_buffer_pages() >> 4; + unsigned int min_cmn = 0; struct mmc_test_mem *mem;
if (max_page_cnt > limit) @@ -345,6 +369,12 @@ static struct mmc_test_mem *mmc_test_alloc_mem(unsigned long min_sz, mem->arr[mem->cnt].page = page; mem->arr[mem->cnt].order = order; mem->cnt += 1; + if (!min_cmn) + min_cmn = PAGE_SIZE << order; + else + min_cmn = min(min_cmn, + (unsigned int) (PAGE_SIZE << order)); + if (max_page_cnt <= (1UL << order)) break; max_page_cnt -= 1UL << order; @@ -355,6 +385,7 @@ static struct mmc_test_mem *mmc_test_alloc_mem(unsigned long min_sz, break; } } + mem->size_min_cmn = min_cmn;
return mem;
@@ -381,7 +412,6 @@ static int mmc_test_map_sg(struct mmc_test_mem *mem, unsigned long sz, do { for (i = 0; i < mem->cnt; i++) { unsigned long len = PAGE_SIZE << mem->arr[i].order; - if (len > sz) len = sz; if (len > max_seg_sz) @@ -661,7 +691,7 @@ static void mmc_test_prepare_broken_mrq(struct mmc_test_card *test, * Checks that a normal transfer didn't have any errors */ static int mmc_test_check_result(struct mmc_test_card *test, - struct mmc_request *mrq) + struct mmc_request *mrq) { int ret;
@@ -685,6 +715,16 @@ static int mmc_test_check_result(struct mmc_test_card *test, return ret; }
+ +static int mmc_test_check_result_async(struct mmc_card *card, + struct mmc_async_req *areq) +{ + struct mmc_test_async_req *test_async = + container_of(areq, struct mmc_test_async_req, areq); + + return mmc_test_check_result(test_async->test, areq->mrq); +} + /* * Checks that a "short transfer" behaved as expected */ @@ -720,6 +760,89 @@ static int mmc_test_check_broken_result(struct mmc_test_card *test, }
/* + * Tests nonblock transfer with certain parameters + */ +static void mmc_test_nonblock_reset(struct mmc_request *mrq, + struct mmc_command *cmd, + struct mmc_command *stop, + struct mmc_data *data) +{ + memset(mrq, 0, sizeof(struct mmc_request)); + memset(cmd, 0, sizeof(struct mmc_command)); + memset(data, 0, sizeof(struct mmc_data)); + memset(stop, 0, sizeof(struct mmc_command)); + + mrq->cmd = cmd; + mrq->data = data; + mrq->stop = stop; +} +static int mmc_test_nonblock_transfer(struct mmc_test_card *test, + struct scatterlist *sg, unsigned sg_len, + unsigned dev_addr, unsigned blocks, + unsigned blksz, int write, int count) +{ + struct mmc_request mrq1; + struct mmc_command cmd1; + struct mmc_command stop1; + struct mmc_data data1; + + struct mmc_request mrq2; + struct mmc_command cmd2; + struct mmc_command stop2; + struct mmc_data data2; + + struct mmc_test_async_req test_areq[2]; + struct mmc_async_req *done_areq; + struct mmc_async_req *cur_areq = &test_areq[0].areq; + struct mmc_async_req *other_areq = &test_areq[1].areq; + int i; + int ret; + + test_areq[0].test = test; + test_areq[1].test = test; + + if (!test->card->host->ops->pre_req || + !test->card->host->ops->post_req) + return -RESULT_UNSUP_HOST; + + mmc_test_nonblock_reset(&mrq1, &cmd1, &stop1, &data1); + mmc_test_nonblock_reset(&mrq2, &cmd2, &stop2, &data2); + + cur_areq->mrq = &mrq1; + cur_areq->err_check = mmc_test_check_result_async; + other_areq->mrq = &mrq2; + other_areq->err_check = mmc_test_check_result_async; + + for (i = 0; i < count; i++) { + mmc_test_prepare_mrq(test, cur_areq->mrq, sg, sg_len, dev_addr, + blocks, blksz, write); + done_areq = mmc_start_req(test->card->host, cur_areq, &ret); + + if (ret || (!done_areq && i > 0)) + goto err; + + if (done_areq) { + if (done_areq->mrq == &mrq2) + mmc_test_nonblock_reset(&mrq2, &cmd2, + &stop2, &data2); + else + mmc_test_nonblock_reset(&mrq1, &cmd1, + &stop1, &data1); + } + done_areq = cur_areq; + cur_areq = other_areq; + other_areq = done_areq; + dev_addr += blocks; + } + + done_areq = mmc_start_req(test->card->host, NULL, &ret); + + return ret; +err: + return ret; +} + +/* * Tests a basic transfer with certain parameters */ static int mmc_test_simple_transfer(struct mmc_test_card *test, @@ -1336,14 +1459,17 @@ static int mmc_test_area_transfer(struct mmc_test_card *test, }
/* - * Map and transfer bytes. + * Map and transfer bytes for multiple transfers. */ -static int mmc_test_area_io(struct mmc_test_card *test, unsigned long sz, - unsigned int dev_addr, int write, int max_scatter, - int timed) +static int mmc_test_area_io_seq(struct mmc_test_card *test, unsigned long sz, + unsigned int dev_addr, int write, + int max_scatter, int timed, int count, + bool nonblock) { struct timespec ts1, ts2; - int ret; + int ret = 0; + int i; + struct mmc_test_area *t = &test->area;
/* * In the case of a maximally scattered transfer, the maximum transfer @@ -1367,8 +1493,15 @@ static int mmc_test_area_io(struct mmc_test_card *test, unsigned long sz,
if (timed) getnstimeofday(&ts1); + if (nonblock) + ret = mmc_test_nonblock_transfer(test, t->sg, t->sg_len, + dev_addr, t->blocks, 512, write, count); + else + for (i = 0; i < count && ret == 0; i++) { + ret = mmc_test_area_transfer(test, dev_addr, write); + dev_addr += sz >> 9; + }
- ret = mmc_test_area_transfer(test, dev_addr, write); if (ret) return ret;
@@ -1376,11 +1509,19 @@ static int mmc_test_area_io(struct mmc_test_card *test, unsigned long sz, getnstimeofday(&ts2);
if (timed) - mmc_test_print_rate(test, sz, &ts1, &ts2); + mmc_test_print_avg_rate(test, sz, count, &ts1, &ts2);
return 0; }
+static int mmc_test_area_io(struct mmc_test_card *test, unsigned long sz, + unsigned int dev_addr, int write, int max_scatter, + int timed) +{ + return mmc_test_area_io_seq(test, sz, dev_addr, write, max_scatter, + timed, 1, false); +} + /* * Write the test area entirely. */ @@ -1954,6 +2095,142 @@ static int mmc_test_large_seq_write_perf(struct mmc_test_card *test) return mmc_test_large_seq_perf(test, 1); }
+static int mmc_test_rw_multiple(struct mmc_test_card *test, + struct mmc_test_multiple_rw *tdata, + unsigned int reqsize, unsigned int size) +{ + unsigned int dev_addr; + struct mmc_test_area *t = &test->area; + int ret = 0; + + /* Set up test area */ + if (size > mmc_test_capacity(test->card) / 2 * 512) + size = mmc_test_capacity(test->card) / 2 * 512; + if (reqsize > t->max_tfr) + reqsize = t->max_tfr; + dev_addr = mmc_test_capacity(test->card) / 4; + if ((dev_addr & 0xffff0000)) + dev_addr &= 0xffff0000; /* Round to 64MiB boundary */ + else + dev_addr &= 0xfffff800; /* Round to 1MiB boundary */ + if (!dev_addr) + goto err; + + /* prepare test area */ + if (mmc_can_erase(test->card) && + tdata->prepare & MMC_TEST_PREP_ERASE) { + ret = mmc_erase(test->card, dev_addr, + size / 512, MMC_SECURE_ERASE_ARG); + if (ret) + ret = mmc_erase(test->card, dev_addr, + size / 512, MMC_ERASE_ARG); + if (ret) + goto err; + } + + /* Run test */ + ret = mmc_test_area_io_seq(test, reqsize, dev_addr, + tdata->do_write, 0, 1, size / reqsize, + tdata->do_nonblock_req); + if (ret) + goto err; + + return ret; + err: + printk(KERN_INFO "[%s] error\n", __func__); + return ret; +} + +static int mmc_test_rw_multiple_size(struct mmc_test_card *test, + struct mmc_test_multiple_rw *rw) +{ + int ret = 0; + int i; + + for (i = 0 ; i < rw->len && ret == 0; i++) { + ret = mmc_test_rw_multiple(test, rw, rw->bs[i], rw->size); + if (ret) + break; + } + return ret; +} + +/* + * Multiple blocking write 4k to 4 MB chunks + */ +static int mmc_test_profile_mult_write_blocking_perf(struct mmc_test_card *test) +{ + unsigned int bs[] = {1 << 12, 1 << 13, 1 << 14, 1 << 15, 1 << 16, + 1 << 17, 1 << 18, 1 << 19, 1 << 20, 1 << 22}; + struct mmc_test_multiple_rw test_data = { + .bs = bs, + .size = 128*1024*1024, + .len = ARRAY_SIZE(bs), + .do_write = true, + .do_nonblock_req = false, + .prepare = MMC_TEST_PREP_ERASE, + }; + + return mmc_test_rw_multiple_size(test, &test_data); +}; + +/* + * Multiple non-blocking write 4k to 4 MB chunks + */ +static int mmc_test_profile_mult_write_nonblock_perf(struct mmc_test_card *test) +{ + unsigned int bs[] = {1 << 12, 1 << 13, 1 << 14, 1 << 15, 1 << 16, + 1 << 17, 1 << 18, 1 << 19, 1 << 20, 1 << 22}; + struct mmc_test_multiple_rw test_data = { + .bs = bs, + .size = 128*1024*1024, + .len = ARRAY_SIZE(bs), + .do_write = true, + .do_nonblock_req = true, + .prepare = MMC_TEST_PREP_ERASE, + }; + + return mmc_test_rw_multiple_size(test, &test_data); +} + +/* + * Multiple blocking read 4k to 4 MB chunks + */ +static int mmc_test_profile_mult_read_blocking_perf(struct mmc_test_card *test) +{ + unsigned int bs[] = {1 << 12, 1 << 13, 1 << 14, 1 << 15, 1 << 16, + 1 << 17, 1 << 18, 1 << 19, 1 << 20, 1 << 22}; + struct mmc_test_multiple_rw test_data = { + .bs = bs, + .size = 128*1024*1024, + .len = ARRAY_SIZE(bs), + .do_write = false, + .do_nonblock_req = false, + .prepare = MMC_TEST_PREP_NONE, + }; + + return mmc_test_rw_multiple_size(test, &test_data); +} + +/* + * Multiple non-blocking read 4k to 4 MB chunks + */ +static int mmc_test_profile_mult_read_nonblock_perf(struct mmc_test_card *test) +{ + unsigned int bs[] = {1 << 12, 1 << 13, 1 << 14, 1 << 15, 1 << 16, + 1 << 17, 1 << 18, 1 << 19, 1 << 20, 1 << 22}; + struct mmc_test_multiple_rw test_data = { + .bs = bs, + .size = 128*1024*1024, + .len = ARRAY_SIZE(bs), + .do_write = false, + .do_nonblock_req = true, + .prepare = MMC_TEST_PREP_NONE, + }; + + return mmc_test_rw_multiple_size(test, &test_data); +} + static const struct mmc_test_case mmc_test_cases[] = 
{ { .name = "Basic write (no data verification)", @@ -2221,6 +2498,33 @@ static const struct mmc_test_case mmc_test_cases[] = { .cleanup = mmc_test_area_cleanup, },
+ { + .name = "Write performance with blocking req 4k to 4MB", + .prepare = mmc_test_area_prepare, + .run = mmc_test_profile_mult_write_blocking_perf, + .cleanup = mmc_test_area_cleanup, + }, + + { + .name = "Write performance with non-blocking req 4k to 4MB", + .prepare = mmc_test_area_prepare, + .run = mmc_test_profile_mult_write_nonblock_perf, + .cleanup = mmc_test_area_cleanup, + }, + + { + .name = "Read performance with blocking req 4k to 4MB", + .prepare = mmc_test_area_prepare, + .run = mmc_test_profile_mult_read_blocking_perf, + .cleanup = mmc_test_area_cleanup, + }, + + { + .name = "Read performance with non-blocking req 4k to 4MB", + .prepare = mmc_test_area_prepare, + .run = mmc_test_profile_mult_read_nonblock_perf, + .cleanup = mmc_test_area_cleanup, + }, };
static DEFINE_MUTEX(mmc_test_lock);
The way the request data is organized in the mmc queue struct only allows processing of one request at a time. This patch adds a new struct to hold mmc queue request data such as the sg list, request, blk request and bounce buffers, and updates all functions depending on the mmc queue struct. This lays the groundwork for using multiple active requests on one mmc queue.
Signed-off-by: Per Forlin <per.forlin@linaro.org>
---
 drivers/mmc/card/block.c |  125 +++++++++++++++++++++++-----------------------
 drivers/mmc/card/queue.c |  129 ++++++++++++++++++++++++----------------------
 drivers/mmc/card/queue.h |   31 ++++++++---
 3 files changed, 149 insertions(+), 136 deletions(-)
diff --git a/drivers/mmc/card/block.c b/drivers/mmc/card/block.c index 71da564..3d11690 100644 --- a/drivers/mmc/card/block.c +++ b/drivers/mmc/card/block.c @@ -427,14 +427,6 @@ static const struct block_device_operations mmc_bdops = { #endif };
-struct mmc_blk_request { - struct mmc_request mrq; - struct mmc_command sbc; - struct mmc_command cmd; - struct mmc_command stop; - struct mmc_data data; -}; - static inline int mmc_blk_part_switch(struct mmc_card *card, struct mmc_blk_data *md) { @@ -673,7 +665,7 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *req) { struct mmc_blk_data *md = mq->data; struct mmc_card *card = md->queue.card; - struct mmc_blk_request brq; + struct mmc_blk_request *brq = &mq->mqrq_cur->brq; int ret = 1, disable_multi = 0;
/* @@ -689,56 +681,56 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *req) struct mmc_command cmd = {0}; u32 readcmd, writecmd, status = 0;
- memset(&brq, 0, sizeof(struct mmc_blk_request)); - brq.mrq.cmd = &brq.cmd; - brq.mrq.data = &brq.data; + memset(brq, 0, sizeof(struct mmc_blk_request)); + brq->mrq.cmd = &brq->cmd; + brq->mrq.data = &brq->data;
- brq.cmd.arg = blk_rq_pos(req); + brq->cmd.arg = blk_rq_pos(req); if (!mmc_card_blockaddr(card)) - brq.cmd.arg <<= 9; - brq.cmd.flags = MMC_RSP_SPI_R1 | MMC_RSP_R1 | MMC_CMD_ADTC; - brq.data.blksz = 512; - brq.stop.opcode = MMC_STOP_TRANSMISSION; - brq.stop.arg = 0; - brq.stop.flags = MMC_RSP_SPI_R1B | MMC_RSP_R1B | MMC_CMD_AC; - brq.data.blocks = blk_rq_sectors(req); + brq->cmd.arg <<= 9; + brq->cmd.flags = MMC_RSP_SPI_R1 | MMC_RSP_R1 | MMC_CMD_ADTC; + brq->data.blksz = 512; + brq->stop.opcode = MMC_STOP_TRANSMISSION; + brq->stop.arg = 0; + brq->stop.flags = MMC_RSP_SPI_R1B | MMC_RSP_R1B | MMC_CMD_AC; + brq->data.blocks = blk_rq_sectors(req);
/* * The block layer doesn't support all sector count * restrictions, so we need to be prepared for too big * requests. */ - if (brq.data.blocks > card->host->max_blk_count) - brq.data.blocks = card->host->max_blk_count; + if (brq->data.blocks > card->host->max_blk_count) + brq->data.blocks = card->host->max_blk_count;
/* * After a read error, we redo the request one sector at a time * in order to accurately determine which sectors can be read * successfully. */ - if (disable_multi && brq.data.blocks > 1) - brq.data.blocks = 1; + if (disable_multi && brq->data.blocks > 1) + brq->data.blocks = 1;
- if (brq.data.blocks > 1 || do_rel_wr) { + if (brq->data.blocks > 1 || do_rel_wr) { /* SPI multiblock writes terminate using a special * token, not a STOP_TRANSMISSION request. */ if (!mmc_host_is_spi(card->host) || rq_data_dir(req) == READ) - brq.mrq.stop = &brq.stop; + brq->mrq.stop = &brq->stop; readcmd = MMC_READ_MULTIPLE_BLOCK; writecmd = MMC_WRITE_MULTIPLE_BLOCK; } else { - brq.mrq.stop = NULL; + brq->mrq.stop = NULL; readcmd = MMC_READ_SINGLE_BLOCK; writecmd = MMC_WRITE_BLOCK; } if (rq_data_dir(req) == READ) { - brq.cmd.opcode = readcmd; - brq.data.flags |= MMC_DATA_READ; + brq->cmd.opcode = readcmd; + brq->data.flags |= MMC_DATA_READ; } else { - brq.cmd.opcode = writecmd; - brq.data.flags |= MMC_DATA_WRITE; + brq->cmd.opcode = writecmd; + brq->data.flags |= MMC_DATA_WRITE; }
if (do_rel_wr) @@ -764,29 +756,29 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *req) */
if ((md->flags & MMC_BLK_CMD23) && - mmc_op_multi(brq.cmd.opcode) && + mmc_op_multi(brq->cmd.opcode) && (do_rel_wr || !(card->quirks & MMC_QUIRK_BLK_NO_CMD23))) { - brq.sbc.opcode = MMC_SET_BLOCK_COUNT; - brq.sbc.arg = brq.data.blocks | + brq->sbc.opcode = MMC_SET_BLOCK_COUNT; + brq->sbc.arg = brq->data.blocks | (do_rel_wr ? (1 << 31) : 0); - brq.sbc.flags = MMC_RSP_R1 | MMC_CMD_AC; - brq.mrq.sbc = &brq.sbc; + brq->sbc.flags = MMC_RSP_R1 | MMC_CMD_AC; + brq->mrq.sbc = &brq->sbc; }
- mmc_set_data_timeout(&brq.data, card); + mmc_set_data_timeout(&brq->data, card);
- brq.data.sg = mq->sg; - brq.data.sg_len = mmc_queue_map_sg(mq); + brq->data.sg = mq->mqrq_cur->sg; + brq->data.sg_len = mmc_queue_map_sg(mq, mq->mqrq_cur);
/* * Adjust the sg list so it is the same size as the * request. */ - if (brq.data.blocks != blk_rq_sectors(req)) { - int i, data_size = brq.data.blocks << 9; + if (brq->data.blocks != blk_rq_sectors(req)) { + int i, data_size = brq->data.blocks << 9; struct scatterlist *sg;
- for_each_sg(brq.data.sg, sg, brq.data.sg_len, i) { + for_each_sg(brq->data.sg, sg, brq->data.sg_len, i) { data_size -= sg->length; if (data_size <= 0) { sg->length += data_size; @@ -794,23 +786,23 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *req) break; } } - brq.data.sg_len = i; + brq->data.sg_len = i; }
- mmc_queue_bounce_pre(mq); + mmc_queue_bounce_pre(mq->mqrq_cur);
- mmc_wait_for_req(card->host, &brq.mrq); + mmc_wait_for_req(card->host, &brq->mrq);
- mmc_queue_bounce_post(mq); + mmc_queue_bounce_post(mq->mqrq_cur);
/* * Check for errors here, but don't jump to cmd_err * until later as we need to wait for the card to leave * programming mode even when things go wrong. */ - if (brq.sbc.error || brq.cmd.error || - brq.data.error || brq.stop.error) { - if (brq.data.blocks > 1 && rq_data_dir(req) == READ) { + if (brq->sbc.error || brq->cmd.error || + brq->data.error || brq->stop.error) { + if (brq->data.blocks > 1 && rq_data_dir(req) == READ) { /* Redo read one sector at a time */ printk(KERN_WARNING "%s: retrying using single " "block read\n", req->rq_disk->disk_name); @@ -820,36 +812,36 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *req) status = get_card_status(card, req); }
- if (brq.sbc.error) { + if (brq->sbc.error) { printk(KERN_ERR "%s: error %d sending SET_BLOCK_COUNT " "command, response %#x, card status %#x\n", - req->rq_disk->disk_name, brq.sbc.error, - brq.sbc.resp[0], status); + req->rq_disk->disk_name, brq->sbc.error, + brq->sbc.resp[0], status); }
- if (brq.cmd.error) { + if (brq->cmd.error) { printk(KERN_ERR "%s: error %d sending read/write " "command, response %#x, card status %#x\n", - req->rq_disk->disk_name, brq.cmd.error, - brq.cmd.resp[0], status); + req->rq_disk->disk_name, brq->cmd.error, + brq->cmd.resp[0], status); }
- if (brq.data.error) { - if (brq.data.error == -ETIMEDOUT && brq.mrq.stop) + if (brq->data.error) { + if (brq->data.error == -ETIMEDOUT && brq->mrq.stop) /* 'Stop' response contains card status */ - status = brq.mrq.stop->resp[0]; + status = brq->mrq.stop->resp[0]; printk(KERN_ERR "%s: error %d transferring data," " sector %u, nr %u, card status %#x\n", - req->rq_disk->disk_name, brq.data.error, + req->rq_disk->disk_name, brq->data.error, (unsigned)blk_rq_pos(req), (unsigned)blk_rq_sectors(req), status); }
- if (brq.stop.error) { + if (brq->stop.error) { printk(KERN_ERR "%s: error %d sending stop command, " "response %#x, card status %#x\n", - req->rq_disk->disk_name, brq.stop.error, - brq.stop.resp[0], status); + req->rq_disk->disk_name, brq->stop.error, + brq->stop.resp[0], status); }
if (!mmc_host_is_spi(card->host) && rq_data_dir(req) != READ) { @@ -882,7 +874,7 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *req) #endif }
- if (brq.cmd.error || brq.stop.error || brq.data.error) { + if (brq->cmd.error || brq->stop.error || brq->data.error) { if (rq_data_dir(req) == READ) { /* * After an error, we redo I/O one sector at a @@ -890,7 +882,8 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *req) * read a single sector. */ spin_lock_irq(&md->lock); - ret = __blk_end_request(req, -EIO, brq.data.blksz); + ret = __blk_end_request(req, -EIO, + brq->data.blksz); spin_unlock_irq(&md->lock); continue; } @@ -901,7 +894,7 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *req) * A block was successfully transferred. */ spin_lock_irq(&md->lock); - ret = __blk_end_request(req, 0, brq.data.bytes_xfered); + ret = __blk_end_request(req, 0, brq->data.bytes_xfered); spin_unlock_irq(&md->lock); } while (ret);
@@ -927,7 +920,7 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *req) } } else { spin_lock_irq(&md->lock); - ret = __blk_end_request(req, 0, brq.data.bytes_xfered); + ret = __blk_end_request(req, 0, brq->data.bytes_xfered); spin_unlock_irq(&md->lock); }
diff --git a/drivers/mmc/card/queue.c b/drivers/mmc/card/queue.c index c07322c..81d0eef 100644 --- a/drivers/mmc/card/queue.c +++ b/drivers/mmc/card/queue.c @@ -56,7 +56,7 @@ static int mmc_queue_thread(void *d) spin_lock_irq(q->queue_lock); set_current_state(TASK_INTERRUPTIBLE); req = blk_fetch_request(q); - mq->req = req; + mq->mqrq_cur->req = req; spin_unlock_irq(q->queue_lock);
if (!req) { @@ -97,10 +97,25 @@ static void mmc_request(struct request_queue *q) return; }
- if (!mq->req) + if (!mq->mqrq_cur->req) wake_up_process(mq->thread); }
+struct scatterlist *mmc_alloc_sg(int sg_len, int *err) +{ + struct scatterlist *sg; + + sg = kmalloc(sizeof(struct scatterlist)*sg_len, GFP_KERNEL); + if (!sg) + *err = -ENOMEM; + else { + *err = 0; + sg_init_table(sg, sg_len); + } + + return sg; +} + /** * mmc_init_queue - initialise a queue structure. * @mq: mmc queue @@ -114,6 +129,7 @@ int mmc_init_queue(struct mmc_queue *mq, struct mmc_card *card, spinlock_t *lock struct mmc_host *host = card->host; u64 limit = BLK_BOUNCE_HIGH; int ret; + struct mmc_queue_req *mqrq_cur = &mq->mqrq[0];
if (mmc_dev(host)->dma_mask && *mmc_dev(host)->dma_mask) limit = *mmc_dev(host)->dma_mask; @@ -123,8 +139,9 @@ int mmc_init_queue(struct mmc_queue *mq, struct mmc_card *card, spinlock_t *lock if (!mq->queue) return -ENOMEM;
+ memset(&mq->mqrq_cur, 0, sizeof(mq->mqrq_cur)); + mq->mqrq_cur = mqrq_cur; mq->queue->queuedata = mq; - mq->req = NULL;
blk_queue_prep_rq(mq->queue, mmc_prep_request); queue_flag_set_unlocked(QUEUE_FLAG_NONROT, mq->queue); @@ -158,53 +175,44 @@ int mmc_init_queue(struct mmc_queue *mq, struct mmc_card *card, spinlock_t *lock bouncesz = host->max_blk_count * 512;
if (bouncesz > 512) { - mq->bounce_buf = kmalloc(bouncesz, GFP_KERNEL); - if (!mq->bounce_buf) { + mqrq_cur->bounce_buf = kmalloc(bouncesz, GFP_KERNEL); + if (!mqrq_cur->bounce_buf) { printk(KERN_WARNING "%s: unable to " - "allocate bounce buffer\n", + "allocate bounce cur buffer\n", mmc_card_name(card)); } }
- if (mq->bounce_buf) { + if (mqrq_cur->bounce_buf) { blk_queue_bounce_limit(mq->queue, BLK_BOUNCE_ANY); blk_queue_max_hw_sectors(mq->queue, bouncesz / 512); blk_queue_max_segments(mq->queue, bouncesz / 512); blk_queue_max_segment_size(mq->queue, bouncesz);
- mq->sg = kmalloc(sizeof(struct scatterlist), - GFP_KERNEL); - if (!mq->sg) { - ret = -ENOMEM; + mqrq_cur->sg = mmc_alloc_sg(1, &ret); + if (ret) goto cleanup_queue; - } - sg_init_table(mq->sg, 1);
- mq->bounce_sg = kmalloc(sizeof(struct scatterlist) * - bouncesz / 512, GFP_KERNEL); - if (!mq->bounce_sg) { - ret = -ENOMEM; + mqrq_cur->bounce_sg = + mmc_alloc_sg(bouncesz / 512, &ret); + if (ret) goto cleanup_queue; - } - sg_init_table(mq->bounce_sg, bouncesz / 512); + } } #endif
- if (!mq->bounce_buf) { + if (!mqrq_cur->bounce_buf) { blk_queue_bounce_limit(mq->queue, limit); blk_queue_max_hw_sectors(mq->queue, min(host->max_blk_count, host->max_req_size / 512)); blk_queue_max_segments(mq->queue, host->max_segs); blk_queue_max_segment_size(mq->queue, host->max_seg_size);
- mq->sg = kmalloc(sizeof(struct scatterlist) * - host->max_segs, GFP_KERNEL); - if (!mq->sg) { - ret = -ENOMEM; + mqrq_cur->sg = mmc_alloc_sg(host->max_segs, &ret); + if (ret) goto cleanup_queue; - } - sg_init_table(mq->sg, host->max_segs); + }
sema_init(&mq->thread_sem, 1); @@ -219,16 +227,15 @@ int mmc_init_queue(struct mmc_queue *mq, struct mmc_card *card, spinlock_t *lock
return 0; free_bounce_sg: - if (mq->bounce_sg) - kfree(mq->bounce_sg); - mq->bounce_sg = NULL; + kfree(mqrq_cur->bounce_sg); + mqrq_cur->bounce_sg = NULL; + cleanup_queue: - if (mq->sg) - kfree(mq->sg); - mq->sg = NULL; - if (mq->bounce_buf) - kfree(mq->bounce_buf); - mq->bounce_buf = NULL; + kfree(mqrq_cur->sg); + mqrq_cur->sg = NULL; + kfree(mqrq_cur->bounce_buf); + mqrq_cur->bounce_buf = NULL; + blk_cleanup_queue(mq->queue); return ret; } @@ -237,6 +244,7 @@ void mmc_cleanup_queue(struct mmc_queue *mq) { struct request_queue *q = mq->queue; unsigned long flags; + struct mmc_queue_req *mqrq_cur = mq->mqrq_cur;
/* Make sure the queue isn't suspended, as that will deadlock */ mmc_queue_resume(mq); @@ -250,16 +258,14 @@ void mmc_cleanup_queue(struct mmc_queue *mq) blk_start_queue(q); spin_unlock_irqrestore(q->queue_lock, flags);
- if (mq->bounce_sg) - kfree(mq->bounce_sg); - mq->bounce_sg = NULL; + kfree(mqrq_cur->bounce_sg); + mqrq_cur->bounce_sg = NULL;
- kfree(mq->sg); - mq->sg = NULL; + kfree(mqrq_cur->sg); + mqrq_cur->sg = NULL;
- if (mq->bounce_buf) - kfree(mq->bounce_buf); - mq->bounce_buf = NULL; + kfree(mqrq_cur->bounce_buf); + mqrq_cur->bounce_buf = NULL;
mq->card = NULL; } @@ -312,27 +318,27 @@ void mmc_queue_resume(struct mmc_queue *mq) /* * Prepare the sg list(s) to be handed of to the host driver */ -unsigned int mmc_queue_map_sg(struct mmc_queue *mq) +unsigned int mmc_queue_map_sg(struct mmc_queue *mq, struct mmc_queue_req *mqrq) { unsigned int sg_len; size_t buflen; struct scatterlist *sg; int i;
- if (!mq->bounce_buf) - return blk_rq_map_sg(mq->queue, mq->req, mq->sg); + if (!mqrq->bounce_buf) + return blk_rq_map_sg(mq->queue, mqrq->req, mqrq->sg);
- BUG_ON(!mq->bounce_sg); + BUG_ON(!mqrq->bounce_sg);
- sg_len = blk_rq_map_sg(mq->queue, mq->req, mq->bounce_sg); + sg_len = blk_rq_map_sg(mq->queue, mqrq->req, mqrq->bounce_sg);
- mq->bounce_sg_len = sg_len; + mqrq->bounce_sg_len = sg_len;
buflen = 0; - for_each_sg(mq->bounce_sg, sg, sg_len, i) + for_each_sg(mqrq->bounce_sg, sg, sg_len, i) buflen += sg->length;
- sg_init_one(mq->sg, mq->bounce_buf, buflen); + sg_init_one(mqrq->sg, mqrq->bounce_buf, buflen);
return 1; } @@ -341,31 +347,30 @@ unsigned int mmc_queue_map_sg(struct mmc_queue *mq) * If writing, bounce the data to the buffer before the request * is sent to the host driver */ -void mmc_queue_bounce_pre(struct mmc_queue *mq) +void mmc_queue_bounce_pre(struct mmc_queue_req *mqrq) { - if (!mq->bounce_buf) + if (!mqrq->bounce_buf) return;
- if (rq_data_dir(mq->req) != WRITE) + if (rq_data_dir(mqrq->req) != WRITE) return;
- sg_copy_to_buffer(mq->bounce_sg, mq->bounce_sg_len, - mq->bounce_buf, mq->sg[0].length); + sg_copy_to_buffer(mqrq->bounce_sg, mqrq->bounce_sg_len, + mqrq->bounce_buf, mqrq->sg[0].length); }
/* * If reading, bounce the data from the buffer after the request * has been handled by the host driver */ -void mmc_queue_bounce_post(struct mmc_queue *mq) +void mmc_queue_bounce_post(struct mmc_queue_req *mqrq) { - if (!mq->bounce_buf) + if (!mqrq->bounce_buf) return;
- if (rq_data_dir(mq->req) != READ) + if (rq_data_dir(mqrq->req) != READ) return;
- sg_copy_from_buffer(mq->bounce_sg, mq->bounce_sg_len, - mq->bounce_buf, mq->sg[0].length); + sg_copy_from_buffer(mqrq->bounce_sg, mqrq->bounce_sg_len, + mqrq->bounce_buf, mqrq->sg[0].length); } - diff --git a/drivers/mmc/card/queue.h b/drivers/mmc/card/queue.h index 64e66e0..a1defed 100644 --- a/drivers/mmc/card/queue.h +++ b/drivers/mmc/card/queue.h @@ -4,19 +4,33 @@ struct request; struct task_struct;
+struct mmc_blk_request { + struct mmc_request mrq; + struct mmc_command sbc; + struct mmc_command cmd; + struct mmc_command stop; + struct mmc_data data; +}; + +struct mmc_queue_req { + struct request *req; + struct mmc_blk_request brq; + struct scatterlist *sg; + char *bounce_buf; + struct scatterlist *bounce_sg; + unsigned int bounce_sg_len; +}; + struct mmc_queue { struct mmc_card *card; struct task_struct *thread; struct semaphore thread_sem; unsigned int flags; - struct request *req; int (*issue_fn)(struct mmc_queue *, struct request *); void *data; struct request_queue *queue; - struct scatterlist *sg; - char *bounce_buf; - struct scatterlist *bounce_sg; - unsigned int bounce_sg_len; + struct mmc_queue_req mqrq[1]; + struct mmc_queue_req *mqrq_cur; };
extern int mmc_init_queue(struct mmc_queue *, struct mmc_card *, spinlock_t *); @@ -24,8 +38,9 @@ extern void mmc_cleanup_queue(struct mmc_queue *); extern void mmc_queue_suspend(struct mmc_queue *); extern void mmc_queue_resume(struct mmc_queue *);
-extern unsigned int mmc_queue_map_sg(struct mmc_queue *); -extern void mmc_queue_bounce_pre(struct mmc_queue *); -extern void mmc_queue_bounce_post(struct mmc_queue *); +extern unsigned int mmc_queue_map_sg(struct mmc_queue *, + struct mmc_queue_req *); +extern void mmc_queue_bounce_pre(struct mmc_queue_req *); +extern void mmc_queue_bounce_post(struct mmc_queue_req *);
#endif
Break out code from mmc_blk_issue_rw_rq() to create a block request prepare function. This doesn't change any functionality. It helps when handling more than one active block request.
Signed-off-by: Per Forlin per.forlin@linaro.org --- drivers/mmc/card/block.c | 224 ++++++++++++++++++++++++--------------------- 1 files changed, 119 insertions(+), 105 deletions(-)
diff --git a/drivers/mmc/card/block.c b/drivers/mmc/card/block.c index 3d11690..144d435 100644 --- a/drivers/mmc/card/block.c +++ b/drivers/mmc/card/block.c @@ -661,12 +661,15 @@ static inline void mmc_apply_rel_rw(struct mmc_blk_request *brq, } }
-static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *req) +static void mmc_blk_rw_rq_prep(struct mmc_queue_req *mqrq, + struct mmc_card *card, + int disable_multi, + struct mmc_queue *mq) { + u32 readcmd, writecmd; + struct mmc_blk_request *brq = &mqrq->brq; + struct request *req = mqrq->req; struct mmc_blk_data *md = mq->data; - struct mmc_card *card = md->queue.card; - struct mmc_blk_request *brq = &mq->mqrq_cur->brq; - int ret = 1, disable_multi = 0;
/* * Reliable writes are used to implement Forced Unit Access and @@ -677,120 +680,131 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *req) (rq_data_dir(req) == WRITE) && (md->flags & MMC_BLK_REL_WR);
- do { - struct mmc_command cmd = {0}; - u32 readcmd, writecmd, status = 0; - - memset(brq, 0, sizeof(struct mmc_blk_request)); - brq->mrq.cmd = &brq->cmd; - brq->mrq.data = &brq->data; - - brq->cmd.arg = blk_rq_pos(req); - if (!mmc_card_blockaddr(card)) - brq->cmd.arg <<= 9; - brq->cmd.flags = MMC_RSP_SPI_R1 | MMC_RSP_R1 | MMC_CMD_ADTC; - brq->data.blksz = 512; - brq->stop.opcode = MMC_STOP_TRANSMISSION; - brq->stop.arg = 0; - brq->stop.flags = MMC_RSP_SPI_R1B | MMC_RSP_R1B | MMC_CMD_AC; - brq->data.blocks = blk_rq_sectors(req); - - /* - * The block layer doesn't support all sector count - * restrictions, so we need to be prepared for too big - * requests. - */ - if (brq->data.blocks > card->host->max_blk_count) - brq->data.blocks = card->host->max_blk_count; + memset(brq, 0, sizeof(struct mmc_blk_request)); + brq->mrq.cmd = &brq->cmd; + brq->mrq.data = &brq->data;
- /* - * After a read error, we redo the request one sector at a time - * in order to accurately determine which sectors can be read - * successfully. - */ - if (disable_multi && brq->data.blocks > 1) - brq->data.blocks = 1; + brq->cmd.arg = blk_rq_pos(req); + if (!mmc_card_blockaddr(card)) + brq->cmd.arg <<= 9; + brq->cmd.flags = MMC_RSP_SPI_R1 | MMC_RSP_R1 | MMC_CMD_ADTC; + brq->data.blksz = 512; + brq->stop.opcode = MMC_STOP_TRANSMISSION; + brq->stop.arg = 0; + brq->stop.flags = MMC_RSP_SPI_R1B | MMC_RSP_R1B | MMC_CMD_AC; + brq->data.blocks = blk_rq_sectors(req);
- if (brq->data.blocks > 1 || do_rel_wr) { - /* SPI multiblock writes terminate using a special - * token, not a STOP_TRANSMISSION request. - */ - if (!mmc_host_is_spi(card->host) || - rq_data_dir(req) == READ) - brq->mrq.stop = &brq->stop; - readcmd = MMC_READ_MULTIPLE_BLOCK; - writecmd = MMC_WRITE_MULTIPLE_BLOCK; - } else { - brq->mrq.stop = NULL; - readcmd = MMC_READ_SINGLE_BLOCK; - writecmd = MMC_WRITE_BLOCK; - } - if (rq_data_dir(req) == READ) { - brq->cmd.opcode = readcmd; - brq->data.flags |= MMC_DATA_READ; - } else { - brq->cmd.opcode = writecmd; - brq->data.flags |= MMC_DATA_WRITE; - } + /* + * The block layer doesn't support all sector count + * restrictions, so we need to be prepared for too big + * requests. + */ + if (brq->data.blocks > card->host->max_blk_count) + brq->data.blocks = card->host->max_blk_count;
- if (do_rel_wr) - mmc_apply_rel_rw(&brq, card, req); + /* + * After a read error, we redo the request one sector at a time + * in order to accurately determine which sectors can be read + * successfully. + */ + if (disable_multi && brq->data.blocks > 1) + brq->data.blocks = 1;
- /* - * Pre-defined multi-block transfers are preferable to - * open ended-ones (and necessary for reliable writes). - * However, it is not sufficient to just send CMD23, - * and avoid the final CMD12, as on an error condition - * CMD12 (stop) needs to be sent anyway. This, coupled - * with Auto-CMD23 enhancements provided by some - * hosts, means that the complexity of dealing - * with this is best left to the host. If CMD23 is - * supported by card and host, we'll fill sbc in and let - * the host deal with handling it correctly. This means - * that for hosts that don't expose MMC_CAP_CMD23, no - * change of behavior will be observed. - * - * N.B: Some MMC cards experience perf degradation. - * We'll avoid using CMD23-bounded multiblock writes for - * these, while retaining features like reliable writes. + if (brq->data.blocks > 1 || do_rel_wr) { + /* SPI multiblock writes terminate using a special + * token, not a STOP_TRANSMISSION request. */ + if (!mmc_host_is_spi(card->host) || + rq_data_dir(req) == READ) + brq->mrq.stop = &brq->stop; + readcmd = MMC_READ_MULTIPLE_BLOCK; + writecmd = MMC_WRITE_MULTIPLE_BLOCK; + } else { + brq->mrq.stop = NULL; + readcmd = MMC_READ_SINGLE_BLOCK; + writecmd = MMC_WRITE_BLOCK; + } + if (rq_data_dir(req) == READ) { + brq->cmd.opcode = readcmd; + brq->data.flags |= MMC_DATA_READ; + } else { + brq->cmd.opcode = writecmd; + brq->data.flags |= MMC_DATA_WRITE; + }
- if ((md->flags & MMC_BLK_CMD23) && - mmc_op_multi(brq->cmd.opcode) && - (do_rel_wr || !(card->quirks & MMC_QUIRK_BLK_NO_CMD23))) { - brq->sbc.opcode = MMC_SET_BLOCK_COUNT; - brq->sbc.arg = brq->data.blocks | - (do_rel_wr ? (1 << 31) : 0); - brq->sbc.flags = MMC_RSP_R1 | MMC_CMD_AC; - brq->mrq.sbc = &brq->sbc; - } + if (do_rel_wr) + mmc_apply_rel_rw(brq, card, req);
- mmc_set_data_timeout(&brq->data, card); + /* + * Pre-defined multi-block transfers are preferable to + * open ended-ones (and necessary for reliable writes). + * However, it is not sufficient to just send CMD23, + * and avoid the final CMD12, as on an error condition + * CMD12 (stop) needs to be sent anyway. This, coupled + * with Auto-CMD23 enhancements provided by some + * hosts, means that the complexity of dealing + * with this is best left to the host. If CMD23 is + * supported by card and host, we'll fill sbc in and let + * the host deal with handling it correctly. This means + * that for hosts that don't expose MMC_CAP_CMD23, no + * change of behavior will be observed. + * + * N.B: Some MMC cards experience perf degradation. + * We'll avoid using CMD23-bounded multiblock writes for + * these, while retaining features like reliable writes. + */
- brq->data.sg = mq->mqrq_cur->sg; - brq->data.sg_len = mmc_queue_map_sg(mq, mq->mqrq_cur); + if ((md->flags & MMC_BLK_CMD23) && + mmc_op_multi(brq->cmd.opcode) && + (do_rel_wr || !(card->quirks & MMC_QUIRK_BLK_NO_CMD23))) { + brq->sbc.opcode = MMC_SET_BLOCK_COUNT; + brq->sbc.arg = brq->data.blocks | + (do_rel_wr ? (1 << 31) : 0); + brq->sbc.flags = MMC_RSP_R1 | MMC_CMD_AC; + brq->mrq.sbc = &brq->sbc; + }
- /* - * Adjust the sg list so it is the same size as the - * request. - */ - if (brq->data.blocks != blk_rq_sectors(req)) { - int i, data_size = brq->data.blocks << 9; - struct scatterlist *sg; - - for_each_sg(brq->data.sg, sg, brq->data.sg_len, i) { - data_size -= sg->length; - if (data_size <= 0) { - sg->length += data_size; - i++; - break; - } + mmc_set_data_timeout(&brq->data, card); + + brq->data.sg = mqrq->sg; + brq->data.sg_len = mmc_queue_map_sg(mq, mqrq); + + /* + * Adjust the sg list so it is the same size as the + * request. + */ + if (brq->data.blocks != blk_rq_sectors(req)) { + int i, data_size = brq->data.blocks << 9; + struct scatterlist *sg; + + for_each_sg(brq->data.sg, sg, brq->data.sg_len, i) { + data_size -= sg->length; + if (data_size <= 0) { + sg->length += data_size; + i++; + break; } - brq->data.sg_len = i; } + brq->data.sg_len = i; + }
- mmc_queue_bounce_pre(mq->mqrq_cur); + mmc_queue_bounce_pre(mqrq); +} + +static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *req) +{ + struct mmc_blk_data *md = mq->data; + struct mmc_card *card = md->queue.card; + struct mmc_blk_request *brq = &mq->mqrq_cur->brq; + int ret = 1, disable_multi = 0; + + mmc_claim_host(card->host); + + do { + struct mmc_command cmd; + u32 status = 0;
+ mmc_blk_rw_rq_prep(mq->mqrq_cur, card, disable_multi, mq); mmc_wait_for_req(card->host, &brq->mrq);
mmc_queue_bounce_post(mq->mqrq_cur);
Break out code without functional changes. This simplifies the code and makes way for handling two parallel requests.
Signed-off-by: Per Forlin per.forlin@linaro.org --- drivers/mmc/card/block.c | 246 ++++++++++++++++++++++++++------------------- 1 files changed, 142 insertions(+), 104 deletions(-)
diff --git a/drivers/mmc/card/block.c b/drivers/mmc/card/block.c index 144d435..6a84a75 100644 --- a/drivers/mmc/card/block.c +++ b/drivers/mmc/card/block.c @@ -106,6 +106,13 @@ struct mmc_blk_data {
static DEFINE_MUTEX(open_lock);
+enum mmc_blk_status { + MMC_BLK_SUCCESS = 0, + MMC_BLK_RETRY, + MMC_BLK_DATA_ERR, + MMC_BLK_CMD_ERR, +}; + module_param(perdev_minors, int, 0444); MODULE_PARM_DESC(perdev_minors, "Minors numbers to allocate per device");
@@ -661,6 +668,112 @@ static inline void mmc_apply_rel_rw(struct mmc_blk_request *brq, } }
+static enum mmc_blk_status mmc_blk_err_check(struct mmc_blk_request *brq, + struct request *req, + struct mmc_card *card, + struct mmc_blk_data *md) +{ + struct mmc_command cmd; + u32 status = 0; + enum mmc_blk_status ret = MMC_BLK_SUCCESS; + + /* + * Check for errors here, but don't jump to cmd_err + * until later as we need to wait for the card to leave + * programming mode even when things go wrong. + */ + if (brq->sbc.error || brq->cmd.error || + brq->data.error || brq->stop.error) { + if (brq->data.blocks > 1 && rq_data_dir(req) == READ) { + /* Redo read one sector at a time */ + printk(KERN_WARNING "%s: retrying using single " + "block read\n", req->rq_disk->disk_name); + ret = MMC_BLK_RETRY; + goto out; + } + status = get_card_status(card, req); + } + + if (brq->sbc.error) { + printk(KERN_ERR "%s: error %d sending SET_BLOCK_COUNT " + "command, response %#x, card status %#x\n", + req->rq_disk->disk_name, brq->sbc.error, + brq->sbc.resp[0], status); + } + + if (brq->cmd.error) { + printk(KERN_ERR "%s: error %d sending read/write " + "command, response %#x, card status %#x\n", + req->rq_disk->disk_name, brq->cmd.error, + brq->cmd.resp[0], status); + } + + if (brq->data.error) { + if (brq->data.error == -ETIMEDOUT && brq->mrq.stop) + /* 'Stop' response contains card status */ + status = brq->mrq.stop->resp[0]; + printk(KERN_ERR "%s: error %d transferring data," + " sector %u, nr %u, card status %#x\n", + req->rq_disk->disk_name, brq->data.error, + (unsigned)blk_rq_pos(req), + (unsigned)blk_rq_sectors(req), status); + } + + if (brq->stop.error) { + printk(KERN_ERR "%s: error %d sending stop command, " + "response %#x, card status %#x\n", + req->rq_disk->disk_name, brq->stop.error, + brq->stop.resp[0], status); + } + + if (!mmc_host_is_spi(card->host) && rq_data_dir(req) != READ) { + do { + int err; + + cmd.opcode = MMC_SEND_STATUS; + cmd.arg = card->rca << 16; + cmd.flags = MMC_RSP_R1 | MMC_CMD_AC; + err = mmc_wait_for_cmd(card->host, &cmd, 5); + if (err) { + printk(KERN_ERR "%s: error %d requesting status\n", + req->rq_disk->disk_name, err); + ret = MMC_BLK_CMD_ERR; + goto out; + } + /* + * Some cards mishandle the status bits, + * so make sure to check both the busy + * indication and the card state. + */ + } while (!(cmd.resp[0] & R1_READY_FOR_DATA) || + (R1_CURRENT_STATE(cmd.resp[0]) == 7)); + +#if 0 + if (cmd.resp[0] & ~0x00000900) + printk(KERN_ERR "%s: status = %08x\n", + req->rq_disk->disk_name, cmd.resp[0]); + if (mmc_decode_status(cmd.resp)) { + ret = MMC_BLK_CMD_ERR; + goto out; + } +#endif + } + + if (brq->cmd.error || brq->stop.error || brq->data.error) { + if (rq_data_dir(req) == READ) + /* + * After an error, we redo I/O one sector at a + * time, so we only reach here after trying to + * read a single sector. + */ + ret = MMC_BLK_DATA_ERR; + else + ret = MMC_BLK_DATA_ERR; + } +out: + return ret; +} + static void mmc_blk_rw_rq_prep(struct mmc_queue_req *mqrq, struct mmc_card *card, int disable_multi, @@ -797,119 +910,44 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *req) struct mmc_card *card = md->queue.card; struct mmc_blk_request *brq = &mq->mqrq_cur->brq; int ret = 1, disable_multi = 0; - - mmc_claim_host(card->host); + enum mmc_blk_status status;
do { - struct mmc_command cmd; - u32 status = 0; - mmc_blk_rw_rq_prep(mq->mqrq_cur, card, disable_multi, mq); mmc_wait_for_req(card->host, &brq->mrq);
mmc_queue_bounce_post(mq->mqrq_cur); + status = mmc_blk_err_check(brq, req, card, md);
- /* - * Check for errors here, but don't jump to cmd_err - * until later as we need to wait for the card to leave - * programming mode even when things go wrong. - */ - if (brq->sbc.error || brq->cmd.error || - brq->data.error || brq->stop.error) { - if (brq->data.blocks > 1 && rq_data_dir(req) == READ) { - /* Redo read one sector at a time */ - printk(KERN_WARNING "%s: retrying using single " - "block read\n", req->rq_disk->disk_name); - disable_multi = 1; - continue; - } - status = get_card_status(card, req); - } - - if (brq->sbc.error) { - printk(KERN_ERR "%s: error %d sending SET_BLOCK_COUNT " - "command, response %#x, card status %#x\n", - req->rq_disk->disk_name, brq->sbc.error, - brq->sbc.resp[0], status); - } - - if (brq->cmd.error) { - printk(KERN_ERR "%s: error %d sending read/write " - "command, response %#x, card status %#x\n", - req->rq_disk->disk_name, brq->cmd.error, - brq->cmd.resp[0], status); - } - - if (brq->data.error) { - if (brq->data.error == -ETIMEDOUT && brq->mrq.stop) - /* 'Stop' response contains card status */ - status = brq->mrq.stop->resp[0]; - printk(KERN_ERR "%s: error %d transferring data," - " sector %u, nr %u, card status %#x\n", - req->rq_disk->disk_name, brq->data.error, - (unsigned)blk_rq_pos(req), - (unsigned)blk_rq_sectors(req), status); - } - - if (brq->stop.error) { - printk(KERN_ERR "%s: error %d sending stop command, " - "response %#x, card status %#x\n", - req->rq_disk->disk_name, brq->stop.error, - brq->stop.resp[0], status); - } - - if (!mmc_host_is_spi(card->host) && rq_data_dir(req) != READ) { - do { - int err; - - cmd.opcode = MMC_SEND_STATUS; - cmd.arg = card->rca << 16; - cmd.flags = MMC_RSP_R1 | MMC_CMD_AC; - err = mmc_wait_for_cmd(card->host, &cmd, 5); - if (err) { - printk(KERN_ERR "%s: error %d requesting status\n", - req->rq_disk->disk_name, err); - goto cmd_err; - } - /* - * Some cards mishandle the status bits, - * so make sure to check both the busy - * indication and the card state. - */ - } while (!(cmd.resp[0] & R1_READY_FOR_DATA) || - (R1_CURRENT_STATE(cmd.resp[0]) == 7)); - -#if 0 - if (cmd.resp[0] & ~0x00000900) - printk(KERN_ERR "%s: status = %08x\n", - req->rq_disk->disk_name, cmd.resp[0]); - if (mmc_decode_status(cmd.resp)) - goto cmd_err; -#endif - } - - if (brq->cmd.error || brq->stop.error || brq->data.error) { - if (rq_data_dir(req) == READ) { - /* - * After an error, we redo I/O one sector at a - * time, so we only reach here after trying to - * read a single sector. - */ - spin_lock_irq(&md->lock); - ret = __blk_end_request(req, -EIO, - brq->data.blksz); - spin_unlock_irq(&md->lock); - continue; - } + switch (status) { + case MMC_BLK_CMD_ERR: goto cmd_err; - } + break; + case MMC_BLK_RETRY: + disable_multi = 1; + ret = 1; + break; + case MMC_BLK_DATA_ERR: + /* + * After an error, we redo I/O one sector at a + * time, so we only reach here after trying to + * read a single sector. + */ + spin_lock_irq(&md->lock); + ret = __blk_end_request(req, -EIO, + brq->data.blksz); + spin_unlock_irq(&md->lock);
- /* - * A block was successfully transferred. - */ - spin_lock_irq(&md->lock); - ret = __blk_end_request(req, 0, brq->data.bytes_xfered); - spin_unlock_irq(&md->lock); + break; + case MMC_BLK_SUCCESS: + /* + * A block was successfully transferred. + */ + spin_lock_irq(&md->lock); + ret = __blk_end_request(req, 0, brq->data.bytes_xfered); + spin_unlock_irq(&md->lock); + break; + } } while (ret);
return 1;
Add an additional mmc queue request instance to make way for two active block requests. One request may be active while the other request is being prepared.
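(Illustration only, not part of the patch: a tiny userspace model of the two-slot idea, where one slot is being prepared while the other holds the request that is active on the host, and the slots swap roles after every issue. The names slot and req_id are made up for the sketch.)

#include <stdio.h>

struct slot { int req_id; };

int main(void)
{
	struct slot slots[2] = { { 0 }, { 0 } };
	struct slot *cur = &slots[0], *prev = &slots[1], *tmp;
	int id;

	for (id = 1; id <= 3; id++) {
		cur->req_id = id;		/* prepare the new request in cur */
		if (prev->req_id)
			printf("wait for req %d to finish\n", prev->req_id);
		printf("start req %d\n", cur->req_id);
		prev->req_id = 0;		/* the old slot is free again */
		tmp = prev;			/* cur and prev swap roles */
		prev = cur;
		cur = tmp;
	}
	return 0;
}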
Signed-off-by: Per Forlin per.forlin@linaro.org --- drivers/mmc/card/queue.c | 44 ++++++++++++++++++++++++++++++++++++++++++-- drivers/mmc/card/queue.h | 3 ++- 2 files changed, 44 insertions(+), 3 deletions(-)
diff --git a/drivers/mmc/card/queue.c b/drivers/mmc/card/queue.c index 81d0eef..0757a39 100644 --- a/drivers/mmc/card/queue.c +++ b/drivers/mmc/card/queue.c @@ -130,6 +130,7 @@ int mmc_init_queue(struct mmc_queue *mq, struct mmc_card *card, spinlock_t *lock u64 limit = BLK_BOUNCE_HIGH; int ret; struct mmc_queue_req *mqrq_cur = &mq->mqrq[0]; + struct mmc_queue_req *mqrq_prev = &mq->mqrq[1];
if (mmc_dev(host)->dma_mask && *mmc_dev(host)->dma_mask) limit = *mmc_dev(host)->dma_mask; @@ -140,7 +141,9 @@ int mmc_init_queue(struct mmc_queue *mq, struct mmc_card *card, spinlock_t *lock return -ENOMEM;
memset(&mq->mqrq_cur, 0, sizeof(mq->mqrq_cur)); + memset(&mq->mqrq_prev, 0, sizeof(mq->mqrq_prev)); mq->mqrq_cur = mqrq_cur; + mq->mqrq_prev = mqrq_prev; mq->queue->queuedata = mq;
blk_queue_prep_rq(mq->queue, mmc_prep_request); @@ -181,9 +184,17 @@ int mmc_init_queue(struct mmc_queue *mq, struct mmc_card *card, spinlock_t *lock "allocate bounce cur buffer\n", mmc_card_name(card)); } + mqrq_prev->bounce_buf = kmalloc(bouncesz, GFP_KERNEL); + if (!mqrq_prev->bounce_buf) { + printk(KERN_WARNING "%s: unable to " + "allocate bounce prev buffer\n", + mmc_card_name(card)); + kfree(mqrq_cur->bounce_buf); + mqrq_cur->bounce_buf = NULL; + } }
- if (mqrq_cur->bounce_buf) { + if (mqrq_cur->bounce_buf && mqrq_prev->bounce_buf) { blk_queue_bounce_limit(mq->queue, BLK_BOUNCE_ANY); blk_queue_max_hw_sectors(mq->queue, bouncesz / 512); blk_queue_max_segments(mq->queue, bouncesz / 512); @@ -198,11 +209,19 @@ int mmc_init_queue(struct mmc_queue *mq, struct mmc_card *card, spinlock_t *lock if (ret) goto cleanup_queue;
+ mqrq_prev->sg = mmc_alloc_sg(1, &ret); + if (ret) + goto cleanup_queue; + + mqrq_prev->bounce_sg = + mmc_alloc_sg(bouncesz / 512, &ret); + if (ret) + goto cleanup_queue; } } #endif
- if (!mqrq_cur->bounce_buf) { + if (!mqrq_cur->bounce_buf && !mqrq_prev->bounce_buf) { blk_queue_bounce_limit(mq->queue, limit); blk_queue_max_hw_sectors(mq->queue, min(host->max_blk_count, host->max_req_size / 512)); @@ -213,6 +232,10 @@ int mmc_init_queue(struct mmc_queue *mq, struct mmc_card *card, spinlock_t *lock if (ret) goto cleanup_queue;
+ + mqrq_prev->sg = mmc_alloc_sg(host->max_segs, &ret); + if (ret) + goto cleanup_queue; }
sema_init(&mq->thread_sem, 1); @@ -229,6 +252,8 @@ int mmc_init_queue(struct mmc_queue *mq, struct mmc_card *card, spinlock_t *lock free_bounce_sg: kfree(mqrq_cur->bounce_sg); mqrq_cur->bounce_sg = NULL; + kfree(mqrq_prev->bounce_sg); + mqrq_prev->bounce_sg = NULL;
cleanup_queue: kfree(mqrq_cur->sg); @@ -236,6 +261,11 @@ int mmc_init_queue(struct mmc_queue *mq, struct mmc_card *card, spinlock_t *lock kfree(mqrq_cur->bounce_buf); mqrq_cur->bounce_buf = NULL;
+ kfree(mqrq_prev->sg); + mqrq_prev->sg = NULL; + kfree(mqrq_prev->bounce_buf); + mqrq_prev->bounce_buf = NULL; + blk_cleanup_queue(mq->queue); return ret; } @@ -245,6 +275,7 @@ void mmc_cleanup_queue(struct mmc_queue *mq) struct request_queue *q = mq->queue; unsigned long flags; struct mmc_queue_req *mqrq_cur = mq->mqrq_cur; + struct mmc_queue_req *mqrq_prev = mq->mqrq_prev;
/* Make sure the queue isn't suspended, as that will deadlock */ mmc_queue_resume(mq); @@ -267,6 +298,15 @@ void mmc_cleanup_queue(struct mmc_queue *mq) kfree(mqrq_cur->bounce_buf); mqrq_cur->bounce_buf = NULL;
+ kfree(mqrq_prev->bounce_sg); + mqrq_prev->bounce_sg = NULL; + + kfree(mqrq_prev->sg); + mqrq_prev->sg = NULL; + + kfree(mqrq_prev->bounce_buf); + mqrq_prev->bounce_buf = NULL; + mq->card = NULL; } EXPORT_SYMBOL(mmc_cleanup_queue); diff --git a/drivers/mmc/card/queue.h b/drivers/mmc/card/queue.h index a1defed..c8fb2dc 100644 --- a/drivers/mmc/card/queue.h +++ b/drivers/mmc/card/queue.h @@ -29,8 +29,9 @@ struct mmc_queue { int (*issue_fn)(struct mmc_queue *, struct request *); void *data; struct request_queue *queue; - struct mmc_queue_req mqrq[1]; + struct mmc_queue_req mqrq[2]; struct mmc_queue_req *mqrq_cur; + struct mmc_queue_req *mqrq_prev; };
extern int mmc_init_queue(struct mmc_queue *, struct mmc_card *, spinlock_t *);
This simple fault injection proved to be very useful for testing the error handling in the block.c rw_rq(). It may also be useful for testing whether the host driver handles pre_req() and post_req() correctly in case of errors.
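(Side note, illustration only: a rough userspace approximation of what the injection does when it triggers. rand() stands in for the kernel's random32() and the should_fail() probability handling is omitted; the real mechanism is enabled through the "make-it-fail" debugfs file or the fail_mmc_request= boot parameter added below.)

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void)
{
	/* same error set as the patch: -ETIMEDOUT, -EILSEQ, -EIO */
	static const int data_errors[] = { -110, -84, -5 };
	unsigned int bytes_xfered = 8 * 512;	/* pretend 8 blocks were requested */
	int error;

	srand((unsigned int)time(NULL));
	error = data_errors[rand() % 3];
	/* report a randomly truncated, 512-byte aligned transfer length */
	bytes_xfered = ((unsigned int)rand() % (bytes_xfered >> 9)) << 9;

	printf("inject error %d, bytes_xfered %u\n", error, bytes_xfered);
	return 0;
}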
Signed-off-by: Per Forlin per.forlin@linaro.org --- drivers/mmc/core/core.c | 54 ++++++++++++++++++++++++++++++++++++++++++++ drivers/mmc/core/debugfs.c | 5 ++++ include/linux/mmc/host.h | 3 ++ lib/Kconfig.debug | 11 +++++++++ 4 files changed, 73 insertions(+), 0 deletions(-)
diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c index 331c69e..e841ff5 100644 --- a/drivers/mmc/core/core.c +++ b/drivers/mmc/core/core.c @@ -23,6 +23,8 @@ #include <linux/log2.h> #include <linux/regulator/consumer.h> #include <linux/pm_runtime.h> +#include <linux/fault-inject.h> +#include <linux/random.h>
#include <linux/mmc/card.h> #include <linux/mmc/host.h> @@ -82,6 +84,56 @@ static void mmc_flush_scheduled_work(void) flush_workqueue(workqueue); }
+#ifdef CONFIG_FAIL_MMC_REQUEST + +static DECLARE_FAULT_ATTR(fail_mmc_request); + +static int __init setup_fail_mmc_request(char *str) +{ + return setup_fault_attr(&fail_mmc_request, str); +} +__setup("fail_mmc_request=", setup_fail_mmc_request); + +static void mmc_should_fail_request(struct mmc_host *host, + struct mmc_request *mrq) +{ + struct mmc_command *cmd = mrq->cmd; + struct mmc_data *data = mrq->data; + static const int data_errors[] = { + -ETIMEDOUT, + -EILSEQ, + -EIO, + }; + + if (!data) + return; + + if (cmd->error || data->error || !host->make_it_fail || + !should_fail(&fail_mmc_request, data->blksz * data->blocks)) + return; + + data->error = data_errors[random32() % ARRAY_SIZE(data_errors)]; + data->bytes_xfered = (random32() % (data->bytes_xfered >> 9)) << 9; +} + +static int __init fail_mmc_request_debugfs(void) +{ + return init_fault_attr_dentries(&fail_mmc_request, + "fail_mmc_request"); +} + +late_initcall(fail_mmc_request_debugfs); + +#else /* CONFIG_FAIL_MMC_REQUEST */ + +static void mmc_should_fail_request(struct mmc_host *host, + struct mmc_request *mrq) +{ +} + +#endif /* CONFIG_FAIL_MMC_REQUEST */ + + /** * mmc_request_done - finish processing an MMC request * @host: MMC host which completed request @@ -108,6 +160,8 @@ void mmc_request_done(struct mmc_host *host, struct mmc_request *mrq) cmd->error = 0; host->ops->request(host, mrq); } else { + mmc_should_fail_request(host, mrq); + led_trigger_event(host->led, LED_OFF);
pr_debug("%s: req done (CMD%u): %d: %08x %08x %08x %08x\n", diff --git a/drivers/mmc/core/debugfs.c b/drivers/mmc/core/debugfs.c index 998797e..588e76f 100644 --- a/drivers/mmc/core/debugfs.c +++ b/drivers/mmc/core/debugfs.c @@ -188,6 +188,11 @@ void mmc_add_host_debugfs(struct mmc_host *host) root, &host->clk_delay)) goto err_node; #endif +#ifdef CONFIG_FAIL_MMC_REQUEST + if (!debugfs_create_u8("make-it-fail", S_IRUSR | S_IWUSR, + root, &host->make_it_fail)) + goto err_node; +#endif return;
err_node: diff --git a/include/linux/mmc/host.h b/include/linux/mmc/host.h index 59db6f2..0d0a48f 100644 --- a/include/linux/mmc/host.h +++ b/include/linux/mmc/host.h @@ -301,6 +301,9 @@ struct mmc_host {
struct mmc_async_req *areq; /* active async req */
+#ifdef CONFIG_FAIL_MMC_REQUEST + u8 make_it_fail; +#endif unsigned long private[0] ____cacheline_aligned; };
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index c768bcd..330fc70 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -1057,6 +1057,17 @@ config FAIL_IO_TIMEOUT Only works with drivers that use the generic timeout handling, for others it wont do anything.
+config FAIL_MMC_REQUEST + bool "Fault-injection capability for MMC IO" + select DEBUG_FS + depends on FAULT_INJECTION + help + Provide fault-injection capability for MMC IO. + This will make the mmc core return data errors. This is + useful for testing the error handling in the mmc block device + and how the mmc host driver handle retries from + the block device. + config FAULT_INJECTION_DEBUG_FS bool "Debugfs entries for fault-injection capabilities" depends on FAULT_INJECTION && SYSFS && DEBUG_FS
Change mmc_blk_issue_rw_rq() to become asynchronous. The execution flow looks like this: the mmc-queue calls issue_rw_rq(), which sends the request to the host and returns to the mmc-queue. The mmc-queue calls issue_rw_rq() again with a new request. This new request is prepared in issue_rw_rq(), which then waits for the active request to complete before pushing the new one to the host. When the mmc-queue is empty it calls issue_rw_rq() with req=NULL to finish off the active request without starting a new one.
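(Illustration only, not part of the patch: a toy userspace model of the hand-off described above. start_req() here is a stand-in, not the kernel API; it takes the next prepared request or NULL, waits for the one already on the host, and returns the completed one.)

#include <stdio.h>
#include <stddef.h>

static int *active;	/* the request currently "on the host" */

static int *start_req(int *next)
{
	int *done = active;

	if (done)
		printf("req %d completes\n", *done);	/* wait for the active one */
	active = next;
	if (next)
		printf("req %d starts\n", *next);
	return done;
}

int main(void)
{
	int reqs[] = { 1, 2, 3 };
	size_t i;

	for (i = 0; i < 3; i++)
		start_req(&reqs[i]);	/* issue_rw_rq() called with a new request */
	start_req(NULL);		/* queue empty: finish off the active request */
	return 0;
}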
Signed-off-by: Per Forlin per.forlin@linaro.org --- drivers/mmc/card/block.c | 121 +++++++++++++++++++++++++++++++++------------- drivers/mmc/card/queue.c | 17 +++++-- drivers/mmc/card/queue.h | 1 + 3 files changed, 101 insertions(+), 38 deletions(-)
diff --git a/drivers/mmc/card/block.c b/drivers/mmc/card/block.c index 6a84a75..66db77a 100644 --- a/drivers/mmc/card/block.c +++ b/drivers/mmc/card/block.c @@ -108,6 +108,7 @@ static DEFINE_MUTEX(open_lock);
enum mmc_blk_status { MMC_BLK_SUCCESS = 0, + MMC_BLK_PARTIAL, MMC_BLK_RETRY, MMC_BLK_DATA_ERR, MMC_BLK_CMD_ERR, @@ -668,14 +669,16 @@ static inline void mmc_apply_rel_rw(struct mmc_blk_request *brq, } }
-static enum mmc_blk_status mmc_blk_err_check(struct mmc_blk_request *brq, - struct request *req, - struct mmc_card *card, - struct mmc_blk_data *md) +static int mmc_blk_err_check(struct mmc_card *card, + struct mmc_async_req *areq) { struct mmc_command cmd; u32 status = 0; enum mmc_blk_status ret = MMC_BLK_SUCCESS; + struct mmc_queue_req *mq_mrq = container_of(areq, struct mmc_queue_req, + mmc_active); + struct mmc_blk_request *brq = &mq_mrq->brq; + struct request *req = mq_mrq->req;
/* * Check for errors here, but don't jump to cmd_err @@ -770,7 +773,11 @@ static enum mmc_blk_status mmc_blk_err_check(struct mmc_blk_request *brq, else ret = MMC_BLK_DATA_ERR; } -out: + + if (ret == MMC_BLK_SUCCESS && + blk_rq_bytes(req) != brq->data.bytes_xfered) + ret = MMC_BLK_PARTIAL; + out: return ret; }
@@ -901,27 +908,59 @@ static void mmc_blk_rw_rq_prep(struct mmc_queue_req *mqrq, brq->data.sg_len = i; }
+ mqrq->mmc_active.mrq = &brq->mrq; + mqrq->mmc_active.err_check = mmc_blk_err_check; + mmc_queue_bounce_pre(mqrq); }
-static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *req) +static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *rqc) { struct mmc_blk_data *md = mq->data; struct mmc_card *card = md->queue.card; - struct mmc_blk_request *brq = &mq->mqrq_cur->brq; - int ret = 1, disable_multi = 0; + struct mmc_blk_request *brq; + int ret = 1; + int disable_multi = 0; enum mmc_blk_status status; + struct mmc_queue_req *mq_rq; + struct request *req; + struct mmc_async_req *areq; + + if (!rqc && !mq->mqrq_prev->req) + goto out;
do { - mmc_blk_rw_rq_prep(mq->mqrq_cur, card, disable_multi, mq); - mmc_wait_for_req(card->host, &brq->mrq); + if (rqc) { + mmc_blk_rw_rq_prep(mq->mqrq_cur, card, 0, mq); + areq = &mq->mqrq_cur->mmc_active; + } else + areq = NULL; + areq = mmc_start_req(card->host, areq, (int *) &status); + if (!areq) + goto out;
- mmc_queue_bounce_post(mq->mqrq_cur); - status = mmc_blk_err_check(brq, req, card, md); + mq_rq = container_of(areq, struct mmc_queue_req, mmc_active); + brq = &mq_rq->brq; + req = mq_rq->req; + mmc_queue_bounce_post(mq_rq);
switch (status) { - case MMC_BLK_CMD_ERR: - goto cmd_err; + case MMC_BLK_SUCCESS: + case MMC_BLK_PARTIAL: + /* + * A block was successfully transferred. + */ + spin_lock_irq(&md->lock); + ret = __blk_end_request(req, 0, + brq->data.bytes_xfered); + spin_unlock_irq(&md->lock); + if (status == MMC_BLK_SUCCESS && ret) { + /* If this happen it is a bug */ + printk(KERN_ERR "%s BUG rq_tot %d d_xfer %d\n", + __func__, blk_rq_bytes(req), + brq->data.bytes_xfered); + goto cmd_err; + } break; case MMC_BLK_RETRY: disable_multi = 1; @@ -934,36 +973,41 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *req) * read a single sector. */ spin_lock_irq(&md->lock); - ret = __blk_end_request(req, -EIO, - brq->data.blksz); + ret = __blk_end_request(req, -EIO, brq->data.blksz); spin_unlock_irq(&md->lock); - + if (!ret) + goto start_new_req; break; - case MMC_BLK_SUCCESS: + case MMC_BLK_CMD_ERR: + ret = 1; + goto cmd_err; + break; + } + + if (ret) { /* - * A block was successfully transferred. + * In case of a none complete request + * prepare it again and resend. */ - spin_lock_irq(&md->lock); - ret = __blk_end_request(req, 0, brq->data.bytes_xfered); - spin_unlock_irq(&md->lock); - break; + mmc_blk_rw_rq_prep(mq_rq, card, disable_multi, mq); + mmc_start_req(card->host, &mq_rq->mmc_active, NULL); } } while (ret);
return 1; - + out: + return 0; cmd_err: - /* - * If this is an SD card and we're writing, we can first - * mark the known good sectors as ok. - * + /* + * If this is an SD card and we're writing, we can first + * mark the known good sectors as ok. + * * If the card is not SD, we can still ok written sectors * as reported by the controller (which might be less than * the real number of written sectors, but never more). */ if (mmc_card_sd(card)) { u32 blocks; - blocks = mmc_sd_num_wr_blocks(card); if (blocks != (u32)-1) { spin_lock_irq(&md->lock); @@ -981,6 +1025,12 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *req) ret = __blk_end_request(req, -EIO, blk_rq_cur_bytes(req)); spin_unlock_irq(&md->lock);
+ start_new_req: + if (rqc) { + mmc_blk_rw_rq_prep(mq->mqrq_cur, card, 0, mq); + mmc_start_req(card->host, &mq->mqrq_cur->mmc_active, NULL); + } + return 0; }
@@ -990,26 +1040,31 @@ static int mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req) struct mmc_blk_data *md = mq->data; struct mmc_card *card = md->queue.card;
- mmc_claim_host(card->host); + if (req && !mq->mqrq_prev->req) + /* claim host only for the first request */ + mmc_claim_host(card->host); + ret = mmc_blk_part_switch(card, md); if (ret) { ret = 0; goto out; }
- if (req->cmd_flags & REQ_DISCARD) { + if (req && req->cmd_flags & REQ_DISCARD) { if (req->cmd_flags & REQ_SECURE) ret = mmc_blk_issue_secdiscard_rq(mq, req); else ret = mmc_blk_issue_discard_rq(mq, req); - } else if (req->cmd_flags & REQ_FLUSH) { + } else if (req && req->cmd_flags & REQ_FLUSH) { ret = mmc_blk_issue_flush(mq, req); } else { ret = mmc_blk_issue_rw_rq(mq, req); }
out: - mmc_release_host(card->host); + if (!req) + /* release host only when there are no more requests */ + mmc_release_host(card->host); return ret; }
diff --git a/drivers/mmc/card/queue.c b/drivers/mmc/card/queue.c index 0757a39..2d6696a 100644 --- a/drivers/mmc/card/queue.c +++ b/drivers/mmc/card/queue.c @@ -52,6 +52,7 @@ static int mmc_queue_thread(void *d) down(&mq->thread_sem); do { struct request *req = NULL; + struct mmc_queue_req *tmp;
spin_lock_irq(q->queue_lock); set_current_state(TASK_INTERRUPTIBLE); @@ -59,7 +60,10 @@ static int mmc_queue_thread(void *d) mq->mqrq_cur->req = req; spin_unlock_irq(q->queue_lock);
- if (!req) { + if (req || mq->mqrq_prev->req) { + set_current_state(TASK_RUNNING); + mq->issue_fn(mq, req); + } else { if (kthread_should_stop()) { set_current_state(TASK_RUNNING); break; @@ -67,11 +71,14 @@ static int mmc_queue_thread(void *d) up(&mq->thread_sem); schedule(); down(&mq->thread_sem); - continue; } - set_current_state(TASK_RUNNING);
- mq->issue_fn(mq, req); + /* Current request becomes previous request and vice versa. */ + mq->mqrq_prev->brq.mrq.data = NULL; + mq->mqrq_prev->req = NULL; + tmp = mq->mqrq_prev; + mq->mqrq_prev = mq->mqrq_cur; + mq->mqrq_cur = tmp; } while (1); up(&mq->thread_sem);
@@ -97,7 +104,7 @@ static void mmc_request(struct request_queue *q) return; }
- if (!mq->mqrq_cur->req) + if (!mq->mqrq_cur->req && !mq->mqrq_prev->req) wake_up_process(mq->thread); }
diff --git a/drivers/mmc/card/queue.h b/drivers/mmc/card/queue.h index c8fb2dc..f937538 100644 --- a/drivers/mmc/card/queue.h +++ b/drivers/mmc/card/queue.h @@ -19,6 +19,7 @@ struct mmc_queue_req { char *bounce_buf; struct scatterlist *bounce_sg; unsigned int bounce_sg_len; + struct mmc_async_req mmc_active; };
struct mmc_queue {
On Mon, Jun 20, 2011 at 2:47 AM, Per Forlin per.forlin@linaro.org wrote:
Change mmc_blk_issue_rw_rq() to become asynchronous. The execution flow looks like this: the mmc-queue calls issue_rw_rq(), which sends the request to the host and returns to the mmc-queue. The mmc-queue calls issue_rw_rq() again with a new request. This new request is prepared in issue_rw_rq(), which then waits for the active request to complete before pushing the new one to the host. When the mmc-queue is empty it calls issue_rw_rq() with req=NULL to finish off the active request without starting a new one.
Signed-off-by: Per Forlin per.forlin@linaro.org
drivers/mmc/card/block.c | 121 +++++++++++++++++++++++++++++++++------------- drivers/mmc/card/queue.c | 17 +++++-- drivers/mmc/card/queue.h | 1 + 3 files changed, 101 insertions(+), 38 deletions(-)
diff --git a/drivers/mmc/card/block.c b/drivers/mmc/card/block.c index 6a84a75..66db77a 100644 --- a/drivers/mmc/card/block.c +++ b/drivers/mmc/card/block.c @@ -108,6 +108,7 @@ static DEFINE_MUTEX(open_lock);
enum mmc_blk_status { MMC_BLK_SUCCESS = 0,
- MMC_BLK_PARTIAL,
MMC_BLK_RETRY, MMC_BLK_DATA_ERR, MMC_BLK_CMD_ERR, @@ -668,14 +669,16 @@ static inline void mmc_apply_rel_rw(struct mmc_blk_request *brq, } }
-static enum mmc_blk_status mmc_blk_err_check(struct mmc_blk_request *brq,
- struct request *req,
- struct mmc_card *card,
- struct mmc_blk_data *md)
+static int mmc_blk_err_check(struct mmc_card *card,
- struct mmc_async_req *areq)
{ struct mmc_command cmd; u32 status = 0; enum mmc_blk_status ret = MMC_BLK_SUCCESS;
- struct mmc_queue_req *mq_mrq = container_of(areq, struct mmc_queue_req,
- mmc_active);
- struct mmc_blk_request *brq = &mq_mrq->brq;
- struct request *req = mq_mrq->req;
/* * Check for errors here, but don't jump to cmd_err @@ -770,7 +773,11 @@ static enum mmc_blk_status mmc_blk_err_check(struct mmc_blk_request *brq, else ret = MMC_BLK_DATA_ERR; } -out:
- if (ret == MMC_BLK_SUCCESS &&
- blk_rq_bytes(req) != brq->data.bytes_xfered)
- ret = MMC_BLK_PARTIAL;
- out:
return ret; }
@@ -901,27 +908,59 @@ static void mmc_blk_rw_rq_prep(struct mmc_queue_req *mqrq, brq->data.sg_len = i; }
- mqrq->mmc_active.mrq = &brq->mrq;
- mqrq->mmc_active.err_check = mmc_blk_err_check;
mmc_queue_bounce_pre(mqrq); }
-static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *req) +static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *rqc) { struct mmc_blk_data *md = mq->data; struct mmc_card *card = md->queue.card;
- struct mmc_blk_request *brq = &mq->mqrq_cur->brq;
- int ret = 1, disable_multi = 0;
- struct mmc_blk_request *brq;
- int ret = 1;
- int disable_multi = 0;
enum mmc_blk_status status;
- struct mmc_queue_req *mq_rq;
- struct request *req;
- struct mmc_async_req *areq;
- if (!rqc && !mq->mqrq_prev->req)
- goto out;
do {
- mmc_blk_rw_rq_prep(mq->mqrq_cur, card, disable_multi, mq);
- mmc_wait_for_req(card->host, &brq->mrq);
- if (rqc) {
- mmc_blk_rw_rq_prep(mq->mqrq_cur, card, 0, mq);
- areq = &mq->mqrq_cur->mmc_active;
- } else
- areq = NULL;
- areq = mmc_start_req(card->host, areq, (int *) &status);
I think 'status' is used uninitialized. With this code in mmc_start_req() from your first patch:

	if (error)
		*error = err;
	return data;

the if (error) condition always passes.
You can have enum mmc_blk_status status = MMC_BLK_SUCCESS;
struct mmc_async_req *mmc_start_req {
	err = host->areq->err_check(host->card, host->areq);
	if (err) {
		...
		*error = err;
	}
	/* no need to update *error here in the success case */
	return data;
}
Regards, Kishore
On 20 June 2011 17:17, Kishore Kadiyala kishorek.kadiyala@gmail.com wrote:
On Mon, Jun 20, 2011 at 2:47 AM, Per Forlin per.forlin@linaro.org wrote:
Change mmc_blk_issue_rw_rq() to become asynchronous. The execution flow looks like this: the mmc-queue calls issue_rw_rq(), which sends the request to the host and returns to the mmc-queue. The mmc-queue calls issue_rw_rq() again with a new request. This new request is prepared in issue_rw_rq(), which then waits for the active request to complete before pushing the new one to the host. When the mmc-queue is empty it calls issue_rw_rq() with req=NULL to finish off the active request without starting a new one.
Signed-off-by: Per Forlin per.forlin@linaro.org
drivers/mmc/card/block.c | 121 +++++++++++++++++++++++++++++++++------------- drivers/mmc/card/queue.c | 17 +++++-- drivers/mmc/card/queue.h | 1 + 3 files changed, 101 insertions(+), 38 deletions(-)
diff --git a/drivers/mmc/card/block.c b/drivers/mmc/card/block.c index 6a84a75..66db77a 100644 --- a/drivers/mmc/card/block.c +++ b/drivers/mmc/card/block.c @@ -108,6 +108,7 @@ static DEFINE_MUTEX(open_lock);
enum mmc_blk_status { MMC_BLK_SUCCESS = 0,
- MMC_BLK_PARTIAL,
MMC_BLK_RETRY, MMC_BLK_DATA_ERR, MMC_BLK_CMD_ERR, @@ -668,14 +669,16 @@ static inline void mmc_apply_rel_rw(struct mmc_blk_request *brq, } }
-static enum mmc_blk_status mmc_blk_err_check(struct mmc_blk_request *brq,
- struct request *req,
- struct mmc_card *card,
- struct mmc_blk_data *md)
+static int mmc_blk_err_check(struct mmc_card *card,
- struct mmc_async_req *areq)
{ struct mmc_command cmd; u32 status = 0; enum mmc_blk_status ret = MMC_BLK_SUCCESS;
- struct mmc_queue_req *mq_mrq = container_of(areq, struct mmc_queue_req,
- mmc_active);
- struct mmc_blk_request *brq = &mq_mrq->brq;
- struct request *req = mq_mrq->req;
/* * Check for errors here, but don't jump to cmd_err @@ -770,7 +773,11 @@ static enum mmc_blk_status mmc_blk_err_check(struct mmc_blk_request *brq, else ret = MMC_BLK_DATA_ERR; } -out:
- if (ret == MMC_BLK_SUCCESS &&
- blk_rq_bytes(req) != brq->data.bytes_xfered)
- ret = MMC_BLK_PARTIAL;
- out:
return ret; }
@@ -901,27 +908,59 @@ static void mmc_blk_rw_rq_prep(struct mmc_queue_req *mqrq, brq->data.sg_len = i; }
- mqrq->mmc_active.mrq = &brq->mrq;
- mqrq->mmc_active.err_check = mmc_blk_err_check;
mmc_queue_bounce_pre(mqrq); }
-static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *req) +static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *rqc) { struct mmc_blk_data *md = mq->data; struct mmc_card *card = md->queue.card;
- struct mmc_blk_request *brq = &mq->mqrq_cur->brq;
- int ret = 1, disable_multi = 0;
- struct mmc_blk_request *brq;
- int ret = 1;
- int disable_multi = 0;
enum mmc_blk_status status;
- struct mmc_queue_req *mq_rq;
- struct request *req;
- struct mmc_async_req *areq;
- if (!rqc && !mq->mqrq_prev->req)
- goto out;
do {
- mmc_blk_rw_rq_prep(mq->mqrq_cur, card, disable_multi, mq);
- mmc_wait_for_req(card->host, &brq->mrq);
- if (rqc) {
- mmc_blk_rw_rq_prep(mq->mqrq_cur, card, 0, mq);
- areq = &mq->mqrq_cur->mmc_active;
- } else
- areq = NULL;
- areq = mmc_start_req(card->host, areq, (int *) &status);
I think 'status' is used uninitialized.
status is an out parameter. From that perspective it is always initialised. I should update the doc description of mmc_start_req to clarify this.
With this code in mmc_start_req() from your first patch:

	if (error)
		*error = err;
	return data;

the if (error) condition always passes.
You can have enum mmc_blk_status status = MMC_BLK_SUCCESS;
struct mmc_async_req *mmc_start_req {
	err = host->areq->err_check(host->card, host->areq);
	if (err) {
		...
		*error = err;
	}
	/* no need to update *error here in the success case */
	return data;
}
I agree, makes the code more clear.
Thanks, Per
On 21 June 2011 08:40, Per Forlin per.forlin@linaro.org wrote:
On 20 June 2011 17:17, Kishore Kadiyala kishorek.kadiyala@gmail.com wrote:
On Mon, Jun 20, 2011 at 2:47 AM, Per Forlin per.forlin@linaro.org wrote:
Change mmc_blk_issue_rw_rq() to become asynchronous. The execution flow looks like this: the mmc-queue calls issue_rw_rq(), which sends the request to the host and returns to the mmc-queue. The mmc-queue calls issue_rw_rq() again with a new request. This new request is prepared in issue_rw_rq(), which then waits for the active request to complete before pushing the new one to the host. When the mmc-queue is empty it calls issue_rw_rq() with req=NULL to finish off the active request without starting a new one.
Signed-off-by: Per Forlin per.forlin@linaro.org
drivers/mmc/card/block.c | 121 +++++++++++++++++++++++++++++++++------------- drivers/mmc/card/queue.c | 17 +++++-- drivers/mmc/card/queue.h | 1 + 3 files changed, 101 insertions(+), 38 deletions(-)
diff --git a/drivers/mmc/card/block.c b/drivers/mmc/card/block.c index 6a84a75..66db77a 100644 --- a/drivers/mmc/card/block.c +++ b/drivers/mmc/card/block.c @@ -108,6 +108,7 @@ static DEFINE_MUTEX(open_lock);
enum mmc_blk_status { MMC_BLK_SUCCESS = 0,
- MMC_BLK_PARTIAL,
MMC_BLK_RETRY, MMC_BLK_DATA_ERR, MMC_BLK_CMD_ERR, @@ -668,14 +669,16 @@ static inline void mmc_apply_rel_rw(struct mmc_blk_request *brq, } }
-static enum mmc_blk_status mmc_blk_err_check(struct mmc_blk_request *brq,
- struct request *req,
- struct mmc_card *card,
- struct mmc_blk_data *md)
+static int mmc_blk_err_check(struct mmc_card *card,
- struct mmc_async_req *areq)
{ struct mmc_command cmd; u32 status = 0; enum mmc_blk_status ret = MMC_BLK_SUCCESS;
- struct mmc_queue_req *mq_mrq = container_of(areq, struct mmc_queue_req,
- mmc_active);
- struct mmc_blk_request *brq = &mq_mrq->brq;
- struct request *req = mq_mrq->req;
/* * Check for errors here, but don't jump to cmd_err @@ -770,7 +773,11 @@ static enum mmc_blk_status mmc_blk_err_check(struct mmc_blk_request *brq, else ret = MMC_BLK_DATA_ERR; } -out:
- if (ret == MMC_BLK_SUCCESS &&
- blk_rq_bytes(req) != brq->data.bytes_xfered)
- ret = MMC_BLK_PARTIAL;
- out:
return ret; }
@@ -901,27 +908,59 @@ static void mmc_blk_rw_rq_prep(struct mmc_queue_req *mqrq, brq->data.sg_len = i; }
- mqrq->mmc_active.mrq = &brq->mrq;
- mqrq->mmc_active.err_check = mmc_blk_err_check;
mmc_queue_bounce_pre(mqrq); }
-static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *req) +static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *rqc) { struct mmc_blk_data *md = mq->data; struct mmc_card *card = md->queue.card;
- struct mmc_blk_request *brq = &mq->mqrq_cur->brq;
- int ret = 1, disable_multi = 0;
- struct mmc_blk_request *brq;
- int ret = 1;
- int disable_multi = 0;
enum mmc_blk_status status;
- struct mmc_queue_req *mq_rq;
- struct request *req;
- struct mmc_async_req *areq;
- if (!rqc && !mq->mqrq_prev->req)
- goto out;
do {
- mmc_blk_rw_rq_prep(mq->mqrq_cur, card, disable_multi, mq);
- mmc_wait_for_req(card->host, &brq->mrq);
- if (rqc) {
- mmc_blk_rw_rq_prep(mq->mqrq_cur, card, 0, mq);
- areq = &mq->mqrq_cur->mmc_active;
- } else
- areq = NULL;
- areq = mmc_start_req(card->host, areq, (int *) &status);
I think 'status' is used uninitialized.
status is an out parameter. From that perspective it is always initialised. I should update the doc description of mmc_start_req to clarify this.
With this code in mmc_start_req() from your first patch:

	if (error)
		*error = err;
	return data;

the if (error) condition always passes.
You can have enum mmc_blk_status status = MMC_BLK_SUCCESS;
In core.c there is no access to the type "enum mmc_blk_status"; this type is block.c specific. The intention is to make this function available for SDIO as well. "int err = 0;" is set at the top of mmc_start_req(). Default err condition is 0.
What do you think about the following changes?
* @areq: async request to start - * @error: non zero in case of error + * @error: out parameter returns 0 for success, otherwise non zero * * Start a new MMC custom command request for a host. * If there is on ongoing async request wait for completion @@ -334,9 +334,7 @@ struct mmc_async_req *mmc_start_req(struct mmc_host *host, mmc_post_req(host, areq->mrq, -EINVAL);
host->areq = NULL; - if (error) - *error = err; - return data; + goto out; } }
@@ -347,6 +345,7 @@ struct mmc_async_req *mmc_start_req(struct mmc_host *host, mmc_post_req(host, host->areq->mrq, 0);
host->areq = areq; + out: if (error) *error = err; return data;
struct mmc_async_req *mmc_start_req {
	err = host->areq->err_check(host->card, host->areq);
	if (err) {
		...
		*error = err;
	}
	/* no need to update *error here in the success case */
	return data;
}
I agree, makes the code more clear.
Thanks, Per
Hi Per,
<snip>
--- a/drivers/mmc/card/block.c +++ b/drivers/mmc/card/block.c @@ -108,6 +108,7 @@ static DEFINE_MUTEX(open_lock);
enum mmc_blk_status { MMC_BLK_SUCCESS = 0,
- MMC_BLK_PARTIAL,
MMC_BLK_RETRY, MMC_BLK_DATA_ERR, MMC_BLK_CMD_ERR,
<snip>
-static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *req) +static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *rqc) { struct mmc_blk_data *md = mq->data; struct mmc_card *card = md->queue.card;
- struct mmc_blk_request *brq = &mq->mqrq_cur->brq;
- int ret = 1, disable_multi = 0;
- struct mmc_blk_request *brq;
- int ret = 1;
- int disable_multi = 0;
enum mmc_blk_status status;
Can initialize it here: enum mmc_blk_status status = MMC_BLK_SUCCESS;
- struct mmc_queue_req *mq_rq;
- struct request *req;
- struct mmc_async_req *areq;
- if (!rqc && !mq->mqrq_prev->req)
- goto out;
<snip>
I meant doing initialization in block.c itself and not core.c as above.
The intention is to make this function available for SDIO as well.
Totally agree on having the API accessible to SDIO.
"int err = 0;" is set at the top of mmc_start_req(). Default err condition is 0.
What do you think about the following changes?
* @areq: async request to start
- @error: non zero in case of error
- @error: out parameter returns 0 for success, otherwise non zero
* * Start a new MMC custom command request for a host. * If there is on ongoing async request wait for completion @@ -334,9 +334,7 @@ struct mmc_async_req *mmc_start_req(struct mmc_host *host, mmc_post_req(host, areq->mrq, -EINVAL);
host->areq = NULL;
- if (error)
- *error = err;
- return data;
- goto out;
} }
@@ -347,6 +345,7 @@ struct mmc_async_req *mmc_start_req(struct mmc_host *host, mmc_post_req(host, host->areq->mrq, 0);
host->areq = areq;
- out:
if (error) *error = err; return data;
The above change reduces the code size, but since 'error' points to 'status', which is uninitialized, if (error) will always be true. If you want to update *error in both the success and failure cases (based on 'err'), you can drop the if (error) check; otherwise, to update it only in the error case, see the approach proposed in my previous mail.
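(To make this concrete, a stripped-down plain-C illustration, not the driver code: if the callee writes the out parameter only on failure, either the caller has to initialize its local or the callee has to write it unconditionally.)

#include <stdio.h>

/* Writes *error only on failure -- the pattern being questioned above. */
static void report_only_on_failure(int fail, int *error)
{
	if (fail)
		*error = -1;
}

/* Always writing the out parameter removes the ambiguity. */
static void report_always(int fail, int *error)
{
	*error = fail ? -1 : 0;
}

int main(void)
{
	int status = 0;	/* initializing in the caller is the other possible fix */

	report_only_on_failure(0, &status);
	printf("failure-only reporting, success case: status = %d\n", status);

	report_always(0, &status);
	printf("always reporting, success case: status = %d\n", status);
	return 0;
}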
Regards, Kishore
On 20 June 2011 17:17, Kishore Kadiyala kishorek.kadiyala@gmail.com wrote:
On Mon, Jun 20, 2011 at 2:47 AM, Per Forlin per.forlin@linaro.org wrote:
Change mmc_blk_issue_rw_rq() to become asynchronous. The execution flow looks like this: the mmc-queue calls issue_rw_rq(), which sends the request to the host and returns to the mmc-queue. The mmc-queue calls issue_rw_rq() again with a new request. This new request is prepared in issue_rw_rq(), which then waits for the active request to complete before pushing the new one to the host. When the mmc-queue is empty it calls issue_rw_rq() with req=NULL to finish off the active request without starting a new one.
Signed-off-by: Per Forlin per.forlin@linaro.org
drivers/mmc/card/block.c | 121 +++++++++++++++++++++++++++++++++------------- drivers/mmc/card/queue.c | 17 +++++-- drivers/mmc/card/queue.h | 1 + 3 files changed, 101 insertions(+), 38 deletions(-)
diff --git a/drivers/mmc/card/block.c b/drivers/mmc/card/block.c index 6a84a75..66db77a 100644 --- a/drivers/mmc/card/block.c +++ b/drivers/mmc/card/block.c @@ -108,6 +108,7 @@ static DEFINE_MUTEX(open_lock);
enum mmc_blk_status { MMC_BLK_SUCCESS = 0,
- MMC_BLK_PARTIAL,
MMC_BLK_RETRY, MMC_BLK_DATA_ERR, MMC_BLK_CMD_ERR, @@ -668,14 +669,16 @@ static inline void mmc_apply_rel_rw(struct mmc_blk_request *brq, } }
-static enum mmc_blk_status mmc_blk_err_check(struct mmc_blk_request *brq,
- struct request *req,
- struct mmc_card *card,
- struct mmc_blk_data *md)
+static int mmc_blk_err_check(struct mmc_card *card,
- struct mmc_async_req *areq)
{ struct mmc_command cmd; u32 status = 0; enum mmc_blk_status ret = MMC_BLK_SUCCESS;
- struct mmc_queue_req *mq_mrq = container_of(areq, struct mmc_queue_req,
- mmc_active);
- struct mmc_blk_request *brq = &mq_mrq->brq;
- struct request *req = mq_mrq->req;
/* * Check for errors here, but don't jump to cmd_err @@ -770,7 +773,11 @@ static enum mmc_blk_status mmc_blk_err_check(struct mmc_blk_request *brq, else ret = MMC_BLK_DATA_ERR; } -out:
- if (ret == MMC_BLK_SUCCESS &&
- blk_rq_bytes(req) != brq->data.bytes_xfered)
- ret = MMC_BLK_PARTIAL;
- out:
return ret; }
@@ -901,27 +908,59 @@ static void mmc_blk_rw_rq_prep(struct mmc_queue_req *mqrq, brq->data.sg_len = i; }
- mqrq->mmc_active.mrq = &brq->mrq;
- mqrq->mmc_active.err_check = mmc_blk_err_check;
mmc_queue_bounce_pre(mqrq); }
-static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *req) +static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *rqc) { struct mmc_blk_data *md = mq->data; struct mmc_card *card = md->queue.card;
- struct mmc_blk_request *brq = &mq->mqrq_cur->brq;
- int ret = 1, disable_multi = 0;
- struct mmc_blk_request *brq;
- int ret = 1;
- int disable_multi = 0;
enum mmc_blk_status status;
- struct mmc_queue_req *mq_rq;
- struct request *req;
- struct mmc_async_req *areq;
- if (!rqc && !mq->mqrq_prev->req)
- goto out;
do {
- mmc_blk_rw_rq_prep(mq->mqrq_cur, card, disable_multi, mq);
- mmc_wait_for_req(card->host, &brq->mrq);
- if (rqc) {
- mmc_blk_rw_rq_prep(mq->mqrq_cur, card, 0, mq);
- areq = &mq->mqrq_cur->mmc_active;
- } else
- areq = NULL;
- areq = mmc_start_req(card->host, areq, (int *) &status);
I think 'status' is used uninitialized. With this code in mmc_start_req() from your first patch:

	if (error)
		*error = err;
	return data;

the if (error) condition always passes.
It's valid to pass NULL as status. Are you suggesting that the status pointer be made mandatory?
You can have enum mmc_blk_status status = MMC_BLK_SUCCESS;
status will be set to MMC_BLK_SUCCESS for the first iteration of the do-while loop only. Across the iterations it may look like: MMC_BLK_RETRY, MMC_BLK_PARTIAL, MMC_BLK_PARTIAL, MMC_BLK_PARTIAL, MMC_BLK_SUCCESS.
This means mmc_start_req needs to return MMC_BLK_SUCCESS.
Regards, Per
On 19 June 2011 23:17, Per Forlin per.forlin@linaro.org wrote:
Change mmc_blk_issue_rw_rq() to become asynchronous. The execution flow looks like this: the mmc-queue calls issue_rw_rq(), which sends the request to the host and returns to the mmc-queue. The mmc-queue calls issue_rw_rq() again with a new request. This new request is prepared in issue_rw_rq(), which then waits for the active request to complete before pushing the new one to the host. When the mmc-queue is empty it calls issue_rw_rq() with req=NULL to finish off the active request without starting a new one.
Signed-off-by: Per Forlin per.forlin@linaro.org
drivers/mmc/card/block.c | 121 +++++++++++++++++++++++++++++++++------------- drivers/mmc/card/queue.c | 17 +++++-- drivers/mmc/card/queue.h | 1 + 3 files changed, 101 insertions(+), 38 deletions(-)
diff --git a/drivers/mmc/card/block.c b/drivers/mmc/card/block.c index 6a84a75..66db77a 100644 --- a/drivers/mmc/card/block.c +++ b/drivers/mmc/card/block.c @@ -108,6 +108,7 @@ static DEFINE_MUTEX(open_lock);
enum mmc_blk_status { MMC_BLK_SUCCESS = 0,
- MMC_BLK_PARTIAL,
MMC_BLK_RETRY, MMC_BLK_DATA_ERR, MMC_BLK_CMD_ERR, @@ -668,14 +669,16 @@ static inline void mmc_apply_rel_rw(struct mmc_blk_request *brq, } }
-static enum mmc_blk_status mmc_blk_err_check(struct mmc_blk_request *brq,
- struct request *req,
- struct mmc_card *card,
- struct mmc_blk_data *md)
+static int mmc_blk_err_check(struct mmc_card *card,
- struct mmc_async_req *areq)
{ struct mmc_command cmd; u32 status = 0; enum mmc_blk_status ret = MMC_BLK_SUCCESS;
- struct mmc_queue_req *mq_mrq = container_of(areq, struct mmc_queue_req,
- mmc_active);
- struct mmc_blk_request *brq = &mq_mrq->brq;
- struct request *req = mq_mrq->req;
/* * Check for errors here, but don't jump to cmd_err @@ -770,7 +773,11 @@ static enum mmc_blk_status mmc_blk_err_check(struct mmc_blk_request *brq, else ret = MMC_BLK_DATA_ERR; } -out:
- if (ret == MMC_BLK_SUCCESS &&
- blk_rq_bytes(req) != brq->data.bytes_xfered)
- ret = MMC_BLK_PARTIAL;
- out:
return ret; }
@@ -901,27 +908,59 @@ static void mmc_blk_rw_rq_prep(struct mmc_queue_req *mqrq, brq->data.sg_len = i; }
- mqrq->mmc_active.mrq = &brq->mrq;
- mqrq->mmc_active.err_check = mmc_blk_err_check;
mmc_queue_bounce_pre(mqrq); }
-static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *req) +static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *rqc) { struct mmc_blk_data *md = mq->data; struct mmc_card *card = md->queue.card;
- struct mmc_blk_request *brq = &mq->mqrq_cur->brq;
- int ret = 1, disable_multi = 0;
- struct mmc_blk_request *brq;
- int ret = 1;
- int disable_multi = 0;
enum mmc_blk_status status;
- struct mmc_queue_req *mq_rq;
- struct request *req;
- struct mmc_async_req *areq;
- if (!rqc && !mq->mqrq_prev->req)
- goto out;
do {
- mmc_blk_rw_rq_prep(mq->mqrq_cur, card, disable_multi, mq);
- mmc_wait_for_req(card->host, &brq->mrq);
- if (rqc) {
- mmc_blk_rw_rq_prep(mq->mqrq_cur, card, 0, mq);
- areq = &mq->mqrq_cur->mmc_active;
- } else
- areq = NULL;
- areq = mmc_start_req(card->host, areq, (int *) &status);
- if (!areq)
- goto out;
- mmc_queue_bounce_post(mq->mqrq_cur);
- status = mmc_blk_err_check(brq, req, card, md);
- mq_rq = container_of(areq, struct mmc_queue_req, mmc_active);
- brq = &mq_rq->brq;
- req = mq_rq->req;
- mmc_queue_bounce_post(mq_rq);
switch (status) {
- case MMC_BLK_CMD_ERR:
- goto cmd_err;
- case MMC_BLK_SUCCESS:
- case MMC_BLK_PARTIAL:
- /*
- * A block was successfully transferred.
- */
- spin_lock_irq(&md->lock);
- ret = __blk_end_request(req, 0,
- brq->data.bytes_xfered);
- spin_unlock_irq(&md->lock);
- if (status == MMC_BLK_SUCCESS && ret) {
- /* If this happen it is a bug */
- printk(KERN_ERR "%s BUG rq_tot %d d_xfer %d\n",
- __func__, blk_rq_bytes(req),
- brq->data.bytes_xfered);
- goto cmd_err;
- }
break; case MMC_BLK_RETRY: disable_multi = 1; @@ -934,36 +973,41 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *req) * read a single sector. */ spin_lock_irq(&md->lock);
- ret = __blk_end_request(req, -EIO,
- brq->data.blksz);
- ret = __blk_end_request(req, -EIO, brq->data.blksz);
spin_unlock_irq(&md->lock);
- if (!ret)
- goto start_new_req;
break;
- case MMC_BLK_SUCCESS:
- case MMC_BLK_CMD_ERR:
- ret = 1;
- goto cmd_err;
- break;
- }
- if (ret) {
/*
- * A block was successfully transferred.
- * In case of a none complete request
- * prepare it again and resend.
*/
- spin_lock_irq(&md->lock);
- ret = __blk_end_request(req, 0, brq->data.bytes_xfered);
- spin_unlock_irq(&md->lock);
- break;
- mmc_blk_rw_rq_prep(mq_rq, card, disable_multi, mq);
- mmc_start_req(card->host, &mq_rq->mmc_active, NULL);
} } while (ret);
return 1;
- out:
- return 0;
cmd_err:
- /*
- * If this is an SD card and we're writing, we can first
- * mark the known good sectors as ok.
- *
- /*
- * If this is an SD card and we're writing, we can first
- * mark the known good sectors as ok.
- *
* If the card is not SD, we can still ok written sectors * as reported by the controller (which might be less than * the real number of written sectors, but never more). */ if (mmc_card_sd(card)) { u32 blocks;
blocks = mmc_sd_num_wr_blocks(card); if (blocks != (u32)-1) { spin_lock_irq(&md->lock); @@ -981,6 +1025,12 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *req) ret = __blk_end_request(req, -EIO, blk_rq_cur_bytes(req)); spin_unlock_irq(&md->lock);
- start_new_req:
- if (rqc) {
- mmc_blk_rw_rq_prep(mq->mqrq_cur, card, 0, mq);
- mmc_start_req(card->host, &mq->mqrq_cur->mmc_active, NULL);
- }
return 0; }
@@ -990,26 +1040,31 @@ static int mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req) struct mmc_blk_data *md = mq->data; struct mmc_card *card = md->queue.card;
- mmc_claim_host(card->host);
- if (req && !mq->mqrq_prev->req)
- /* claim host only for the first request */
- mmc_claim_host(card->host);
ret = mmc_blk_part_switch(card, md); if (ret) { ret = 0; goto out; }
- if (req->cmd_flags & REQ_DISCARD) {
- if (req && req->cmd_flags & REQ_DISCARD) {
/* complete ongoing async transfer before issuing discard */ if (card->host->areq) mmc_blk_issue_rw_rq(mq, NULL);
prevent mmc_erase from running in parallel with an mmc async request.
/Per
On Sun, Jun 19, 2011 at 11:17:26PM +0200, Per Forlin wrote:
How significant is the cache maintenance over head?
Per,
Can you measure how much difference this has before and after your patch set please? This moves the dsb() out of the individual cache maintenance functions, such that we will only perform one dsb() per dma_*_sg call rather than one per SG entry.
Thanks.
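In other words the change batches the barrier: the per-range cache operations lose their trailing dsb(), and the sg loop ends with a single dsb() for the whole list. Roughly (simplified, not the exact dma-mapping.c code):

/*
 * Before: dma_map_sg() loops over the list and every per-range cache op
 * (e.g. v7_dma_clean_range) drains the write buffer itself.
 *
 * After: the per-range ops no longer issue a barrier, and dma_map_sg()
 * ends with:
 */
for_each_sg(sg, s, nents, i)
	__dma_page_cpu_to_dev(sg_page(s), s->offset, s->length, dir);
__dma_sync();	/* one dsb() for the whole scatterlist */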
arch/arm/include/asm/dma-mapping.h | 11 +++++++++++ arch/arm/mm/cache-fa.S | 6 ------ arch/arm/mm/cache-v4wb.S | 2 -- arch/arm/mm/cache-v6.S | 6 ------ arch/arm/mm/cache-v7.S | 3 --- arch/arm/mm/dma-mapping.c | 8 ++++++++ 6 files changed, 19 insertions(+), 17 deletions(-)
diff --git a/arch/arm/include/asm/dma-mapping.h b/arch/arm/include/asm/dma-mapping.h index 4fff837..853eba5 100644 --- a/arch/arm/include/asm/dma-mapping.h +++ b/arch/arm/include/asm/dma-mapping.h @@ -115,6 +115,11 @@ static inline void __dma_page_dev_to_cpu(struct page *page, unsigned long off, ___dma_page_dev_to_cpu(page, off, size, dir); }
+static inline void __dma_sync(void) +{ + dsb(); +} + /* * Return whether the given device DMA address mask can be supported * properly. For example, if your device can only drive the low 24-bits @@ -378,6 +383,7 @@ static inline dma_addr_t dma_map_single(struct device *dev, void *cpu_addr, BUG_ON(!valid_dma_direction(dir));
addr = __dma_map_single(dev, cpu_addr, size, dir); + __dma_sync(); debug_dma_map_page(dev, virt_to_page(cpu_addr), (unsigned long)cpu_addr & ~PAGE_MASK, size, dir, addr, true); @@ -407,6 +413,7 @@ static inline dma_addr_t dma_map_page(struct device *dev, struct page *page, BUG_ON(!valid_dma_direction(dir));
addr = __dma_map_page(dev, page, offset, size, dir); + __dma_sync(); debug_dma_map_page(dev, page, offset, size, dir, addr, false);
return addr; @@ -431,6 +438,7 @@ static inline void dma_unmap_single(struct device *dev, dma_addr_t handle, { debug_dma_unmap_page(dev, handle, size, dir, true); __dma_unmap_single(dev, handle, size, dir); + __dma_sync(); }
/** @@ -452,6 +460,7 @@ static inline void dma_unmap_page(struct device *dev, dma_addr_t handle, { debug_dma_unmap_page(dev, handle, size, dir, false); __dma_unmap_page(dev, handle, size, dir); + __dma_sync(); }
/** @@ -484,6 +493,7 @@ static inline void dma_sync_single_range_for_cpu(struct device *dev, return;
__dma_single_dev_to_cpu(dma_to_virt(dev, handle) + offset, size, dir); + __dma_sync(); }
static inline void dma_sync_single_range_for_device(struct device *dev, @@ -498,6 +508,7 @@ static inline void dma_sync_single_range_for_device(struct device *dev, return;
__dma_single_cpu_to_dev(dma_to_virt(dev, handle) + offset, size, dir); + __dma_sync(); }
static inline void dma_sync_single_for_cpu(struct device *dev, diff --git a/arch/arm/mm/cache-fa.S b/arch/arm/mm/cache-fa.S index 1fa6f71..6eeb734 100644 --- a/arch/arm/mm/cache-fa.S +++ b/arch/arm/mm/cache-fa.S @@ -179,8 +179,6 @@ fa_dma_inv_range: add r0, r0, #CACHE_DLINESIZE cmp r0, r1 blo 1b - mov r0, #0 - mcr p15, 0, r0, c7, c10, 4 @ drain write buffer mov pc, lr
/* @@ -197,8 +195,6 @@ fa_dma_clean_range: add r0, r0, #CACHE_DLINESIZE cmp r0, r1 blo 1b - mov r0, #0 - mcr p15, 0, r0, c7, c10, 4 @ drain write buffer mov pc, lr
/* @@ -212,8 +208,6 @@ ENTRY(fa_dma_flush_range) add r0, r0, #CACHE_DLINESIZE cmp r0, r1 blo 1b - mov r0, #0 - mcr p15, 0, r0, c7, c10, 4 @ drain write buffer mov pc, lr
/* diff --git a/arch/arm/mm/cache-v4wb.S b/arch/arm/mm/cache-v4wb.S index f40c696..523c0cb 100644 --- a/arch/arm/mm/cache-v4wb.S +++ b/arch/arm/mm/cache-v4wb.S @@ -194,7 +194,6 @@ v4wb_dma_inv_range: add r0, r0, #CACHE_DLINESIZE cmp r0, r1 blo 1b - mcr p15, 0, r0, c7, c10, 4 @ drain write buffer mov pc, lr
/* @@ -211,7 +210,6 @@ v4wb_dma_clean_range: add r0, r0, #CACHE_DLINESIZE cmp r0, r1 blo 1b - mcr p15, 0, r0, c7, c10, 4 @ drain write buffer mov pc, lr
/* diff --git a/arch/arm/mm/cache-v6.S b/arch/arm/mm/cache-v6.S index 73b4a8b..7a842dd 100644 --- a/arch/arm/mm/cache-v6.S +++ b/arch/arm/mm/cache-v6.S @@ -239,8 +239,6 @@ v6_dma_inv_range: strlo r2, [r0] @ write for ownership #endif blo 1b - mov r0, #0 - mcr p15, 0, r0, c7, c10, 4 @ drain write buffer mov pc, lr
/* @@ -262,8 +260,6 @@ v6_dma_clean_range: add r0, r0, #D_CACHE_LINE_SIZE cmp r0, r1 blo 1b - mov r0, #0 - mcr p15, 0, r0, c7, c10, 4 @ drain write buffer mov pc, lr
/* @@ -290,8 +286,6 @@ ENTRY(v6_dma_flush_range) strlob r2, [r0] @ write for ownership #endif blo 1b - mov r0, #0 - mcr p15, 0, r0, c7, c10, 4 @ drain write buffer mov pc, lr
/* diff --git a/arch/arm/mm/cache-v7.S b/arch/arm/mm/cache-v7.S index d32f02b..18dcef6 100644 --- a/arch/arm/mm/cache-v7.S +++ b/arch/arm/mm/cache-v7.S @@ -257,7 +257,6 @@ v7_dma_inv_range: add r0, r0, r2 cmp r0, r1 blo 1b - dsb mov pc, lr ENDPROC(v7_dma_inv_range)
@@ -275,7 +274,6 @@ v7_dma_clean_range: add r0, r0, r2 cmp r0, r1 blo 1b - dsb mov pc, lr ENDPROC(v7_dma_clean_range)
@@ -293,7 +291,6 @@ ENTRY(v7_dma_flush_range) add r0, r0, r2 cmp r0, r1 blo 1b - dsb mov pc, lr ENDPROC(v7_dma_flush_range)
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c index 82a093c..042b056 100644 --- a/arch/arm/mm/dma-mapping.c +++ b/arch/arm/mm/dma-mapping.c @@ -97,6 +97,7 @@ static struct page *__dma_alloc_buffer(struct device *dev, size_t size, gfp_t gf memset(ptr, 0, size); dmac_flush_range(ptr, ptr + size); outer_flush_range(__pa(ptr), __pa(ptr) + size); + __dma_sync();
return page; } @@ -572,6 +573,7 @@ int dma_map_sg(struct device *dev, struct scatterlist *sg, int nents, if (dma_mapping_error(dev, s->dma_address)) goto bad_mapping; } + __dma_sync(); debug_dma_map_sg(dev, sg, nents, nents, dir); return nents;
@@ -602,6 +604,8 @@ void dma_unmap_sg(struct device *dev, struct scatterlist *sg, int nents,
for_each_sg(sg, s, nents, i) __dma_unmap_page(dev, sg_dma_address(s), sg_dma_len(s), dir); + + __dma_sync(); } EXPORT_SYMBOL(dma_unmap_sg);
@@ -627,6 +631,8 @@ void dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg, s->length, dir); }
+ __dma_sync(); + debug_dma_sync_sg_for_cpu(dev, sg, nents, dir); } EXPORT_SYMBOL(dma_sync_sg_for_cpu); @@ -653,6 +659,8 @@ void dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg, s->length, dir); }
+ __dma_sync(); + debug_dma_sync_sg_for_device(dev, sg, nents, dir); } EXPORT_SYMBOL(dma_sync_sg_for_device);
On 21 June 2011 09:53, Russell King - ARM Linux linux@arm.linux.org.uk wrote:
On Sun, Jun 19, 2011 at 11:17:26PM +0200, Per Forlin wrote:
How significant is the cache maintenance over head?
Per,
Can you measure how much difference this has before and after your patch set please?
Absolutely, I can run the mmc_tests to get the measurement. The cache effect is greater the faster the flash memory is. Currently I only have access to an SD card (20 MiB/s). By the end of this week I can run on eMMC (45 MiB/s) if needed.
Thanks for your input, Per
On 21 June 2011 10:09, Per Forlin per.forlin@linaro.org wrote:
On 21 June 2011 09:53, Russell King - ARM Linux linux@arm.linux.org.uk wrote:
On Sun, Jun 19, 2011 at 11:17:26PM +0200, Per Forlin wrote:
How significant is the cache maintenance over head?
Per,
Can you measure how much difference this has before and after your patch set please?
Absolutely, I can run the mmc_tests to get the measurement. The cache effect is greater the faster the flash memory is. Currently I only have access to an SD card (20 MiB/s). By the end of this week I can run on eMMC (45 MiB/s) if needed.
Russell,
Here are the results.
mmc_test results without your DSB patch: mmc0: Starting tests of card mmc0:80ca... mmc0: Test case 37. Write performance with blocking req 4k to 4MB... mmc0: Transfer of 32768 x 8 sectors (32768 x 4 KiB) took 17.907140069 seconds (7495 kB/s, 7319 KiB/s, 1829.88 IOPS) mmc0: Transfer of 16384 x 16 sectors (16384 x 8 KiB) took 10.977203519 seconds (12226 kB/s, 11940 KiB/s, 1492.54 IOPS) mmc0: Transfer of 8192 x 32 sectors (8192 x 16 KiB) took 8.618723194 seconds (15572 kB/s, 15207 KiB/s, 950.48 IOPS) mmc0: Transfer of 4096 x 64 sectors (4096 x 32 KiB) took 7.452392708 seconds (18010 kB/s, 17587 KiB/s, 549.62 IOPS) mmc0: Transfer of 2048 x 128 sectors (2048 x 64 KiB) took 6.839447152 seconds (19624 kB/s, 19164 KiB/s, 299.43 IOPS) mmc0: Transfer of 1024 x 256 sectors (1024 x 128 KiB) took 6.533447450 seconds (20543 kB/s, 20061 KiB/s, 156.73 IOPS) mmc0: Transfer of 512 x 512 sectors (512 x 256 KiB) took 6.355529943 seconds (21118 kB/s, 20623 KiB/s, 80.55 IOPS) mmc0: Transfer of 256 x 1024 sectors (256 x 512 KiB) took 6.227417019 seconds (21552 kB/s, 21047 KiB/s, 41.10 IOPS) mmc0: Transfer of 128 x 2048 sectors (128 x 1024 KiB) took 6.047821091 seconds (22192 kB/s, 21672 KiB/s, 21.16 IOPS) mmc0: Transfer of 32 x 8192 sectors (32 x 4096 KiB) took 5.983120236 seconds (22432 kB/s, 21906 KiB/s, 5.34 IOPS) mmc0: Result: OK mmc0: Tests completed. mmc0: Starting tests of card mmc0:80ca... mmc0: Test case 38. Write performance with non-blocking req 4k to 4MB... mmc0: Transfer of 32768 x 8 sectors (32768 x 4 KiB) took 17.004930158 seconds (7892 kB/s, 7707 KiB/s, 1926.97 IOPS) mmc0: Transfer of 16384 x 16 sectors (16384 x 8 KiB) took 10.397338972 seconds (12908 kB/s, 12606 KiB/s, 1575.78 IOPS) mmc0: Transfer of 8192 x 32 sectors (8192 x 16 KiB) took 8.127319360 seconds (16514 kB/s, 16127 KiB/s, 1007.95 IOPS) mmc0: Transfer of 4096 x 64 sectors (4096 x 32 KiB) took 7.061096329 seconds (19008 kB/s, 18562 KiB/s, 580.07 IOPS) mmc0: Transfer of 2048 x 128 sectors (2048 x 64 KiB) took 6.503535845 seconds (20637 kB/s, 20153 KiB/s, 314.90 IOPS) mmc0: Transfer of 1024 x 256 sectors (1024 x 128 KiB) took 6.222897631 seconds (21568 kB/s, 21062 KiB/s, 164.55 IOPS) mmc0: Transfer of 512 x 512 sectors (512 x 256 KiB) took 6.082733285 seconds (22065 kB/s, 21548 KiB/s, 84.17 IOPS) mmc0: Transfer of 256 x 1024 sectors (256 x 512 KiB) took 5.928009056 seconds (22641 kB/s, 22110 KiB/s, 43.18 IOPS) mmc0: Transfer of 128 x 2048 sectors (128 x 1024 KiB) took 5.891113751 seconds (22783 kB/s, 22249 KiB/s, 21.72 IOPS) mmc0: Transfer of 32 x 8192 sectors (32 x 4096 KiB) took 5.878531233 seconds (22831 kB/s, 22296 KiB/s, 5.44 IOPS) mmc0: Result: OK mmc0: Tests completed. mmc0: Starting tests of card mmc0:80ca... mmc0: Test case 39. Read performance with blocking req 4k to 4MB... 
mmc0: Transfer of 32768 x 8 sectors (32768 x 4 KiB) took 20.904750140 seconds (6420 kB/s, 6269 KiB/s, 1567.49 IOPS) mmc0: Transfer of 16384 x 16 sectors (16384 x 8 KiB) took 12.929870605 seconds (10380 kB/s, 10137 KiB/s, 1267.14 IOPS) mmc0: Transfer of 8192 x 32 sectors (8192 x 16 KiB) took 10.115753174 seconds (13268 kB/s, 12957 KiB/s, 809.82 IOPS) mmc0: Transfer of 4096 x 64 sectors (4096 x 32 KiB) took 7.533538819 seconds (17816 kB/s, 17398 KiB/s, 543.70 IOPS) mmc0: Transfer of 2048 x 128 sectors (2048 x 64 KiB) took 6.937011718 seconds (19348 kB/s, 18894 KiB/s, 295.22 IOPS) mmc0: Transfer of 1024 x 256 sectors (1024 x 128 KiB) took 6.638824464 seconds (20217 kB/s, 19743 KiB/s, 154.24 IOPS) mmc0: Transfer of 512 x 512 sectors (512 x 256 KiB) took 6.489288330 seconds (20682 kB/s, 20198 KiB/s, 78.89 IOPS) mmc0: Transfer of 256 x 1024 sectors (256 x 512 KiB) took 6.414489746 seconds (20924 kB/s, 20433 KiB/s, 39.90 IOPS) mmc0: Transfer of 128 x 2048 sectors (128 x 1024 KiB) took 6.376800426 seconds (21047 kB/s, 20554 KiB/s, 20.07 IOPS) mmc0: Transfer of 32 x 8192 sectors (32 x 4096 KiB) took 6.348991821 seconds (21140 kB/s, 20644 KiB/s, 5.04 IOPS) mmc0: Result: OK mmc0: Tests completed. mmc0: Starting tests of card mmc0:80ca... mmc0: Test case 40. Read performance with non-blocking req 4k to 4MB... mmc0: Transfer of 32768 x 8 sectors (32768 x 4 KiB) took 20.906376527 seconds (6419 kB/s, 6269 KiB/s, 1567.36 IOPS) mmc0: Transfer of 16384 x 16 sectors (16384 x 8 KiB) took 12.929779053 seconds (10380 kB/s, 10137 KiB/s, 1267.15 IOPS) mmc0: Transfer of 8192 x 32 sectors (8192 x 16 KiB) took 10.119873047 seconds (13262 kB/s, 12951 KiB/s, 809.49 IOPS) mmc0: Transfer of 4096 x 64 sectors (4096 x 32 KiB) took 7.501770019 seconds (17891 kB/s, 17472 KiB/s, 546.00 IOPS) mmc0: Transfer of 2048 x 128 sectors (2048 x 64 KiB) took 6.797882080 seconds (19744 kB/s, 19281 KiB/s, 301.27 IOPS) mmc0: Transfer of 1024 x 256 sectors (1024 x 128 KiB) took 6.293121338 seconds (21327 kB/s, 20827 KiB/s, 162.71 IOPS) mmc0: Transfer of 512 x 512 sectors (512 x 256 KiB) took 5.952606200 seconds (22547 kB/s, 22019 KiB/s, 86.01 IOPS) mmc0: Transfer of 256 x 1024 sectors (256 x 512 KiB) took 5.862152101 seconds (22895 kB/s, 22359 KiB/s, 43.66 IOPS) mmc0: Transfer of 128 x 2048 sectors (128 x 1024 KiB) took 5.818847175 seconds (23066 kB/s, 22525 KiB/s, 21.99 IOPS) mmc0: Transfer of 32 x 8192 sectors (32 x 4096 KiB) took 5.798218390 seconds (23148 kB/s, 22605 KiB/s, 5.51 IOPS) mmc0: Result: OK mmc0: Tests completed.
mmc_test results with your DSB patch: mmc0: Starting tests of card mmc0:80ca... mmc0: Test case 37. Write performance with blocking req 4k to 4MB... mmc0: Transfer of 32768 x 8 sectors (32768 x 4 KiB) took 17.912285550 seconds (7493 kB/s, 7317 KiB/s, 1829.35 IOPS) mmc0: Transfer of 16384 x 16 sectors (16384 x 8 KiB) took 10.992614823 seconds (12209 kB/s, 11923 KiB/s, 1490.45 IOPS) mmc0: Transfer of 8192 x 32 sectors (8192 x 16 KiB) took 8.670936194 seconds (15479 kB/s, 15116 KiB/s, 944.76 IOPS) mmc0: Transfer of 4096 x 64 sectors (4096 x 32 KiB) took 7.448752639 seconds (18018 kB/s, 17596 KiB/s, 549.89 IOPS) mmc0: Transfer of 2048 x 128 sectors (2048 x 64 KiB) took 6.837432905 seconds (19629 kB/s, 19169 KiB/s, 299.52 IOPS) mmc0: Transfer of 1024 x 256 sectors (1024 x 128 KiB) took 6.510650765 seconds (20615 kB/s, 20131 KiB/s, 157.28 IOPS) mmc0: Transfer of 512 x 512 sectors (512 x 256 KiB) took 6.343047841 seconds (21159 kB/s, 20663 KiB/s, 80.71 IOPS) mmc0: Transfer of 256 x 1024 sectors (256 x 512 KiB) took 6.275632327 seconds (21387 kB/s, 20885 KiB/s, 40.79 IOPS) mmc0: Transfer of 128 x 2048 sectors (128 x 1024 KiB) took 6.051895663 seconds (22177 kB/s, 21658 KiB/s, 21.15 IOPS) mmc0: Transfer of 32 x 8192 sectors (32 x 4096 KiB) took 5.992395203 seconds (22398 kB/s, 21873 KiB/s, 5.34 IOPS) mmc0: Result: OK mmc0: Tests completed. mmc0: Starting tests of card mmc0:80ca... mmc0: Test case 38. Write performance with non-blocking req 4k to 4MB... mmc0: Transfer of 32768 x 8 sectors (32768 x 4 KiB) took 17.019586188 seconds (7886 kB/s, 7701 KiB/s, 1925.31 IOPS) mmc0: Transfer of 16384 x 16 sectors (16384 x 8 KiB) took 10.377655096 seconds (12933 kB/s, 12630 KiB/s, 1578.77 IOPS) mmc0: Transfer of 8192 x 32 sectors (8192 x 16 KiB) took 8.172790531 seconds (16422 kB/s, 16037 KiB/s, 1002.35 IOPS) mmc0: Transfer of 4096 x 64 sectors (4096 x 32 KiB) took 7.069458097 seconds (18985 kB/s, 18540 KiB/s, 579.39 IOPS) mmc0: Transfer of 2048 x 128 sectors (2048 x 64 KiB) took 6.498779387 seconds (20652 kB/s, 20168 KiB/s, 315.13 IOPS) mmc0: Transfer of 1024 x 256 sectors (1024 x 128 KiB) took 6.220800166 seconds (21575 kB/s, 21069 KiB/s, 164.60 IOPS) mmc0: Transfer of 512 x 512 sectors (512 x 256 KiB) took 6.040708413 seconds (22218 kB/s, 21698 KiB/s, 84.75 IOPS) mmc0: Transfer of 256 x 1024 sectors (256 x 512 KiB) took 5.946899457 seconds (22569 kB/s, 22040 KiB/s, 43.04 IOPS) mmc0: Transfer of 128 x 2048 sectors (128 x 1024 KiB) took 5.927886710 seconds (22641 kB/s, 22111 KiB/s, 21.59 IOPS) mmc0: Transfer of 32 x 8192 sectors (32 x 4096 KiB) took 5.878386087 seconds (22832 kB/s, 22297 KiB/s, 5.44 IOPS) mmc0: Result: OK mmc0: Tests completed. mmc0: Starting tests of card mmc0:80ca... mmc0: Test case 39. Read performance with blocking req 4k to 4MB... 
mmc0: Transfer of 32768 x 8 sectors (32768 x 4 KiB) took 20.829314216 seconds (6443 kB/s, 6292 KiB/s, 1573.16 IOPS) mmc0: Transfer of 16384 x 16 sectors (16384 x 8 KiB) took 12.875244140 seconds (10424 kB/s, 10180 KiB/s, 1272.51 IOPS) mmc0: Transfer of 8192 x 32 sectors (8192 x 16 KiB) took 10.073059082 seconds (13324 kB/s, 13012 KiB/s, 813.25 IOPS) mmc0: Transfer of 4096 x 64 sectors (4096 x 32 KiB) took 7.550659181 seconds (17775 kB/s, 17359 KiB/s, 542.46 IOPS) mmc0: Transfer of 2048 x 128 sectors (2048 x 64 KiB) took 6.942535401 seconds (19332 kB/s, 18879 KiB/s, 294.99 IOPS) mmc0: Transfer of 1024 x 256 sectors (1024 x 128 KiB) took 6.645233154 seconds (20197 kB/s, 19724 KiB/s, 154.09 IOPS) mmc0: Transfer of 512 x 512 sectors (512 x 256 KiB) took 6.495941164 seconds (20661 kB/s, 20177 KiB/s, 78.81 IOPS) mmc0: Transfer of 256 x 1024 sectors (256 x 512 KiB) took 6.421081542 seconds (20902 kB/s, 20412 KiB/s, 39.86 IOPS) mmc0: Transfer of 128 x 2048 sectors (128 x 1024 KiB) took 6.383514604 seconds (21025 kB/s, 20532 KiB/s, 20.05 IOPS) mmc0: Transfer of 32 x 8192 sectors (32 x 4096 KiB) took 6.355718936 seconds (21117 kB/s, 20622 KiB/s, 5.03 IOPS) mmc0: Result: OK mmc0: Tests completed. mmc0: Starting tests of card mmc0:80ca... mmc0: Test case 40. Read performance with non-blocking req 4k to 4MB... mmc0: Transfer of 32768 x 8 sectors (32768 x 4 KiB) took 20.832669187 seconds (6442 kB/s, 6291 KiB/s, 1572.91 IOPS) mmc0: Transfer of 16384 x 16 sectors (16384 x 8 KiB) took 12.884582520 seconds (10416 kB/s, 10172 KiB/s, 1271.59 IOPS) mmc0: Transfer of 8192 x 32 sectors (8192 x 16 KiB) took 10.076812745 seconds (13319 kB/s, 13007 KiB/s, 812.95 IOPS) mmc0: Transfer of 4096 x 64 sectors (4096 x 32 KiB) took 7.471252441 seconds (17964 kB/s, 17543 KiB/s, 548.23 IOPS) mmc0: Transfer of 2048 x 128 sectors (2048 x 64 KiB) took 6.765075684 seconds (19839 kB/s, 19374 KiB/s, 302.73 IOPS) mmc0: Transfer of 1024 x 256 sectors (1024 x 128 KiB) took 6.259826661 seconds (21441 kB/s, 20938 KiB/s, 163.58 IOPS) mmc0: Transfer of 512 x 512 sectors (512 x 256 KiB) took 5.948974608 seconds (22561 kB/s, 22032 KiB/s, 86.06 IOPS) mmc0: Transfer of 256 x 1024 sectors (256 x 512 KiB) took 5.860260010 seconds (22903 kB/s, 22366 KiB/s, 43.68 IOPS) mmc0: Transfer of 128 x 2048 sectors (128 x 1024 KiB) took 5.817993397 seconds (23069 kB/s, 22528 KiB/s, 22.00 IOPS) mmc0: Transfer of 32 x 8192 sectors (32 x 4096 KiB) took 5.798185906 seconds (23148 kB/s, 22605 KiB/s, 5.51 IOPS) mmc0: Result: OK mmc0: Tests completed.
In case I did any mistakes applying your patch manually. Here is your dsb patch on top of 3.0-rc4. diff --git a/arch/arm/include/asm/dma-mapping.h b/arch/arm/include/asm/dma-mapping.h index 4fff837..ad14c2b 100644 --- a/arch/arm/include/asm/dma-mapping.h +++ b/arch/arm/include/asm/dma-mapping.h @@ -115,6 +115,11 @@ static inline void __dma_page_dev_to_cpu(struct page *page, unsigned long off, ___dma_page_dev_to_cpu(page, off, size, dir); }
+static inline void __dma_sync(void) +{ + dsb(); +} + /* * Return whether the given device DMA address mask can be supported * properly. For example, if your device can only drive the low 24-bits @@ -378,6 +383,7 @@ static inline dma_addr_t dma_map_single(struct device *dev, void *cpu_addr, BUG_ON(!valid_dma_direction(dir));
addr = __dma_map_single(dev, cpu_addr, size, dir); + __dma_sync(); debug_dma_map_page(dev, virt_to_page(cpu_addr), (unsigned long)cpu_addr & ~PAGE_MASK, size, dir, addr, true); @@ -407,6 +413,7 @@ static inline dma_addr_t dma_map_page(struct device *dev, struct page *page, BUG_ON(!valid_dma_direction(dir));
addr = __dma_map_page(dev, page, offset, size, dir); + __dma_sync(); debug_dma_map_page(dev, page, offset, size, dir, addr, false);
return addr; @@ -431,6 +438,7 @@ static inline void dma_unmap_single(struct device *dev, dma_addr_t handle, { debug_dma_unmap_page(dev, handle, size, dir, true); __dma_unmap_single(dev, handle, size, dir); + __dma_sync(); }
/** @@ -452,6 +460,7 @@ static inline void dma_unmap_page(struct device *dev, dma_addr_t handle, { debug_dma_unmap_page(dev, handle, size, dir, false); __dma_unmap_page(dev, handle, size, dir); + __dma_sync(); }
/** @@ -498,6 +507,7 @@ static inline void dma_sync_single_range_for_device(struct device *dev, return;
__dma_single_cpu_to_dev(dma_to_virt(dev, handle) + offset, size, dir); + __dma_sync(); }
static inline void dma_sync_single_for_cpu(struct device *dev, diff --git a/arch/arm/mm/cache-fa.S b/arch/arm/mm/cache-fa.S index 1fa6f71..6eeb734 100644 --- a/arch/arm/mm/cache-fa.S +++ b/arch/arm/mm/cache-fa.S @@ -179,8 +179,6 @@ fa_dma_inv_range: add r0, r0, #CACHE_DLINESIZE cmp r0, r1 blo 1b - mov r0, #0 - mcr p15, 0, r0, c7, c10, 4 @ drain write buffer mov pc, lr
/* @@ -197,8 +195,6 @@ fa_dma_clean_range: add r0, r0, #CACHE_DLINESIZE cmp r0, r1 blo 1b - mov r0, #0 - mcr p15, 0, r0, c7, c10, 4 @ drain write buffer mov pc, lr
/* @@ -212,8 +208,6 @@ ENTRY(fa_dma_flush_range) add r0, r0, #CACHE_DLINESIZE cmp r0, r1 blo 1b - mov r0, #0 - mcr p15, 0, r0, c7, c10, 4 @ drain write buffer mov pc, lr
/* diff --git a/arch/arm/mm/cache-v4wb.S b/arch/arm/mm/cache-v4wb.S index f40c696..523c0cb 100644 --- a/arch/arm/mm/cache-v4wb.S +++ b/arch/arm/mm/cache-v4wb.S @@ -194,7 +194,6 @@ v4wb_dma_inv_range: add r0, r0, #CACHE_DLINESIZE cmp r0, r1 blo 1b - mcr p15, 0, r0, c7, c10, 4 @ drain write buffer mov pc, lr
/* @@ -211,7 +210,6 @@ v4wb_dma_clean_range: add r0, r0, #CACHE_DLINESIZE cmp r0, r1 blo 1b - mcr p15, 0, r0, c7, c10, 4 @ drain write buffer mov pc, lr
/* diff --git a/arch/arm/mm/cache-v6.S b/arch/arm/mm/cache-v6.S index 73b4a8b..7a842dd 100644 --- a/arch/arm/mm/cache-v6.S +++ b/arch/arm/mm/cache-v6.S @@ -239,8 +239,6 @@ v6_dma_inv_range: strlo r2, [r0] @ write for ownership #endif blo 1b - mov r0, #0 - mcr p15, 0, r0, c7, c10, 4 @ drain write buffer mov pc, lr
/* @@ -262,8 +260,6 @@ v6_dma_clean_range: add r0, r0, #D_CACHE_LINE_SIZE cmp r0, r1 blo 1b - mov r0, #0 - mcr p15, 0, r0, c7, c10, 4 @ drain write buffer mov pc, lr
/* @@ -290,8 +286,6 @@ ENTRY(v6_dma_flush_range) strlob r2, [r0] @ write for ownership #endif blo 1b - mov r0, #0 - mcr p15, 0, r0, c7, c10, 4 @ drain write buffer mov pc, lr
/* diff --git a/arch/arm/mm/cache-v7.S b/arch/arm/mm/cache-v7.S index d32f02b..18dcef6 100644 --- a/arch/arm/mm/cache-v7.S +++ b/arch/arm/mm/cache-v7.S @@ -257,7 +257,6 @@ v7_dma_inv_range: add r0, r0, r2 cmp r0, r1 blo 1b - dsb mov pc, lr ENDPROC(v7_dma_clean_range)
@@ -293,7 +291,6 @@ ENTRY(v7_dma_flush_range) add r0, r0, r2 cmp r0, r1 blo 1b - dsb mov pc, lr ENDPROC(v7_dma_flush_range)
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c index 82a093c..ff85283 100644 --- a/arch/arm/mm/dma-mapping.c +++ b/arch/arm/mm/dma-mapping.c @@ -97,6 +97,7 @@ static struct page *__dma_alloc_buffer(struct device *dev, size_t size, gfp_t gf memset(ptr, 0, size); dmac_flush_range(ptr, ptr + size); outer_flush_range(__pa(ptr), __pa(ptr) + size); + __dma_sync();
return page; } @@ -572,6 +573,7 @@ int dma_map_sg(struct device *dev, struct scatterlist *sg, int nents, if (dma_mapping_error(dev, s->dma_address)) goto bad_mapping; } + __dma_sync(); debug_dma_map_sg(dev, sg, nents, nents, dir); return nents;
@@ -602,6 +604,7 @@ void dma_unmap_sg(struct device *dev, struct scatterlist *sg, int nents,
for_each_sg(sg, s, nents, i) __dma_unmap_page(dev, sg_dma_address(s), sg_dma_len(s), dir); + __dma_sync(); } EXPORT_SYMBOL(dma_unmap_sg);
@@ -626,6 +629,7 @@ void dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg, __dma_page_dev_to_cpu(sg_page(s), s->offset, s->length, dir); } + __dma_sync();
debug_dma_sync_sg_for_cpu(dev, sg, nents, dir); } @@ -652,6 +656,7 @@ void dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg, __dma_page_cpu_to_dev(sg_page(s), s->offset, s->length, dir); } + __dma_sync();
debug_dma_sync_sg_for_device(dev, sg, nents, dir); }
On Tue, Jun 21, 2011 at 11:26:27AM +0200, Per Forlin wrote:
Here are the results.
It looks like this patch is either a no-op or slightly worse. As people have been telling me that dsb is rather expensive, and this patch results in less dsbs, I'm finding these results hard to believe. It seems to be saying that dsb is an effective no-op on your platform.
So either people are wrong about dsb being expensive, the patch is wrong, or there's something wrong with these results/test method.
You do have an error in the ported patch, as that hasn't updated the v7 cache cleaning code to remove the dsb() there, but that would only affect the write tests.
On 23 June 2011 15:37, Russell King - ARM Linux linux@arm.linux.org.uk wrote:
On Tue, Jun 21, 2011 at 11:26:27AM +0200, Per Forlin wrote:
Here are the results.
It looks like this patch is either a no-op or slightly worse. As people have been telling me that dsb is rather expensive, and this patch results in less dsbs, I'm finding these results hard to believe. It seems to be saying that dsb is an effective no-op on your platform.
The result of your patch depends on the number of sg elements. With your patch there is only one DSB per list instead of one per element. I can write a test to measure performance per number of sg elements in the sg list: fixed transfer size, but vary the number of sg elements in the list. This test may give a better understanding of the effect.
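A rough sketch of such a measurement (hypothetical helper, not the mmc_test code): keep the total buffer size fixed, vary only the number of sg elements that describe it, and time the map/unmap pair.

#include <linux/scatterlist.h>
#include <linux/dma-mapping.h>
#include <linux/ktime.h>

/* Hypothetical helper: time dma_map_sg() + dma_unmap_sg() for a buffer of
 * 'size' bytes split into 'nents' equally sized sg elements. */
static s64 time_map_unmap(struct device *dev, char *buf, size_t size,
			  unsigned int nents)
{
	struct sg_table sgt;
	struct scatterlist *sg;
	size_t chunk = size / nents;
	ktime_t start;
	s64 us;
	int i, mapped;

	if (sg_alloc_table(&sgt, nents, GFP_KERNEL))
		return -ENOMEM;

	for_each_sg(sgt.sgl, sg, nents, i)
		sg_set_buf(sg, buf + i * chunk, chunk);

	start = ktime_get();
	mapped = dma_map_sg(dev, sgt.sgl, nents, DMA_TO_DEVICE);
	if (mapped)
		dma_unmap_sg(dev, sgt.sgl, nents, DMA_TO_DEVICE);
	us = ktime_us_delta(ktime_get(), start);

	sg_free_table(&sgt);
	return us;
}

Calling this for nents = 1, 8, 16, ... 512 over the same buffer gives the mapping cost as a function of the sg list length.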
I have seen a performance gain when using __raw_write instead of writel. The writel test includes both the cost of the DSB and the outer_sync, where outer_sync is the more expensive one, I presume.
So either people are wrong about dsb being expensive, the patch is wrong, or there's something wrong with these results/test method.
You do have an error in the ported patch, as that hasn't updated the v7 cache cleaning code to remove the dsb() there, but that would only affect the write tests.
I will fix that mistake and also improve the test cases to measure the cost per number of sg-elements.
I'll come back with new numbers on Monday.
Regards, Per
On 24 June 2011 10:58, Per Forlin per.forlin@linaro.org wrote:
On 23 June 2011 15:37, Russell King - ARM Linux linux@arm.linux.org.uk wrote:
On Tue, Jun 21, 2011 at 11:26:27AM +0200, Per Forlin wrote:
Here are the results.
It looks like this patch is either a no-op or slightly worse. As people have been telling me that dsb is rather expensive, and this patch results in less dsbs, I'm finding these results hard to believe. It seems to be saying that dsb is an effective no-op on your platform.
The result of your patch depends on the number of sg elements. With your patch there is only one DSB per list instead of one per element. I can write a test to measure performance per number of sg elements in the sg list: fixed transfer size, but vary the number of sg elements in the list. This test may give a better understanding of the effect.
I have seen a performance gain when using __raw_write instead of writel. The writel test includes both the cost of the DSB and the outer_sync, where outer_sync is the more expensive one, I presume.
So either people are wrong about dsb being expensive, the patch is wrong, or there's something wrong with these results/test method.
You do have an error in the ported patch, as that hasn't updated the v7 cache cleaning code to remove the dsb() there, but that would only affect the write tests.
I will fix that mistake
diff --git a/arch/arm/mm/cache-v7.S b/arch/arm/mm/cache-v7.S index d32f02b..3fb51c5 100644 --- a/arch/arm/mm/cache-v7.S +++ b/arch/arm/mm/cache-v7.S @@ -228,7 +228,6 @@ ENTRY(v7_flush_kern_dcache_area) add r0, r0, r2 cmp r0, r1 blo 1b - dsb mov pc, lr ENDPROC(v7_flush_kern_dcache_area)
I'll come back with new numbers on Monday.
I have extended the test to measure bandwidth for various lengths of the sg list.
mmc_test without DSB patch: mmc0: Test case 37. Write performance with blocking req 4k to 4MB... mmc0: Transfer of 32768 x 8 sectors (32768 x 4 KiB) took 18.298817895 seconds (7334 kB/s, 7162 KiB/s, 1790.71 IOPS, sg_len 1) mmc0: Transfer of 16384 x 16 sectors (16384 x 8 KiB) took 11.046417371 seconds (12150 kB/s, 11865 KiB/s, 1483.19 IOPS, sg_len 1) mmc0: Transfer of 8192 x 32 sectors (8192 x 16 KiB) took 8.700345332 seconds (15426 kB/s, 15065 KiB/s, 941.57 IOPS, sg_len 1) mmc0: Transfer of 4096 x 64 sectors (4096 x 32 KiB) took 7.428314416 seconds (18068 kB/s, 17644 KiB/s, 551.40 IOPS, sg_len 1) mmc0: Transfer of 2048 x 128 sectors (2048 x 64 KiB) took 6.843811190 seconds (19611 kB/s, 19151 KiB/s, 299.24 IOPS, sg_len 1) mmc0: Transfer of 1024 x 256 sectors (1024 x 128 KiB) took 6.548462043 seconds (20496 kB/s, 20015 KiB/s, 156.37 IOPS, sg_len 1) mmc0: Transfer of 512 x 512 sectors (512 x 256 KiB) took 6.392456168 seconds (20996 kB/s, 20504 KiB/s, 80.09 IOPS, sg_len 1) mmc0: Transfer of 256 x 1024 sectors (256 x 512 KiB) took 6.278533955 seconds (21377 kB/s, 20876 KiB/s, 40.77 IOPS, sg_len 1) mmc0: Transfer of 128 x 2048 sectors (128 x 1024 KiB) took 6.007019613 seconds (22343 kB/s, 21819 KiB/s, 21.30 IOPS, sg_len 1) mmc0: Transfer of 32 x 8192 sectors (32 x 4096 KiB) took 5.975690092 seconds (22460 kB/s, 21934 KiB/s, 5.35 IOPS, sg_len 1) mmc0: Result: OK mmc0: Tests completed. mmc0: Starting tests of card mmc0:80ca... mmc0: Test case 38. Write performance with non-blocking req 4k to 4MB... mmc0: Transfer of 32768 x 8 sectors (32768 x 4 KiB) took 18.006849673 seconds (7453 kB/s, 7279 KiB/s, 1819.75 IOPS, sg_len 1) mmc0: Transfer of 16384 x 16 sectors (16384 x 8 KiB) took 10.744232260 seconds (12492 kB/s, 12199 KiB/s, 1524.91 IOPS, sg_len 1) mmc0: Transfer of 8192 x 32 sectors (8192 x 16 KiB) took 8.378324787 seconds (16019 kB/s, 15644 KiB/s, 977.76 IOPS, sg_len 1) mmc0: Transfer of 4096 x 64 sectors (4096 x 32 KiB) took 7.120544379 seconds (18849 kB/s, 18407 KiB/s, 575.23 IOPS, sg_len 1) mmc0: Transfer of 2048 x 128 sectors (2048 x 64 KiB) took 6.551513699 seconds (20486 kB/s, 20006 KiB/s, 312.59 IOPS, sg_len 1) mmc0: Transfer of 1024 x 256 sectors (1024 x 128 KiB) took 6.252501827 seconds (21466 kB/s, 20963 KiB/s, 163.77 IOPS, sg_len 1) mmc0: Transfer of 512 x 512 sectors (512 x 256 KiB) took 6.102325404 seconds (21994 kB/s, 21479 KiB/s, 83.90 IOPS, sg_len 1) mmc0: Transfer of 256 x 1024 sectors (256 x 512 KiB) took 5.978148815 seconds (22451 kB/s, 21925 KiB/s, 42.82 IOPS, sg_len 1) mmc0: Transfer of 128 x 2048 sectors (128 x 1024 KiB) took 5.873932398 seconds (22849 kB/s, 22314 KiB/s, 21.79 IOPS, sg_len 1) mmc0: Transfer of 32 x 8192 sectors (32 x 4096 KiB) took 5.874753979 seconds (22846 kB/s, 22311 KiB/s, 5.44 IOPS, sg_len 1) mmc0: Result: OK mmc0: Tests completed. mmc0: Starting tests of card mmc0:80ca... mmc0: Test case 39. Read performance with blocking req 4k to 4MB... 
mmc0: Transfer of 32768 x 8 sectors (32768 x 4 KiB) took 20.897765402 seconds (6422 kB/s, 6272 KiB/s, 1568.01 IOPS, sg_len 1) mmc0: Transfer of 16384 x 16 sectors (16384 x 8 KiB) took 12.921478271 seconds (10387 kB/s, 10143 KiB/s, 1267.96 IOPS, sg_len 1) mmc0: Transfer of 8192 x 32 sectors (8192 x 16 KiB) took 10.111419678 seconds (13273 kB/s, 12962 KiB/s, 810.17 IOPS, sg_len 1) mmc0: Transfer of 4096 x 64 sectors (4096 x 32 KiB) took 7.551544189 seconds (17773 kB/s, 17356 KiB/s, 542.40 IOPS, sg_len 1) mmc0: Transfer of 2048 x 128 sectors (2048 x 64 KiB) took 6.958251954 seconds (19289 kB/s, 18836 KiB/s, 294.32 IOPS, sg_len 1) mmc0: Transfer of 1024 x 256 sectors (1024 x 128 KiB) took 6.656890870 seconds (20162 kB/s, 19689 KiB/s, 153.82 IOPS, sg_len 1) mmc0: Transfer of 512 x 512 sectors (512 x 256 KiB) took 6.504821778 seconds (20633 kB/s, 20149 KiB/s, 78.71 IOPS, sg_len 1) mmc0: Transfer of 256 x 1024 sectors (256 x 512 KiB) took 6.428955079 seconds (20877 kB/s, 20387 KiB/s, 39.81 IOPS, sg_len 1) mmc0: Transfer of 128 x 2048 sectors (128 x 1024 KiB) took 6.391205311 seconds (21000 kB/s, 20508 KiB/s, 20.02 IOPS, sg_len 1) mmc0: Transfer of 32 x 8192 sectors (32 x 4096 KiB) took 6.362468401 seconds (21095 kB/s, 20600 KiB/s, 5.02 IOPS, sg_len 1) mmc0: Result: OK mmc0: Tests completed. mmc0: Starting tests of card mmc0:80ca... mmc0: Test case 40. Read performance with non-blocking req 4k to 4MB... mmc0: Transfer of 32768 x 8 sectors (32768 x 4 KiB) took 20.879326369 seconds (6428 kB/s, 6277 KiB/s, 1569.39 IOPS, sg_len 1) mmc0: Transfer of 16384 x 16 sectors (16384 x 8 KiB) took 12.924346924 seconds (10384 kB/s, 10141 KiB/s, 1267.68 IOPS, sg_len 1) mmc0: Transfer of 8192 x 32 sectors (8192 x 16 KiB) took 10.111450196 seconds (13273 kB/s, 12962 KiB/s, 810.17 IOPS, sg_len 1) mmc0: Transfer of 4096 x 64 sectors (4096 x 32 KiB) took 7.498107909 seconds (17900 kB/s, 17480 KiB/s, 546.27 IOPS, sg_len 1) mmc0: Transfer of 2048 x 128 sectors (2048 x 64 KiB) took 6.791412354 seconds (19762 kB/s, 19299 KiB/s, 301.55 IOPS, sg_len 1) mmc0: Transfer of 1024 x 256 sectors (1024 x 128 KiB) took 6.284973145 seconds (21355 kB/s, 20854 KiB/s, 162.92 IOPS, sg_len 1) mmc0: Transfer of 512 x 512 sectors (512 x 256 KiB) took 5.951568601 seconds (22551 kB/s, 22023 KiB/s, 86.02 IOPS, sg_len 1) mmc0: Transfer of 256 x 1024 sectors (256 x 512 KiB) took 5.861846924 seconds (22896 kB/s, 22360 KiB/s, 43.67 IOPS, sg_len 1) mmc0: Transfer of 128 x 2048 sectors (128 x 1024 KiB) took 5.818786662 seconds (23066 kB/s, 22525 KiB/s, 21.99 IOPS, sg_len 1) mmc0: Transfer of 32 x 8192 sectors (32 x 4096 KiB) took 5.798608182 seconds (23146 kB/s, 22604 KiB/s, 5.51 IOPS, sg_len 1) mmc0: Result: OK mmc0: Tests completed. mmc0: Starting tests of card mmc0:80ca... mmc0: Test case 41. Write performance blocking req 1 to 512 sg elems... 
mmc0: Transfer of 256 x 1024 sectors (256 x 512 KiB) took 6.272007461 seconds (21399 kB/s, 20897 KiB/s, 40.81 IOPS, sg_len 1) mmc0: Transfer of 256 x 1024 sectors (256 x 512 KiB) took 6.282043489 seconds (21365 kB/s, 20864 KiB/s, 40.75 IOPS, sg_len 8) mmc0: Transfer of 256 x 1024 sectors (256 x 512 KiB) took 6.272643023 seconds (21397 kB/s, 20895 KiB/s, 40.81 IOPS, sg_len 16) mmc0: Transfer of 256 x 1024 sectors (256 x 512 KiB) took 6.229250295 seconds (21546 kB/s, 21041 KiB/s, 41.09 IOPS, sg_len 32) mmc0: Transfer of 256 x 1024 sectors (256 x 512 KiB) took 6.273985326 seconds (21392 kB/s, 20891 KiB/s, 40.80 IOPS, sg_len 64) mmc0: Transfer of 256 x 1024 sectors (256 x 512 KiB) took 6.290618193 seconds (21336 kB/s, 20836 KiB/s, 40.69 IOPS, sg_len 128) mmc0: Transfer of 256 x 1024 sectors (256 x 512 KiB) took 6.319313199 seconds (21239 kB/s, 20741 KiB/s, 40.51 IOPS, sg_len 256) mmc0: Transfer of 256 x 1024 sectors (256 x 512 KiB) took 6.395201864 seconds (20987 kB/s, 20495 KiB/s, 40.03 IOPS, sg_len 512) mmc0: Result: OK mmc0: Tests completed. mmc0: Starting tests of card mmc0:80ca... mmc0: Test case 42. Write performance non-blocking req 1 to 512 sg elems... mmc0: Transfer of 256 x 1024 sectors (256 x 512 KiB) took 5.932434164 seconds (22624 kB/s, 22094 KiB/s, 43.15 IOPS, sg_len 1) mmc0: Transfer of 256 x 1024 sectors (256 x 512 KiB) took 5.977142417 seconds (22455 kB/s, 21928 KiB/s, 42.82 IOPS, sg_len 8) mmc0: Transfer of 256 x 1024 sectors (256 x 512 KiB) took 5.992553748 seconds (22397 kB/s, 21872 KiB/s, 42.71 IOPS, sg_len 16) mmc0: Transfer of 256 x 1024 sectors (256 x 512 KiB) took 5.977783455 seconds (22452 kB/s, 21926 KiB/s, 42.82 IOPS, sg_len 32) mmc0: Transfer of 256 x 1024 sectors (256 x 512 KiB) took 5.974916543 seconds (22463 kB/s, 21937 KiB/s, 42.84 IOPS, sg_len 64) mmc0: Transfer of 256 x 1024 sectors (256 x 512 KiB) took 5.933075093 seconds (22621 kB/s, 22091 KiB/s, 43.14 IOPS, sg_len 128) mmc0: Transfer of 256 x 1024 sectors (256 x 512 KiB) took 5.988067716 seconds (22414 kB/s, 21888 KiB/s, 42.75 IOPS, sg_len 256) mmc0: Transfer of 256 x 1024 sectors (256 x 512 KiB) took 5.982299966 seconds (22435 kB/s, 21909 KiB/s, 42.79 IOPS, sg_len 512) mmc0: Result: OK mmc0: Tests completed. mmc0: Starting tests of card mmc0:80ca... mmc0: Test case 43. Read performance blocking req 1 to 512 sg elems... mmc0: Transfer of 256 x 1024 sectors (256 x 512 KiB) took 6.427185060 seconds (20882 kB/s, 20393 KiB/s, 39.83 IOPS, sg_len 1) mmc0: Transfer of 256 x 1024 sectors (256 x 512 KiB) took 6.428710938 seconds (20877 kB/s, 20388 KiB/s, 39.82 IOPS, sg_len 8) mmc0: Transfer of 256 x 1024 sectors (256 x 512 KiB) took 6.430084229 seconds (20873 kB/s, 20384 KiB/s, 39.81 IOPS, sg_len 16) mmc0: Transfer of 256 x 1024 sectors (256 x 512 KiB) took 6.432220459 seconds (20866 kB/s, 20377 KiB/s, 39.79 IOPS, sg_len 32) mmc0: Transfer of 256 x 1024 sectors (256 x 512 KiB) took 6.435882569 seconds (20854 kB/s, 20365 KiB/s, 39.77 IOPS, sg_len 64) mmc0: Transfer of 256 x 1024 sectors (256 x 512 KiB) took 6.441589356 seconds (20836 kB/s, 20347 KiB/s, 39.74 IOPS, sg_len 128) mmc0: Transfer of 256 x 1024 sectors (256 x 512 KiB) took 6.507446289 seconds (20625 kB/s, 20141 KiB/s, 39.33 IOPS, sg_len 256) mmc0: Transfer of 256 x 1024 sectors (256 x 512 KiB) took 6.645568847 seconds (20196 kB/s, 19723 KiB/s, 38.52 IOPS, sg_len 512) mmc0: Result: OK mmc0: Tests completed. mmc0: Starting tests of card mmc0:80ca... mmc0: Test case 44. Read performance non-blocking req 1 to 512 sg elems... 
mmc0: Transfer of 256 x 1024 sectors (256 x 512 KiB) took 5.861175537 seconds (22899 kB/s, 22362 KiB/s, 43.67 IOPS, sg_len 1) mmc0: Transfer of 256 x 1024 sectors (256 x 512 KiB) took 5.861267090 seconds (22899 kB/s, 22362 KiB/s, 43.67 IOPS, sg_len 8) mmc0: Transfer of 256 x 1024 sectors (256 x 512 KiB) took 5.861328125 seconds (22898 kB/s, 22362 KiB/s, 43.67 IOPS, sg_len 16) mmc0: Transfer of 256 x 1024 sectors (256 x 512 KiB) took 5.861419678 seconds (22898 kB/s, 22361 KiB/s, 43.67 IOPS, sg_len 32) mmc0: Transfer of 256 x 1024 sectors (256 x 512 KiB) took 5.861480713 seconds (22898 kB/s, 22361 KiB/s, 43.67 IOPS, sg_len 64) mmc0: Transfer of 256 x 1024 sectors (256 x 512 KiB) took 5.861602783 seconds (22897 kB/s, 22361 KiB/s, 43.67 IOPS, sg_len 128) mmc0: Transfer of 256 x 1024 sectors (256 x 512 KiB) took 5.861999512 seconds (22896 kB/s, 22359 KiB/s, 43.67 IOPS, sg_len 256) mmc0: Transfer of 256 x 1024 sectors (256 x 512 KiB) took 5.862915039 seconds (22892 kB/s, 22356 KiB/s, 43.66 IOPS, sg_len 512) mmc0: Result: OK mmc0: Tests completed.
mmc_test with DSB patch mmc0: Starting tests of card mmc0:80ca... mmc0: Test case 37. Write performance with blocking req 4k to 4MB... mmc0: Transfer of 32768 x 8 sectors (32768 x 4 KiB) took 18.068062451 seconds (7428 kB/s, 7254 KiB/s, 1813.58 IOPS, sg_len 1) mmc0: Transfer of 16384 x 16 sectors (16384 x 8 KiB) took 11.099609390 seconds (12092 kB/s, 11808 KiB/s, 1476.08 IOPS, sg_len 1) mmc0: Transfer of 8192 x 32 sectors (8192 x 16 KiB) took 8.677063074 seconds (15468 kB/s, 15105 KiB/s, 944.09 IOPS, sg_len 1) mmc0: Transfer of 4096 x 64 sectors (4096 x 32 KiB) took 7.476867759 seconds (17951 kB/s, 17530 KiB/s, 547.82 IOPS, sg_len 1) mmc0: Transfer of 2048 x 128 sectors (2048 x 64 KiB) took 6.819549471 seconds (19681 kB/s, 19220 KiB/s, 300.31 IOPS, sg_len 1) mmc0: Transfer of 1024 x 256 sectors (1024 x 128 KiB) took 6.524749957 seconds (20570 kB/s, 20088 KiB/s, 156.94 IOPS, sg_len 1) mmc0: Transfer of 512 x 512 sectors (512 x 256 KiB) took 6.395263629 seconds (20987 kB/s, 20495 KiB/s, 80.05 IOPS, sg_len 1) mmc0: Transfer of 256 x 1024 sectors (256 x 512 KiB) took 6.271362333 seconds (21401 kB/s, 20900 KiB/s, 40.82 IOPS, sg_len 1) mmc0: Transfer of 128 x 2048 sectors (128 x 1024 KiB) took 6.057769872 seconds (22156 kB/s, 21637 KiB/s, 21.12 IOPS, sg_len 1) mmc0: Transfer of 32 x 8192 sectors (32 x 4096 KiB) took 5.953065733 seconds (22545 kB/s, 22017 KiB/s, 5.37 IOPS, sg_len 1) mmc0: Result: OK mmc0: Tests completed. mmc0: Starting tests of card mmc0:80ca... mmc0: Test case 38. Write performance with non-blocking req 4k to 4MB... mmc0: Transfer of 32768 x 8 sectors (32768 x 4 KiB) took 17.807667705 seconds (7537 kB/s, 7360 KiB/s, 1840.10 IOPS, sg_len 1) mmc0: Transfer of 16384 x 16 sectors (16384 x 8 KiB) took 10.798034119 seconds (12429 kB/s, 12138 KiB/s, 1517.31 IOPS, sg_len 1) mmc0: Transfer of 8192 x 32 sectors (8192 x 16 KiB) took 8.365875302 seconds (16043 kB/s, 15667 KiB/s, 979.21 IOPS, sg_len 1) mmc0: Transfer of 4096 x 64 sectors (4096 x 32 KiB) took 7.169311773 seconds (18721 kB/s, 18282 KiB/s, 571.32 IOPS, sg_len 1) mmc0: Transfer of 2048 x 128 sectors (2048 x 64 KiB) took 6.518709807 seconds (20589 kB/s, 20107 KiB/s, 314.17 IOPS, sg_len 1) mmc0: Transfer of 1024 x 256 sectors (1024 x 128 KiB) took 6.232238768 seconds (21536 kB/s, 21031 KiB/s, 164.30 IOPS, sg_len 1) mmc0: Transfer of 512 x 512 sectors (512 x 256 KiB) took 6.100677623 seconds (22000 kB/s, 21484 KiB/s, 83.92 IOPS, sg_len 1) mmc0: Transfer of 256 x 1024 sectors (256 x 512 KiB) took 5.984959516 seconds (22425 kB/s, 21900 KiB/s, 42.77 IOPS, sg_len 1) mmc0: Transfer of 128 x 2048 sectors (128 x 1024 KiB) took 5.920165778 seconds (22671 kB/s, 22139 KiB/s, 21.62 IOPS, sg_len 1) mmc0: Transfer of 32 x 8192 sectors (32 x 4096 KiB) took 5.844845626 seconds (22963 kB/s, 22425 KiB/s, 5.47 IOPS, sg_len 1) mmc0: Result: OK mmc0: Tests completed. mmc0: Starting tests of card mmc0:80ca... mmc0: Test case 39. Read performance with blocking req 4k to 4MB... 
mmc0: Transfer of 32768 x 8 sectors (32768 x 4 KiB) took 20.818960380 seconds (6446 kB/s, 6295 KiB/s, 1573.94 IOPS, sg_len 1) mmc0: Transfer of 16384 x 16 sectors (16384 x 8 KiB) took 12.869567871 seconds (10429 kB/s, 10184 KiB/s, 1273.08 IOPS, sg_len 1) mmc0: Transfer of 8192 x 32 sectors (8192 x 16 KiB) took 10.071319579 seconds (13326 kB/s, 13014 KiB/s, 813.39 IOPS, sg_len 1) mmc0: Transfer of 4096 x 64 sectors (4096 x 32 KiB) took 7.574279785 seconds (17720 kB/s, 17304 KiB/s, 540.77 IOPS, sg_len 1) mmc0: Transfer of 2048 x 128 sectors (2048 x 64 KiB) took 6.955871583 seconds (19295 kB/s, 18843 KiB/s, 294.42 IOPS, sg_len 1) mmc0: Transfer of 1024 x 256 sectors (1024 x 128 KiB) took 6.655639650 seconds (20166 kB/s, 19693 KiB/s, 153.85 IOPS, sg_len 1) mmc0: Transfer of 512 x 512 sectors (512 x 256 KiB) took 6.504333497 seconds (20635 kB/s, 20151 KiB/s, 78.71 IOPS, sg_len 1) mmc0: Transfer of 256 x 1024 sectors (256 x 512 KiB) took 6.428558349 seconds (20878 kB/s, 20389 KiB/s, 39.82 IOPS, sg_len 1) mmc0: Transfer of 128 x 2048 sectors (128 x 1024 KiB) took 6.390869097 seconds (21001 kB/s, 20509 KiB/s, 20.02 IOPS, sg_len 1) mmc0: Transfer of 32 x 8192 sectors (32 x 4096 KiB) took 6.362161563 seconds (21096 kB/s, 20601 KiB/s, 5.02 IOPS, sg_len 1) mmc0: Result: OK mmc0: Tests completed. mmc0: Starting tests of card mmc0:80ca... mmc0: Test case 40. Read performance with non-blocking req 4k to 4MB... mmc0: Transfer of 32768 x 8 sectors (32768 x 4 KiB) took 20.820581694 seconds (6446 kB/s, 6295 KiB/s, 1573.82 IOPS, sg_len 1) mmc0: Transfer of 16384 x 16 sectors (16384 x 8 KiB) took 12.883728027 seconds (10417 kB/s, 10173 KiB/s, 1271.68 IOPS, sg_len 1) mmc0: Transfer of 8192 x 32 sectors (8192 x 16 KiB) took 10.078765870 seconds (13316 kB/s, 13004 KiB/s, 812.79 IOPS, sg_len 1) mmc0: Transfer of 4096 x 64 sectors (4096 x 32 KiB) took 7.474670410 seconds (17956 kB/s, 17535 KiB/s, 547.98 IOPS, sg_len 1) mmc0: Transfer of 2048 x 128 sectors (2048 x 64 KiB) took 6.766143799 seconds (19836 kB/s, 19371 KiB/s, 302.68 IOPS, sg_len 1) mmc0: Transfer of 1024 x 256 sectors (1024 x 128 KiB) took 6.263549804 seconds (21428 kB/s, 20926 KiB/s, 163.48 IOPS, sg_len 1) mmc0: Transfer of 512 x 512 sectors (512 x 256 KiB) took 5.948516846 seconds (22563 kB/s, 22034 KiB/s, 86.07 IOPS, sg_len 1) mmc0: Transfer of 256 x 1024 sectors (256 x 512 KiB) took 5.860290527 seconds (22902 kB/s, 22366 KiB/s, 43.68 IOPS, sg_len 1) mmc0: Transfer of 128 x 2048 sectors (128 x 1024 KiB) took 5.817961291 seconds (23069 kB/s, 22528 KiB/s, 22.00 IOPS, sg_len 1) mmc0: Transfer of 32 x 8192 sectors (32 x 4096 KiB) took 5.798411425 seconds (23147 kB/s, 22604 KiB/s, 5.51 IOPS, sg_len 1) mmc0: Result: OK mmc0: Tests completed. mmc0: Starting tests of card mmc0:80ca... mmc0: Test case 41. Write performance blocking req 1 to 512 sg elems... 
mmc0: Transfer of 256 x 1024 sectors (256 x 512 KiB) took 6.255558930 seconds (21455 kB/s, 20952 KiB/s, 40.92 IOPS, sg_len 1) mmc0: Transfer of 256 x 1024 sectors (256 x 512 KiB) took 6.286008096 seconds (21351 kB/s, 20851 KiB/s, 40.72 IOPS, sg_len 8) mmc0: Transfer of 256 x 1024 sectors (256 x 512 KiB) took 6.273094892 seconds (21395 kB/s, 20894 KiB/s, 40.80 IOPS, sg_len 16) mmc0: Transfer of 256 x 1024 sectors (256 x 512 KiB) took 6.274107614 seconds (21392 kB/s, 20890 KiB/s, 40.80 IOPS, sg_len 32) mmc0: Transfer of 256 x 1024 sectors (256 x 512 KiB) took 6.239166371 seconds (21512 kB/s, 21007 KiB/s, 41.03 IOPS, sg_len 64) mmc0: Transfer of 256 x 1024 sectors (256 x 512 KiB) took 6.263824503 seconds (21427 kB/s, 20925 KiB/s, 40.86 IOPS, sg_len 128) mmc0: Transfer of 256 x 1024 sectors (256 x 512 KiB) took 6.314788699 seconds (21254 kB/s, 20756 KiB/s, 40.53 IOPS, sg_len 256) mmc0: Transfer of 256 x 1024 sectors (256 x 512 KiB) took 6.385681206 seconds (21018 kB/s, 20525 KiB/s, 40.08 IOPS, sg_len 512) mmc0: Result: OK mmc0: Tests completed. mmc0: Starting tests of card mmc0:80ca... mmc0: Test case 42. Write performance non-blocking req 1 to 512 sg elems... mmc0: Transfer of 256 x 1024 sectors (256 x 512 KiB) took 5.977571877 seconds (22453 kB/s, 21927 KiB/s, 42.82 IOPS, sg_len 1) mmc0: Transfer of 256 x 1024 sectors (256 x 512 KiB) took 5.939239529 seconds (22598 kB/s, 22068 KiB/s, 43.10 IOPS, sg_len 8) mmc0: Transfer of 256 x 1024 sectors (256 x 512 KiB) took 5.976898254 seconds (22456 kB/s, 21929 KiB/s, 42.83 IOPS, sg_len 16) mmc0: Transfer of 256 x 1024 sectors (256 x 512 KiB) took 5.978546066 seconds (22449 kB/s, 21923 KiB/s, 42.81 IOPS, sg_len 32) mmc0: Transfer of 256 x 1024 sectors (256 x 512 KiB) took 5.978147794 seconds (22451 kB/s, 21925 KiB/s, 42.82 IOPS, sg_len 64) mmc0: Transfer of 256 x 1024 sectors (256 x 512 KiB) took 5.979031678 seconds (22448 kB/s, 21921 KiB/s, 42.81 IOPS, sg_len 128) mmc0: Transfer of 256 x 1024 sectors (256 x 512 KiB) took 5.939541986 seconds (22597 kB/s, 22067 KiB/s, 43.10 IOPS, sg_len 256) mmc0: Transfer of 256 x 1024 sectors (256 x 512 KiB) took 5.970123311 seconds (22481 kB/s, 21954 KiB/s, 42.88 IOPS, sg_len 512) mmc0: Result: OK mmc0: Tests completed. mmc0: Starting tests of card mmc0:80ca... mmc0: Test case 43. Read performance blocking req 1 to 512 sg elems... mmc0: Transfer of 256 x 1024 sectors (256 x 512 KiB) took 6.428588865 seconds (20878 kB/s, 20388 KiB/s, 39.82 IOPS, sg_len 1) mmc0: Transfer of 256 x 1024 sectors (256 x 512 KiB) took 6.428924562 seconds (20877 kB/s, 20387 KiB/s, 39.82 IOPS, sg_len 8) mmc0: Transfer of 256 x 1024 sectors (256 x 512 KiB) took 6.429901123 seconds (20873 kB/s, 20384 KiB/s, 39.81 IOPS, sg_len 16) mmc0: Transfer of 256 x 1024 sectors (256 x 512 KiB) took 6.431579590 seconds (20868 kB/s, 20379 KiB/s, 39.80 IOPS, sg_len 32) mmc0: Transfer of 256 x 1024 sectors (256 x 512 KiB) took 6.435455322 seconds (20855 kB/s, 20367 KiB/s, 39.77 IOPS, sg_len 64) mmc0: Transfer of 256 x 1024 sectors (256 x 512 KiB) took 6.441680908 seconds (20835 kB/s, 20347 KiB/s, 39.74 IOPS, sg_len 128) mmc0: Transfer of 256 x 1024 sectors (256 x 512 KiB) took 6.505981447 seconds (20629 kB/s, 20146 KiB/s, 39.34 IOPS, sg_len 256) mmc0: Transfer of 256 x 1024 sectors (256 x 512 KiB) took 6.638854980 seconds (20216 kB/s, 19743 KiB/s, 38.56 IOPS, sg_len 512) mmc0: Result: OK mmc0: Tests completed. mmc0: Starting tests of card mmc0:80ca... mmc0: Test case 44. Read performance non-blocking req 1 to 512 sg elems... 
mmc0: Transfer of 256 x 1024 sectors (256 x 512 KiB) took 5.861053467 seconds (22899 kB/s, 22363 KiB/s, 43.67 IOPS, sg_len 1) mmc0: Transfer of 256 x 1024 sectors (256 x 512 KiB) took 5.861083984 seconds (22899 kB/s, 22363 KiB/s, 43.67 IOPS, sg_len 8) mmc0: Transfer of 256 x 1024 sectors (256 x 512 KiB) took 5.861022950 seconds (22900 kB/s, 22363 KiB/s, 43.67 IOPS, sg_len 16) mmc0: Transfer of 256 x 1024 sectors (256 x 512 KiB) took 5.861022949 seconds (22900 kB/s, 22363 KiB/s, 43.67 IOPS, sg_len 32) mmc0: Transfer of 256 x 1024 sectors (256 x 512 KiB) took 5.861236572 seconds (22899 kB/s, 22362 KiB/s, 43.67 IOPS, sg_len 64) mmc0: Transfer of 256 x 1024 sectors (256 x 512 KiB) took 5.861358642 seconds (22898 kB/s, 22362 KiB/s, 43.67 IOPS, sg_len 128) mmc0: Transfer of 256 x 1024 sectors (256 x 512 KiB) took 5.861694337 seconds (22897 kB/s, 22360 KiB/s, 43.67 IOPS, sg_len 256) mmc0: Transfer of 256 x 1024 sectors (256 x 512 KiB) took 5.862884521 seconds (22892 kB/s, 22356 KiB/s, 43.66 IOPS, sg_len 512) mmc0: Result: OK mmc0: Tests completed.
Conclusion: working with mmc, the relative cost of DSB is almost none. There seem to be slightly higher numbers for mmc blocking requests with the DSB patch compared to not having it.
Regards, Per
On Mon, Jun 27, 2011 at 11:42:52AM +0200, Per Forlin wrote:
Conclusion: working with mmc, the relative cost of DSB is almost none. There seem to be slightly higher numbers for mmc blocking requests with the DSB patch compared to not having it.
These figures suggest that dsb is comparatively not heavy on the hardware you're testing.
I think I'm going to apply the patch anyway - it certainly makes stuff no worse, and if someone has a platform where dsb is more expensive, then they should see a greater benefit from this change.
The next thing to think about in DMA-land is whether we should total up the size of the SG list and choose whether to flush the individual SG elements or do a full cache flush. There becomes a point where the full cache flush becomes cheaper than flushing each SG element individually.
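A sketch of that idea (FULL_FLUSH_THRESHOLD is a made-up name, flush_cache_all() stands in for whatever clean/flush-all primitive would actually be used, and the outer cache is ignored here):

size_t total = 0;
struct scatterlist *s;
int i;

for_each_sg(sg, s, nents, i)
	total += s->length;

if (total >= FULL_FLUSH_THRESHOLD) {
	/* past the breakeven point one full D-cache operation is cheaper */
	flush_cache_all();
} else {
	for_each_sg(sg, s, nents, i)
		__dma_page_cpu_to_dev(sg_page(s), s->offset, s->length, dir);
}
__dma_sync();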
On 27 June 2011 12:02, Russell King - ARM Linux linux@arm.linux.org.uk wrote:
On Mon, Jun 27, 2011 at 11:42:52AM +0200, Per Forlin wrote:
Conclusion: working with mmc, the relative cost of DSB is almost none. There seem to be slightly higher numbers for mmc blocking requests with the DSB patch compared to not having it.
These figures suggest that dsb is comparatively not heavy on the hardware you're testing.
Yes, of course.
I think I'm going to apply the patch anyway - it certainly makes stuff no worse, and if someone has a platform where dsb is more expensive, then they should see a greater benefit from this change.
I agree.
The next thing to think about in DMA-land is whether we should total up the size of the SG list and choose whether to flush the individual SG elements or do a full cache flush. There becomes a point where the full cache flush becomes cheaper than flushing each SG element individually.
Interesting. I have seen such optimisations in hwmem (yet another memory manager for multimedia hardware). It would be nice to have such functionality in dma-mapping, to be used by anyone.
Regards, Per
On Mon, Jun 27, 2011 at 12:02 PM, Russell King - ARM Linux linux@arm.linux.org.uk wrote:
The next thing to think about in DMA-land is whether we should total up the size of the SG list and choose whether to flush the individual SG elements or do a full cache flush. There becomes a point where the full cache flush becomes cheaper than flushing each SG element individually.
We noticed that even for a single (large) buffer, for any cache flush operation above a certain size threshold, flushing individual lines becomes more expensive than flushing the entire cache.
I requested colleagues to look into implementing this threshold in the arch/arm/mm/cache-v7.S file, but I think they ran into trouble and eventually had to give up on it.
Vijay or Srinidhi, can you share your findings?
Thanks, Linus Walleij
Hi,
Below are the timings for clean & flush.
/*
 * Size      Clean     Dirty_clean  Flush     Dirty_Flush
 *           T1(ns)    T2(ns)       T3(ns)    T2(ns)
 * ============================================================
 * 4096      30517     30517        30517     30517
 * 8192      30517     30517        30517     30517
 * 16384     30518     30518        30518     30518
 * 32768     30518     30518        30518     61035   <--
 * 36864     61036     61036        61035     61035
 * 65536     91553     91553        91553     91553
 * 131072    183106    183106       183106    183106
 *
 * Full      30518     30518        30518     30518   <--
 * Cache
 */
/*
 * Based on above values, 32768 size is breakeven for flushing/cleaning
 * full D cache.
 */
I have noticed that with a 32KB DLIMIT there is a small reduction, about 1 fps, in the skiamark profile after this change. It could be because the full flush or clean is causing more cache misses later on in the execution.
However, with a 64KB DLIMIT there is a further degradation in skiamark performance, so I think 32KB is a good value.
However, problems are seen in the Android UI: small artifacts are seen on UI widgets during video playback.
These artifacts are not seen if clean is called for each CPU.
Also, I find it takes some effort to implement clean_all/flush_all APIs in the cache-v7.S (asm) file so that they execute on each CPU, and hence it was parked aside.
And I have not investigated why, in the flush-all case, flushing on both CPUs always works.
Thanks & Regards Vijay
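For reference, a minimal sketch of the "clean is called for each cpu" approach from C rather than from cache-v7.S (flush_cache_all() stands in here for whatever local clean/flush-all helper would really be used; set/way maintenance is local to the executing CPU on v7, hence the per-CPU call):

#include <linux/smp.h>
#include <asm/cacheflush.h>

static void local_dcache_maintenance(void *unused)
{
	/* runs on whichever CPU this callback is scheduled on */
	flush_cache_all();
}

static void dcache_maintenance_all_cpus(void)
{
	/* run the local maintenance on every online CPU and wait */
	on_each_cpu(local_dcache_maintenance, NULL, 1);
}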
+static inline void __dma_sync(void)
+{
+	dsb();
+}
 /*
  * Return whether the given device DMA address mask can be supported
  * properly.  For example, if your device can only drive the low 24-bits
@@ -378,6 +383,7 @@ static inline dma_addr_t dma_map_single(struct device *dev, void *cpu_addr,
 	BUG_ON(!valid_dma_direction(dir));

 	addr = __dma_map_single(dev, cpu_addr, size, dir);
+	__dma_sync();
Russell, I'm curious about the correctness of this patch for systems with an outer cache. Shouldn't the dsb be issued before the outer cache maintenance?
saeed
On Mon, Jun 27, 2011 at 01:34:48PM +0300, saeed bishara wrote:
Russell, I'm curious about the correctness of this patch for systems with an outer cache. Shouldn't the dsb be issued before the outer cache maintenance?
Maybe we should do two passes over SG lists then - one for the inner and another for the outer cache?
In effect we could do three passes:
1. Calculate the total size of the SG list to determine whether a full cache flush is more efficient.
2. Flush the inner cache, then dsb().
3. Flush the outer cache, then another dsb().
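(A rough sketch of those three passes over an SG list; the threshold value, the function name and the exact maintenance calls are illustrative rather than taken from any posted patch.)

#include <linux/scatterlist.h>
#include <asm/cacheflush.h>
#include <asm/outercache.h>
#include <asm/system.h>		/* for dsb() */

#define FULL_FLUSH_THRESHOLD	(32 * 1024)	/* hypothetical breakeven */

static void dma_flush_sg_threepass(struct scatterlist *sgl, int nents)
{
	struct scatterlist *sg;
	size_t total = 0;
	int i;

	/* Pass 1: total up the list to pick per-element vs full flush */
	for_each_sg(sgl, sg, nents, i)
		total += sg->length;

	/* Pass 2: inner cache, then one dsb() for the whole list */
	if (total >= FULL_FLUSH_THRESHOLD) {
		flush_cache_all();
	} else {
		for_each_sg(sgl, sg, nents, i) {
			void *va = sg_virt(sg);

			dmac_flush_range(va, va + sg->length);
		}
	}
	dsb();

	/* Pass 3: outer (L2) cache, followed by its own sync/dsb() */
	for_each_sg(sgl, sg, nents, i)
		outer_flush_range(sg_phys(sg), sg_phys(sg) + sg->length);
	dsb();
}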
On Mon, Jun 27, 2011 at 2:02 PM, Russell King - ARM Linux linux@arm.linux.org.uk wrote:
Maybe we should do two passes over SG lists then - one for the inner and another for the outer cache?
In effect we could do three passes:
1. Calculate the total size of the SG list to determine whether a full cache flush is more efficient.
2. Flush the inner cache, then dsb().
3. Flush the outer cache, then another dsb().
Looking at the l2x0 cache, it seems to me that it would be possible to do asynchronous L2 cache maintenance for range operations, so maybe that can be utilized to run the inner and outer cache maintenance in parallel. The flow could be:
2. Flush the SG buffer from the inner cache
3. dsb()
4. Start flushing the outer cache for that buffer
5. Flush the next SG buffer from the inner cache
6. dsb()
7. Goto 4
8. When no more buffers are left, wait for the outer cache operations
saeed
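(Sketching that flow over an SG list: the two outer-cache calls, outer_flush_range_nowait() and outer_flush_wait(), are hypothetical stand-ins for the asynchronous range operation and the final completion wait being proposed; no such l2x0 API exists.)

#include <linux/scatterlist.h>
#include <asm/cacheflush.h>
#include <asm/system.h>		/* for dsb() */

/* Hypothetical asynchronous outer-cache hooks (not an existing API):
 * start a range flush without polling for completion, and wait for
 * all previously started operations to finish. */
void outer_flush_range_nowait(unsigned long start, unsigned long end);
void outer_flush_wait(void);

static void dma_flush_sg_pipelined(struct scatterlist *sgl, int nents)
{
	struct scatterlist *sg;
	int i;

	for_each_sg(sgl, sg, nents, i) {
		void *va = sg_virt(sg);

		/* inner cache for this element, drained before L2 starts */
		dmac_flush_range(va, va + sg->length);
		dsb();

		/* kick off the outer flush and move on to the next element */
		outer_flush_range_nowait(sg_phys(sg),
					 sg_phys(sg) + sg->length);
	}

	/* no more buffers: wait for the outstanding outer operations */
	outer_flush_wait();
}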
On Tue, Jun 28, 2011 at 09:22:20AM +0300, saeed bishara wrote:
Looking at the l2x0 cache, it seems to me that it would be possible to do asynchronous L2 cache maintenance for range operations, so maybe that can be utilized to run the inner and outer cache maintenance in parallel. The flow could be:
2. Flush the SG buffer from the inner cache
3. dsb()
4. Start flushing the outer cache for that buffer
5. Flush the next SG buffer from the inner cache
6. dsb()
7. Goto 4
8. When no more buffers are left, wait for the outer cache operations
I'm not sure how practical that is given the architecture of the L2x0 controllers - where we need to wait for the previous operation to complete by reading the operation specific register and then issue a sync.
Having looked at this again, I think trying to do any optimization of this code will be fraught, because of the dmabounce stuff getting in the way. If only dmabounce didn't exist... if only crap hardware didn't exist... dmabounce really needs to die.