An old issue of ext4 on LTS 3.10


Alex Shi

14 May 2015

2:40 p.m.

Hi Dmitry&Theodore,

Someone reported that without the following patch, writes on the LTS 3.10 kernel (which is used as the Android base kernel) can be very slow, taking 1 or 2 seconds to finish.

I took a quick look at this patch. It seems harmless for normal fs operation, but I still don't know why it helps. Do you remember why you committed this change at the time?

Thanks Alex

commit 7afe5aa59ed3da7b6161617e7f157c7c680dc41e
Author: Dmitry Monakhov <dmonakhov@openvz.org>
Date:   Wed Aug 28 14:30:47 2013 -0400

    ext4: convert write_begin methods to stable_page_writes semantics

    Use wait_for_stable_page() instead of wait_on_page_writeback()

    Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    Reviewed-by: Jan Kara <jack@suse.cz>

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index fc4051e..47c8e46 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -969,7 +969,8 @@ retry_journal:
 		ext4_journal_stop(handle);
 		goto retry_grab;
 	}
-	wait_on_page_writeback(page);
+	/* In case writeback began while the page was unlocked */
+	wait_for_stable_page(page);
 
 	if (ext4_should_dioread_nolock(inode))
 		ret = __block_write_begin(page, pos, len, ext4_get_block_write);
@@ -2678,7 +2679,7 @@ retry_journal:
 		goto retry_grab;
 	}
 	/* In case writeback began while the page was unlocked */
-	wait_on_page_writeback(page);
+	wait_for_stable_page(page);
 
 	ret = __block_write_begin(page, pos, len, ext4_da_get_block_prep);
 	if (ret < 0) {

-- Thanks Alex


Dmitry Monakhov

14 May

8:36 p.m.

Alex Shi alex.shi@linaro.org writes:

> Hi Dmitry&Theodore,
>
> Someone said without the following patch on lts 3.10 kernel (which used as android base kernel). the write maybe very very slow, needs 1 or 2 seconds to finish.

In fact this was an optimization. wait_for_stable_page() is actually an optimized wait_on_page_writeback().

see:

void wait_for_stable_page(struct page *page)
{
	struct address_space *mapping = page_mapping(page);
	struct backing_dev_info *bdi = mapping->backing_dev_info;

	if (!bdi_cap_stable_pages_required(bdi))
		return;

	wait_on_page_writeback(page);
}

It is very unlikely that the patch provokes such a huge slowdown. Can you please repeat your measurements and double-check your evidence?
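For readers who want to see the shortcut in isolation, here is a minimal userspace sketch of the decision wait_for_stable_page() makes. It is not kernel code: struct fake_bdi, struct fake_page and both fake_* functions are hypothetical stand-ins invented for this illustration, and the "wait" only bumps a counter so that the two paths can be compared.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct backing_dev_info. */
struct fake_bdi {
	bool stable_pages_required;	/* models bdi_cap_stable_pages_required() */
};

/* Hypothetical stand-in for struct page. */
struct fake_page {
	const struct fake_bdi *bdi;
	bool under_writeback;
	unsigned int waits;		/* how many times a writer had to block */
};

/* Models wait_on_page_writeback(): blocks whenever writeback is in flight. */
void fake_wait_on_page_writeback(struct fake_page *page)
{
	assert(page != NULL);
	if (page->under_writeback) {
		page->waits++;
		page->under_writeback = false;	/* pretend the I/O completed */
	}
}

/* Models wait_for_stable_page(): blocks only if the device needs stable pages. */
void fake_wait_for_stable_page(struct fake_page *page)
{
	assert(page != NULL && page->bdi != NULL);
	if (!page->bdi->stable_pages_required)
		return;
	fake_wait_on_page_writeback(page);
}
```

With a page that is under writeback on a device that does not require stable pages, fake_wait_for_stable_page() returns at once and the wait counter stays at zero. On a device that does require them, it behaves exactly like the old call.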


Alex Shi

17 May

3:19 p.m.

On 05/15/2015 04:36 AM, Dmitry Monakhov wrote:

> In fact this was an optimization. wait_for_stable_page() is actually and optimized wait_on_page_writeback()

Hi Dmitry, it *is* an optimization; the problem happens *without* this patch, not with it. :) What I am curious about is why the patch has this effect. It looks like the new function only waits for page writeback when the device supports data integrity. But why does a data-integrity device need to wait for writeback, while other devices don't?

BTW, how can I tell whether my disk supports data integrity? My hard disk's spec says it has this feature, but my Linux kernel, built with integrity support, doesn't have /sys/block/sdx/integrity.

Thanks a lot for your quick response!


Jan Kara

18 May

6:41 a.m.

On Sun 17-05-15 23:19:52, Alex Shi wrote:

> Hi, Dimtry, it *is* a optimization, the fault is just happened *without* this patch, not with this. :) The curious for me is why this patch has this effect. It looks like the new func just wait page wb when the device support data integrity. But Why the data integrity device need to wait wb, while other device don't need?

Because the disk driver may be computing checksum of the data before submitting it to the disk and if you change the data after the checksum is computed but before the DMA transfer is done, the checksum will not match.
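The race described here can be reproduced in miniature. The sketch below is a toy userspace model, not driver code: toy_checksum() is a plain additive sum rather than the CRC or IP checksum a real DIF/DIX setup would use, and submit_and_verify() is a hypothetical helper that folds "driver computes checksum", "writer touches the page" and "device verifies" into one call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy additive checksum; stands in for the real per-sector guard tag. */
uint16_t toy_checksum(const unsigned char *buf, size_t len)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < len; i++)
		sum = (sum + buf[i]) & 0xffff;
	return (uint16_t)sum;
}

/*
 * Models one write: the "driver" checksums the page, the page is optionally
 * modified while the I/O is in flight, then the "device" verifies the data
 * it actually received. Returns true if the device-side check passes.
 */
bool submit_and_verify(unsigned char *page, size_t len, bool modify_in_flight)
{
	assert(page != NULL && len > 0);

	uint16_t guard = toy_checksum(page, len);	/* computed before DMA */

	if (modify_in_flight)
		page[0] ^= 0x01;	/* writer dirties the page mid-transfer */

	return toy_checksum(page, len) == guard;	/* device-side verification */
}
```

When the page is left alone the check passes; when one bit flips between checksum and transfer it fails. That is the situation that waiting for writeback prevents on devices that checksum their data.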

> BTW, how to know if my disk support data integrity. My harddisk spec said it has this feature, but my linux kernel with integrity supported don't have /sys/block/sdx/integrity.

The feature you are looking for is called DIF/DIX IIRC and not many disks support it.

Honza
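On the sysfs question, one way to probe a device from userspace is simply to try reading an attribute under its integrity directory. The helper below is a small sketch with a hypothetical name (read_attr); it reads the first line of any attribute file. The path in the usage note assumes the layout described in the kernel's block data-integrity documentation, and the attribute may well be absent on a given kernel or device, which is the case Alex describes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Reads the first line of a sysfs-style attribute file into buf and strips
 * the trailing newline. Returns 0 on success, -1 if the file is missing or
 * empty.
 */
int read_attr(const char *path, char *buf, size_t len)
{
	assert(path != NULL && buf != NULL && len > 0);

	FILE *f = fopen(path, "r");
	if (f == NULL)
		return -1;

	char *line = fgets(buf, (int)len, f);
	fclose(f);
	if (line == NULL)
		return -1;

	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}
```

A caller might try read_attr("/sys/block/sda/integrity/format", buf, sizeof buf), where sda is only an example device name, and treat a return of -1 as "no integrity profile registered for this device".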


-- Jan Kara jack@suse.cz SUSE Labs, CR

Alex Shi

20 May

3:04 a.m.

> Because the disk driver may be computing checksum of the data before submitting it to the disk and if you change the data after the checksum is computed but before the DMA transfer is done, the checksum will not match.

Thanks again!

>> BTW, how to know if my disk support data integrity. My harddisk spec said it has this feature, but my linux kernel with integrity supported don't have /sys/block/sdx/integrity.
>
> The feature you are looking for is called DIF/DIX IIRC and not many disks support it.

Oh, that's bad. It looks like the block layer will merge or work out the shared checksum type. So if there are two different kinds of disk in the system without a shared checksum type, could that cause each disk to lose data integrity checking?

Jan Kara

18 May

6:21 a.m.

On Thu 14-05-15 23:36:31, Dmitry Monakhov wrote:

> It is very unlikely the patch provokes such huge slowdown. Can you please repeat your measurements and double check your evidence.

I think Alex meant that without the patch he is seeing long stalls. That is possible when we wait for writeback and the storage is busy.

>> I quick looked this patch, seems it's no harm for a normal fs function. but still don't know why it is helpful. So do you remember why you commit this change at that time?

The patch helps because most of storage today doesn't require that the page isn't changed while IO is in flight. That is required only for data checksumming or copy-on-write semantics but ext4 does neither of those. So we don't have to wait for IO completion in ext4_write_begin() unless underlying storage requires it.

Honza


-- Jan Kara jack@suse.cz SUSE Labs, CR

Alex Shi

20 May

2:58 a.m.

On 05/18/2015 02:21 PM, Jan Kara wrote:

> I think Alex meant that without the patch he is seeing long stalls. That is possible when we wait for writeback and the storage is busy.

yes.

> The patch helps because most of storage today doesn't require that the page isn't changed while IO is in flight. That is required only for data checksumming or copy-on-write semantics but ext4 does neither of those. So we don't have to wait for IO completion in ext4_write_begin() unless underlying storage requires it.
>
> Honza

Thanks a lot for the clear explanation, Honza!


Alex Shi

21 May

9:28 a.m.

Hi Greg,

It was reported that this commit can sometimes save a few seconds on consecutive writes on a smartphone.

commit 7afe5aa59ed3da7b6161617e7f157c7c680dc41e
ext4: convert write_begin methods to stable_page_writes semantics

> The patch helps because most of storage today doesn't require that the page isn't changed while IO is in flight. That is required only for data checksumming or copy-on-write semantics but ext4 does neither of those. So we don't have to wait for IO completion in ext4_write_begin() unless underlying storage requires it.
>
> Honza

It seems to be a very simple and useful patch for some stable kernels, like LTS 3.10. Would you like to pick it up?

Thanks Alex


gregkh＠linuxfoundation.org

4:51 p.m.

On Thu, May 21, 2015 at 05:28:28PM +0800, Alex Shi wrote:

> Hi Greg,
>
> It was reported this commit could save few seconds sometime in consequence writing on smart phone.
>
> commit 7afe5aa59ed3da7b6161617e7f157c7c680dc41e ext4: convert write_begin methods to stable_page_writes semantics
>
> Seems it is a very simple and useful patch for some stable kernel, like lts 3.10. Would you like to pick it up?

If Jan and Ted say it's ok, I'll queue it up.

Have you tried this patch and verified that it does fix an issue and is measurable?

thanks,

greg k-h

Jan Kara

22 May

8:26 a.m.

On Thu 21-05-15 09:51:59, gregkh@linuxfoundation.org wrote:

> If Jan and Ted say it's ok, I'll queue it up.

Yes, it's OK.

Honza

-- Jan Kara jack@suse.cz SUSE Labs, CR

Alex Shi

24 May

2:54 a.m.

On 05/22/2015 12:51 AM, gregkh@linuxfoundation.org wrote:

>> Seems it is a very simple and useful patch for some stable kernel, like lts 3.10. Would you like to pick it up?
>
> If Jan and Ted say it's ok, I'll queue it up.
>
> Have you tried this patch and verified that it does fix an issue and is measurable?

I just ran a functional test of this patch on Kevin's testing system, kernelci.org. It is good.

But I didn't measure how much effect the patch has. A reliable source just said it does help performance when writing many small files on a smartphone.


linaro-kernel@lists.linaro.org


participants (4)

Alex Shi
Dmitry Monakhov
gregkh＠linuxfoundation.org
Jan Kara