On Tue, Oct 31, 2017 at 10:33 AM, Cyril Hrubis chrubis@suse.cz wrote:
Hi!
I looked at this again and what I found is that there are multiple concurrent writers that each get a posix write lock and then write to overlapping portions of the file.
The blocks should not overlap for one kind of lock; they do overlap across different kinds of locks, though, so posix write locks overlap with ofd read locks, etc.
So in the failing case it should look like:
| ofd read | ofd read | ofd read |
   | posix wr | posix wr | posix wr |
The ofd read lock blocks start at offset 0 and are write_size in size, while the posix write lock blocks start at offset write_size/4 and are also write_size in size.
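To make the layout concrete, here is a rough sketch of the block arithmetic described above. It assumes the blocks of each kind are laid out back to back, as the diagram suggests; the helper names (block_at, ofd_read_block, posix_write_block, ranges_overlap) are made up for illustration and are not taken from the test.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>

/* A byte range [start, start + len). */
struct range {
	off_t start;
	off_t len;
};

/* i-th block of size write_size, counted from a base offset (hypothetical helper). */
static struct range block_at(off_t base, int i, off_t write_size)
{
	struct range r;

	assert(i >= 0 && write_size > 0);
	r.start = base + (off_t)i * write_size;
	r.len = write_size;
	return r;
}

/* ofd read lock blocks start at offset 0. */
static struct range ofd_read_block(int i, off_t write_size)
{
	return block_at(0, i, write_size);
}

/* posix write lock blocks are shifted by write_size / 4. */
static struct range posix_write_block(int i, off_t write_size)
{
	return block_at(write_size / 4, i, write_size);
}

/* True if the two half-open ranges share at least one byte. */
static bool ranges_overlap(struct range a, struct range b)
{
	return a.start < b.start + b.len && b.start < a.start + a.len;
}
```

With this layout, two blocks of the same kind never share a byte, while each posix write block straddles two neighbouring ofd read blocks.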
I tried to make sure that we do not have overlapping posix locks, since in that case it's very easy to get a failure.
Ok, I see. In this case, we still see the posix locks being merged between the writers. Could it be that (either by design, or by accident) the F_UNLCK operation on one of them now releases the combined lock area?
I haven't looked at the code again, just guessing.
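One way to check that guess from user space, as a minimal sketch rather than anything from the test itself: take two adjacent posix write locks in one process (so they can be merged), unlock only the first range, and ask from a child process via F_GETLK which ranges are still held. The helper names (set_lock, range_is_locked, probe_partial_unlock) are hypothetical. By POSIX semantics one would expect only the first range to come back free; if the second range also shows up as free, the unlock released more than it was asked to.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Set or clear a posix lock on [start, start + len) without blocking. */
static int set_lock(int fd, short type, off_t start, off_t len)
{
	struct flock fl = {
		.l_type = type,
		.l_whence = SEEK_SET,
		.l_start = start,
		.l_len = len,
	};

	assert(len > 0);
	return fcntl(fd, F_SETLK, &fl);
}

/*
 * Ask from a child process whether a write lock on the range would
 * conflict with a lock held by the parent.
 * Returns 1 if locked, 0 if free, -1 on error.
 */
static int range_is_locked(int fd, off_t start, off_t len)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		struct flock fl = {
			.l_type = F_WRLCK,
			.l_whence = SEEK_SET,
			.l_start = start,
			.l_len = len,
		};

		if (fcntl(fd, F_GETLK, &fl) < 0)
			_exit(2);
		_exit(fl.l_type == F_UNLCK ? 0 : 1);
	}
	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status) ||
	    WEXITSTATUS(status) > 1)
		return -1;
	return WEXITSTATUS(status);
}

/* Lock two adjacent ranges, unlock the first, report what is still held. */
static int probe_partial_unlock(int fd, off_t size, int *first, int *second)
{
	if (set_lock(fd, F_WRLCK, 0, size) < 0 ||
	    set_lock(fd, F_WRLCK, size, size) < 0 ||
	    set_lock(fd, F_UNLCK, 0, size) < 0)
		return -1;
	*first = range_is_locked(fd, 0, size);
	*second = range_is_locked(fd, size, size);
	return 0;
}
```

The probe runs in a forked child because F_GETLK never reports locks owned by the calling process itself.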
Arnd