On Mon, Oct 16, 2017 at 4:37 PM, Cyril Hrubis chrubis@suse.cz wrote:
> The assertion that fails runs several threads where:
> The first half of them are writers, each of which loops:
> - lock a region in a file with a POSIX write lock
> - write to the region
> - unlock it
> While the second half loops concurrently (time- and file-offset-wise):
> - lock a region with an OFD read lock
> - read the region, expecting it to be consistent block-wise
> - unlock it
> As far as I can tell this is supposed to be a supported combination.
I looked at this again and what I found is that there are multiple concurrent writers that each get a posix write lock and then write to overlapping portions of the file.
What I think happens here is that multiple POSIX write locks get combined into a single lock, since they originate from the same PGID. The writers therefore do not lock against one another, and they may write concurrently to the same part of the file. This can theoretically leave the file in an inconsistent state, but normally should not, since the individual writes are for larger areas and we would normally see the result of one write or the other, as if they were properly locked.
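To illustrate the point about writers not locking against one another, here is a minimal sketch, not taken from the failing test. It assumes Linux and uses Python's fcntl.lockf(), which is built on fcntl() record locks (POSIX locks). The function name and the two file descriptors standing in for two writer threads are made up for the illustration.

```python
import fcntl
import os
import tempfile


def same_process_write_locks_conflict(length=4096):
    """Return True if a second POSIX write lock from the same process is refused.

    Hypothetical helper for illustration only; not part of the test under
    discussion.
    """
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(b"\0" * length)
        tmp.flush()
        # Two independent opens of the same file, standing in for two
        # writer threads of one process.
        fd_a = os.open(tmp.name, os.O_RDWR)
        fd_b = os.open(tmp.name, os.O_RDWR)
        try:
            # First "writer" takes a non-blocking write lock on the region.
            fcntl.lockf(fd_a, fcntl.LOCK_EX | fcntl.LOCK_NB, length, 0, os.SEEK_SET)
            try:
                # Second "writer" asks for the same region. POSIX locks are
                # owned by the process, so this is not treated as a conflict.
                fcntl.lockf(fd_b, fcntl.LOCK_EX | fcntl.LOCK_NB, length, 0, os.SEEK_SET)
            except OSError:
                return True
            return False
        finally:
            os.close(fd_a)
            os.close(fd_b)
```

Calling same_process_write_locks_conflict() should report that the second lock request was not refused, so nothing serializes the two writers.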
However, another thing that can happen is that one of the two threads then unlocks the region that it had been writing to, while the other one is still writing. A reader thread can now get the OFD read lock for a region that is completely unlocked but is still being written to.
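The unlock problem can be sketched the same way, again as an assumption-laden illustration rather than the actual test. Two descriptors in one process both "hold" the write lock, one of them unlocks, and a forked child checks whether the region is then free. Note that the child probes with an ordinary POSIX write lock instead of the OFD read lock the readers use; the helper names are hypothetical.

```python
import fcntl
import os
import tempfile


def other_process_can_lock(path, length):
    """Fork a child and report whether it can write-lock the region."""
    pid = os.fork()
    if pid == 0:
        # Child: a separate process, hence a different POSIX lock owner.
        status = 1
        try:
            fd = os.open(path, os.O_RDWR)
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, length, 0, os.SEEK_SET)
            status = 0
        except OSError:
            status = 1
        finally:
            os._exit(status)
    _, wait_status = os.waitpid(pid, 0)
    return os.WIFEXITED(wait_status) and os.WEXITSTATUS(wait_status) == 0


def unlock_by_one_writer_exposes_region(length=4096):
    """Return (blocked_while_locked, free_after_one_unlock)."""
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(b"\0" * length)
        tmp.flush()
        fd_a = os.open(tmp.name, os.O_RDWR)  # stands in for writer A
        fd_b = os.open(tmp.name, os.O_RDWR)  # stands in for writer B
        try:
            fcntl.lockf(fd_a, fcntl.LOCK_EX | fcntl.LOCK_NB, length, 0, os.SEEK_SET)
            fcntl.lockf(fd_b, fcntl.LOCK_EX | fcntl.LOCK_NB, length, 0, os.SEEK_SET)
            blocked_while_locked = not other_process_can_lock(tmp.name, length)
            # Writer A is done and unlocks. Writer B never unlocked, but both
            # requests were merged into one process-owned lock, so it is gone.
            fcntl.lockf(fd_a, fcntl.LOCK_UN, length, 0, os.SEEK_SET)
            free_after_one_unlock = other_process_can_lock(tmp.name, length)
            return blocked_while_locked, free_after_one_unlock
        finally:
            os.close(fd_a)
            os.close(fd_b)
```

If the reasoning above is right, the region is blocked for the other process while both writers hold the lock, and becomes available as soon as either one of them unlocks, even though the other writer may still be in the middle of its write.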
Arnd