On Mon, Nov 6, 2017 at 3:18 PM, Cyril Hrubis <chrubis@suse.cz> wrote:
Hi!
The OFD read lock blocks start at offset 0 and are write_size bytes long, while the POSIX write locks start at offset write_size/4 and are also write_size bytes long.
I tried to make sure that we do not have overlapping POSIX locks, since in that case it's very easy to get a failure.
Ok, I see. In this case, we still see the POSIX locks being merged between the writers. Could it be that (either by design or by accident) the F_UNLCK operation on one of them now releases the combined lock area?
That shouldn't be happening either. The POSIX locks placed by the threads do not overlap, and each thread should unlock exactly the same block that it has locked. So even if neighbouring locks are merged when locked, the unlock operation should split them back again.
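To make the expected merge/split behaviour concrete, here is a toy model of the byte ranges held by one POSIX lock owner for one lock type. It is not kernel code (the real logic lives in fs/locks.c and also deals with lock types, owners and conflicts); model_lock(), model_unlock() and MAX_RANGES are hypothetical. Since POSIX locks belong to the process, all writer threads count as one owner, so two adjacent blocks are coalesced on lock and an unlock of one block should leave the other intact.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model: ranges are half-open [start, end), kept sorted, and never
 * overlap or touch each other. MAX_RANGES is an arbitrary limit. */
#define MAX_RANGES 64

struct range {
	long start, end;
};

struct owner_locks {
	struct range r[MAX_RANGES];
	size_t n;
};

/* Lock [start, end): absorb every held range that overlaps or touches it. */
static void model_lock(struct owner_locks *o, long start, long end)
{
	struct range out[MAX_RANGES + 1];
	size_t k = 0;
	int placed = 0;

	for (size_t i = 0; i < o->n; i++) {
		struct range cur = o->r[i];

		if (cur.end < start) {			/* fully left, not touching */
			out[k++] = cur;
		} else if (cur.start > end) {		/* fully right, not touching */
			if (!placed) {
				out[k++] = (struct range){ start, end };
				placed = 1;
			}
			out[k++] = cur;
		} else {				/* overlaps or touches: merge */
			if (cur.start < start)
				start = cur.start;
			if (cur.end > end)
				end = cur.end;
		}
	}
	if (!placed)
		out[k++] = (struct range){ start, end };

	assert(k <= MAX_RANGES);
	for (size_t i = 0; i < k; i++)
		o->r[i] = out[i];
	o->n = k;
}

/* Unlock [start, end): drop, trim or split every held range it intersects. */
static void model_unlock(struct owner_locks *o, long start, long end)
{
	struct range out[MAX_RANGES + 1];
	size_t k = 0;

	for (size_t i = 0; i < o->n; i++) {
		struct range cur = o->r[i];

		if (cur.end <= start || cur.start >= end) {
			out[k++] = cur;			/* not affected */
			continue;
		}
		if (cur.start < start)			/* piece left of the hole */
			out[k++] = (struct range){ cur.start, start };
		if (cur.end > end)			/* piece right of the hole */
			out[k++] = (struct range){ end, cur.end };
	}

	assert(k <= MAX_RANGES);
	for (size_t i = 0; i < k; i++)
		o->r[i] = out[i];
	o->n = k;
}
```

With write_size = 4096 (an assumption), locking [1024, 5120) and then [5120, 9216) leaves a single range [1024, 9216); unlocking [1024, 5120) again should leave exactly [5120, 9216).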
Makes sense. Any idea how to analyze the failure further?
One more thing I looked at is the results that Naresh posted, in particular the file offsets reported for the incorrect reads. There are actually three tests that fail, all of them involving one OFD lock and one POSIX lock:
0x3010 OFD read lock vs POSIX write lock
0x3500 OFD read lock vs POSIX write lock (second fail)
0x5080 OFD read lock vs POSIX write lock
0x5020 OFD read lock vs POSIX write lock

0x08b0 OFD r/w lock vs POSIX write lock
0x3008 OFD r/w lock vs POSIX write lock
0x5080 OFD r/w lock vs POSIX write lock

0x5420 OFD r/w lock vs POSIX read lock
0x36d8 OFD r/w lock vs POSIX read lock
0x4700 OFD r/w lock vs POSIX read lock (second fail)
0x5430 OFD r/w lock vs POSIX read lock
These are always 8-byte aligned addresses, but not necessarily cacheline aligned. The ones that fail in fn_ofd_r() fail at n*4096 + m*8, with m being a small number. The ones that fail in fn_posix_r() apparently tend to do so at n*4096 + 1024 + m*8. In both cases that is close to the start of the read buffer, but there are some exceptions.
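The decomposition above can be checked mechanically. This sketch assumes 4 KiB pages and 64-byte cache lines (both assumptions about the test machine); classify() and words_past() are hypothetical helpers that split an offset into its page number n and in-page offset, and compute m for a given base (0 for fn_ofd_r(), 1024 for fn_posix_r()).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE_ASSUMED 4096u	/* assumption: 4 KiB pages */
#define CACHELINE_ASSUMED 64u	/* assumption: 64-byte cache lines */

struct offset_info {
	uint64_t page;		/* n in n*4096 + ... */
	uint64_t in_page;	/* byte offset within that page */
	bool aligned8;
	bool cacheline_aligned;
};

static struct offset_info classify(uint64_t off)
{
	struct offset_info info;

	info.page = off / PAGE_SIZE_ASSUMED;
	info.in_page = off % PAGE_SIZE_ASSUMED;
	info.aligned8 = (off % 8) == 0;
	info.cacheline_aligned = (off % CACHELINE_ASSUMED) == 0;
	return info;
}

/* Return m such that the in-page offset equals base + m*8,
 * or -1 if the offset is not of that form. */
static long words_past(uint64_t off, uint64_t base)
{
	uint64_t in_page = off % PAGE_SIZE_ASSUMED;

	assert(base < PAGE_SIZE_ASSUMED);
	if (in_page < base || (in_page - base) % 8 != 0)
		return -1;
	return (long)((in_page - base) / 8);
}
```

For example, 0x3010 gives n = 3 and m = 2 with base 0, and 0x5420 gives m = 4 with base 1024, whereas 0x36d8 gives m = 91 with base 1024, one of the exceptions.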
Since the file system is tmpfs, the accesses go directly to the page cache, so most of the time this happens (but not always) the concurrent write must be hitting memory just as we are accessing the first cache line of the page cache page.
The 8-byte alignment is particularly interesting here, since I'm pretty sure we come here from copy_to_user(), which uses the 'ldp' and 'stp' instructions on arm64 to copy 16 bytes at a time.
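As a rough illustration of why 8-byte granularity would fit that theory, here is a portable C analogue of a copy loop that moves 16 bytes per iteration as two separate 8-byte words, the way an ldp/stp pair does. It is not the kernel's arm64 copy_to_user() implementation, and copy16_as_pairs() is a hypothetical name. If another CPU writes the source between or during such accesses, the reader could observe a mix of old and new data whose boundaries fall on 8-byte words.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy len bytes, 16 at a time, as two 8-byte words per iteration.
 * The source expresses each half as its own 8-byte access; a compiler
 * is free to combine or split them, so this only models the pattern. */
static void copy16_as_pairs(void *dst, const void *src, size_t len)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	assert(dst != NULL && src != NULL);
	while (len >= 16) {
		uint64_t lo, hi;

		memcpy(&lo, s, 8);	/* like the first register of ldp */
		memcpy(&hi, s + 8, 8);	/* like the second register of ldp */
		memcpy(d, &lo, 8);	/* like the stp stores */
		memcpy(d + 8, &hi, 8);
		s += 16;
		d += 16;
		len -= 16;
	}
	memcpy(d, s, len);		/* tail shorter than 16 bytes */
}
```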
Arnd