On Fri, May 17, 2019 at 1:47 AM Jan Kara <jack@suse.cz> wrote:
Let's add Kees to CC for usercopy expertise...
On Thu 16-05-19 17:33:38, Dan Williams wrote:
Jeff discovered that performance improves from ~375K iops to ~519K iops on a simple psync-write fio workload when moving the location of 'struct page' from the default PMEM location to DRAM. This result is surprising because the expectation is that 'struct page' for dax is only needed for third party references to dax mappings. For example, a dax-mapped buffer passed to another system call for direct-I/O requires 'struct page' for sending the request down the driver stack and pinning the page. There is no usage of 'struct page' for first party access to a file via read(2)/write(2) and friends.
However, this "no page needed" expectation is violated by CONFIG_HARDENED_USERCOPY and the check_copy_size() performed in copy_from_iter_full_nocache() and copy_to_iter_mcsafe(). The check_heap_object() helper routine assumes the buffer is backed by a page-allocator DRAM page and applies some checks. Those checks are invalid, because dax pages are not from the heap, and redundant, because dax_iomap_actor() has already validated that the I/O is within bounds.
So this last paragraph is not obvious to me, as check_copy_size() does a lot of various checks in the CONFIG_HARDENED_USERCOPY case. I agree that some of those checks don't make sense for PMEM pages, but I'd rather handle that by refining the check_copy_size() and check_object_size() functions to detect and appropriately handle pmem pages than by generally skipping all the checks in pmem_copy_from/to_iter(). And yes, every check in such a hot path is going to cost performance, but that's what the user asked for with CONFIG_HARDENED_USERCOPY... Kees?
As far as I can see it's mostly check_heap_object() that is the problem, so I'm open to finding a way to just bypass that sub-routine. However, as far as I can see, none of the other block / filesystem user copy implementations, like bio_copy_from_iter() and iov_iter_copy_from_user_atomic(), submit to the hardened checks. So, either those need to grow additional checks, or the hardened copy implementation is targeting single-object copy use cases, not necessarily block I/O. Yes, Kees, please advise.
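
To make the "bypass only the heap-object check" idea concrete, here is a minimal user-space sketch in C. It is not kernel code: struct model_page, enum page_kind, copy_within_bounds(), heap_object_ok() and model_check_copy() are all hypothetical names invented for illustration, and the real check_copy_size() / check_heap_object() logic is considerably more involved. The sketch only shows the shape of the compromise: the bounds check stays for every copy, and the heap-object check is skipped for pages known not to come from the heap.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical page classification; NOT the kernel's struct page. */
enum page_kind { PAGE_HEAP, PAGE_PMEM };

struct model_page {
	enum page_kind kind;
	size_t object_size;	/* size of the containing allocation (heap pages only) */
};

/* Overflow-safe test that [offset, offset + len) lies within [0, limit). */
static bool copy_within_bounds(size_t offset, size_t len, size_t limit)
{
	return offset <= limit && len <= limit - offset;
}

/* Stand-in for a heap-object check: the copy must fit in one allocation. */
static bool heap_object_ok(const struct model_page *pg, size_t offset, size_t len)
{
	return copy_within_bounds(offset, len, pg->object_size);
}

/*
 * Hypothetical refined check: always validate against the mapping size,
 * but only apply the heap-object check to heap-backed pages.
 */
static bool model_check_copy(const struct model_page *pg, size_t offset,
			     size_t len, size_t mapping_size)
{
	if (!copy_within_bounds(offset, len, mapping_size))
		return false;
	if (pg->kind == PAGE_PMEM)
		return true;	/* no heap object to validate against */
	return heap_object_ok(pg, offset, len);
}
```

With this model, a 4096-byte copy into a PMEM page passes as long as it stays inside the mapping, while a copy that would overrun a 64-byte heap object is still rejected.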