On Jul 4, 2020, at 8:46 PM, Jan Ziak 0xe2.0x9a.0x9b@gmail.com wrote:
On Sun, Jul 5, 2020 at 4:16 AM Matthew Wilcox willy@infradead.org wrote:
On Sun, Jul 05, 2020 at 04:06:22AM +0200, Jan Ziak wrote:
Hello
At first, I thought that the proposed system call is capable of reading *multiple* small files using a single system call - which would help increase HDD/SSD queue utilization and increase IOPS (I/O operations per second) - but that isn't the case and the proposed system call can read just a single file.
Without the ability to read multiple small files using a single system call, it is impossible to increase IOPS (unless an application is using multiple reader threads or somehow instructs the kernel to prefetch multiple files into memory).
What API would you use for this?
ssize_t readfiles(int dfd, char **files, void **bufs, size_t *lens);
I pretty much hate this interface, so I hope you have something better in mind.
I am proposing the following:
struct readfile_t {
    int dirfd;
    const char *pathname;
    void *buf;
    size_t count;
    int flags;
    ssize_t retval; // set by kernel
    int reserved; // not used by kernel
};
If you are going to pass a struct from userspace to the kernel, it should not mix int and pointer types (which may be 64-bit values), so that there are no structure packing issues, like:
struct readfile {
    int dirfd;
    int flags;
    const char *pathname;
    void *buf;
    size_t count;
    ssize_t retval;
};
It would be better if "retval" was returned in "count", so that the structure fits nicely into 32 bytes on a 64-bit system, instead of being 40 bytes per entry, which adds up over many entries, like:
struct readfile {
    int dirfd;
    int flags;
    const char *pathname;
    void *buf;
    ssize_t count; /* input: bytes requested, output: bytes read or -errno */
};
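To make the size argument concrete, here is a small sketch comparing the three layouts discussed so far. It assumes an LP64 ABI (such as x86-64 Linux), where int is 4 bytes and pointers, size_t and ssize_t are 8 bytes. The struct names readfile_mixed, readfile_grouped and readfile_inout are made up for this comparison and are not part of any proposal.

```c
#include <stddef.h>
#include <sys/types.h>   /* ssize_t */

/* Originally proposed order: each int is followed by an 8-byte member,
 * so on LP64 the compiler inserts 4 bytes of padding after each int. */
struct readfile_mixed {
    int dirfd;
    const char *pathname;
    void *buf;
    size_t count;
    int flags;
    ssize_t retval;
    int reserved;
};

/* Both ints grouped at the front, separate retval member. */
struct readfile_grouped {
    int dirfd;
    int flags;
    const char *pathname;
    void *buf;
    size_t count;
    ssize_t retval;
};

/* count is both input (bytes requested) and output (bytes read or -errno). */
struct readfile_inout {
    int dirfd;
    int flags;
    const char *pathname;
    void *buf;
    ssize_t count;
};

/* The checks only apply when pointers are 8 bytes wide (LP64 assumption). */
_Static_assert(sizeof(void *) != 8 || sizeof(struct readfile_mixed) == 56,
               "mixed layout: 44 bytes of data plus 12 bytes of padding");
_Static_assert(sizeof(void *) != 8 || sizeof(struct readfile_grouped) == 40,
               "grouped layout: no padding, 40 bytes");
_Static_assert(sizeof(void *) != 8 || sizeof(struct readfile_inout) == 32,
               "in/out count layout: no padding, 32 bytes");
```

Under these assumptions the interleaved order costs 56 bytes per entry, grouping the ints brings it to 40, and folding retval into count brings it to 32.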
However, there is still an issue with passing pointers from userspace, since they may be 32-bit userspace pointers on a 64-bit kernel.
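One common way to sidestep the pointer-width problem, sketched here only as an illustration and not something proposed in this thread, is to carry user pointers as fixed-width 64-bit integers so that the layout is the same for 32-bit and 64-bit userspace. The name readfile_fixed and the two helper functions are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-width variant: every member has the same size and
 * offset regardless of whether userspace is 32-bit or 64-bit. */
struct readfile_fixed {
    int32_t  dirfd;
    uint32_t flags;
    uint64_t pathname;  /* user pointer to the path, zero-extended */
    uint64_t buf;       /* user pointer to the buffer, zero-extended */
    int64_t  count;     /* in: bytes requested; out: bytes read or -errno */
};

/* Userspace helpers to convert between pointers and the 64-bit fields. */
static inline uint64_t ptr_to_u64(const void *p)
{
    return (uint64_t)(uintptr_t)p;
}

static inline void *u64_to_ptr(uint64_t v)
{
    return (void *)(uintptr_t)v;
}

/* Two 4-byte members first keep the 64-bit members at offsets 8, 16, 24. */
_Static_assert(sizeof(struct readfile_fixed) == 32, "fixed layout is 32 bytes");
_Static_assert(offsetof(struct readfile_fixed, pathname) == 8, "pathname at 8");
_Static_assert(offsetof(struct readfile_fixed, buf) == 16, "buf at 16");
_Static_assert(offsetof(struct readfile_fixed, count) == 24, "count at 24");
```

With a layout like this the kernel would not need a separate compat translation for the structure, at the cost of explicit casts in userspace.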
int readfiles(struct readfile_t *requests, size_t count);
It's not clear why count is a "size_t" since it is not a size. An unsigned int is fine here, since it should never be negative.
Returns zero if all requests succeeded, otherwise the returned value is non-zero (glibc wrapper: -1) and user-space is expected to check which requests have succeeded and which have failed. retval in readfile_t is set to what the single-file readfile syscall would return if it was called with the contents of the corresponding readfile_t struct.
The glibc library wrapper of this system call is expected to store the errno in the "reserved" field. Thus, a programmer using glibc sees:
struct readfile_t {
    int dirfd;
    const char *pathname;
    void *buf;
    size_t count;
    int flags;
    ssize_t retval; // set by glibc (-1 on error)
    int errno; // set by glibc if retval is -1
};
Why not just return the errno directly in "retval", or in "count" as proposed? That avoids further bloating the structure by another field.
retval and errno in glibc's readfile_t are set to what the single-file glibc readfile would return (retval) and set (errno) if it was called with the contents of the corresponding readfile_t struct. In case of an error, glibc will pick one readfile_t which failed (such as: the 1st failed one) and use it to set glibc's errno.
Cheers, Andreas