On Mon, Jun 08, 2020 at 09:03:21AM -0400, Mimi Zohar wrote:
On Sat, 2020-06-06 at 08:52 -0700, Matthew Wilcox wrote:
On Fri, Jun 05, 2020 at 10:04:51PM -0700, Scott Branden wrote:
-int kernel_read_file(struct file *file, void **buf, loff_t *size,
-		     loff_t max_size, enum kernel_read_file_id id)
-{
-	loff_t i_size, pos;
+int kernel_pread_file(struct file *file, void **buf, loff_t *size,
+		      loff_t pos, loff_t max_size,
+		      enum kernel_pread_opt opt,
+		      enum kernel_read_file_id id)
+{
+	loff_t alloc_size;
+	loff_t buf_pos;
+	loff_t read_end;
+	loff_t i_size;
 	ssize_t bytes = 0;
 	int ret;
Look, it's not your fault, but this is a great example of how we end up with atrocious interfaces. Someone comes along and implements a simple DWIM interface that solves their problem. Then somebody else adds a slight variant that solves their problem, and so on and so on, and we end up with this bonkers API where the arguments literally change meaning depending on other arguments.
@@ -950,21 +955,31 @@ int kernel_read_file(struct file *file, void **buf, loff_t *size,
 		ret = -EINVAL;
 		goto out;
 	}
-	if (i_size > SIZE_MAX || (max_size > 0 && i_size > max_size)) {
+	/* Default read to end of file */
+	read_end = i_size;
+	/* Allow reading partial portion of file */
+	if ((opt == KERNEL_PREAD_PART) &&
+	    (i_size > (pos + max_size)))
+		read_end = pos + max_size;
+	alloc_size = read_end - pos;
+	if (i_size > SIZE_MAX || (max_size > 0 && alloc_size > max_size)) {
 		ret = -EFBIG;
 		goto out;
... like that.
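To make the complaint concrete, here is a minimal userspace sketch of the size computation in the hunk above. It is an illustration only: pread_alloc_size(), PREAD_WHOLE and PREAD_PART are hypothetical stand-ins rather than names from the patch, loff_t is replaced by int64_t, and the SIZE_MAX and earlier i_size checks are left out.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-ins for the option enum in the patch. */
enum pread_opt { PREAD_WHOLE, PREAD_PART };

/*
 * Compute how many bytes would be allocated for a read starting at pos.
 * Returns the allocation size, or -EFBIG when the size check rejects it.
 */
static int64_t pread_alloc_size(int64_t i_size, int64_t pos,
                                int64_t max_size, enum pread_opt opt)
{
    int64_t read_end = i_size;  /* default: read to end of file */
    int64_t alloc_size;

    /* In partial mode, max_size acts as a length and truncates the read. */
    if (opt == PREAD_PART && i_size > pos + max_size)
        read_end = pos + max_size;

    alloc_size = read_end - pos;

    /* Otherwise the same max_size acts as a limit that rejects the read. */
    if (max_size > 0 && alloc_size > max_size)
        return -EFBIG;

    return alloc_size;
}
```

With a 100-byte file, pos 0 and max_size 10, the partial mode yields a 10-byte allocation while the whole-file mode fails with -EFBIG: the same argument is a length in one case and a rejection threshold in the other.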
I think what we actually want is:
ssize_t vmap_file_range(struct file *, loff_t start, loff_t end, void **bufp);
void vunmap_file_range(struct file *, void *buf);
If end > i_size, limit the allocation to i_size. Returns the number of bytes allocated, or a negative errno. Writes the pointer allocated to *bufp. Internally, it should use the page cache to read in the pages (taking appropriate reference counts). Then it maps them using vmap() instead of copying them to a private vmalloc() array.
kernel_read_file() can be converted to use this API. The users will need to be changed to call kernel_read_end(struct file *file, void *buf) instead of vfree() so it can call allow_write_access() for them.
vmap_file_range() has a lot of potential uses. I'm surprised we don't have it already, to be honest.
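As a rough userspace analogy of the proposed semantics (not kernel code), the sketch below maps a file range instead of copying it into a private buffer. Everything in it is an assumption made for illustration: map_file_range() and unmap_file_range() are hypothetical names, mmap() stands in for the page cache lookup plus vmap(), start must be page aligned, and the unmap helper takes an explicit length because munmap() needs one.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Map bytes [start, end) of fd read-only, clamping end to the file size.
 * Returns the number of bytes mapped, or a negative errno.
 * Writes the mapped address to *bufp on success.
 */
static ssize_t map_file_range(int fd, off_t start, off_t end, void **bufp)
{
    struct stat st;
    long page = sysconf(_SC_PAGESIZE);
    size_t len;
    void *p;

    if (fstat(fd, &st) < 0)
        return -errno;
    if (start < 0 || start % page != 0)
        return -EINVAL;         /* mmap() offsets must be page aligned */
    if (end > st.st_size)
        end = st.st_size;       /* "If end > i_size, limit ... to i_size" */
    if (start >= end)
        return -EINVAL;

    len = (size_t)(end - start);
    p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, start);
    if (p == MAP_FAILED)
        return -errno;

    *bufp = p;
    return (ssize_t)len;
}

/* Release a mapping obtained from map_file_range(). */
static void unmap_file_range(void *buf, size_t len)
{
    munmap(buf, len);
}
```

A caller would pass an end offset at or beyond the file size to get the whole file, use the returned byte count as the buffer length, and hand both back to unmap_file_range() when done. The in-kernel version described above would additionally have to hold page references and deal with write access, which this sketch does not model.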
Prior to kernel_read_file(), the same or very similar code existed in multiple places in the kernel. The kernel_read_file() API consolidated the existing code, adding the pre and post security hooks.
With this new design of not using a private vmalloc, will the file data be accessible prior to the post security hooks? From an IMA perspective, the hooks are used for measuring and/or verifying the integrity of the file.
File data is already accessible prior to the post security hooks. Look how kernel_read_file works:
	ret = deny_write_access(file);
	ret = security_kernel_read_file(file, id);
	*buf = vmalloc(i_size);
	bytes = kernel_read(file, *buf + pos, i_size - pos, &pos);
	ret = security_kernel_post_read_file(file, *buf, i_size, id);
kernel_read() will read the data into the page cache and then copy it into the vmalloc'd buffer. There's nothing here to prevent read accesses to the file.