Re: [Y2038] [PATCH v2 27/27] Documentation: document ioctl interfaces better

19 Dec 2019

On Wed, Dec 18, 2019 at 11:45 PM Ben Hutchings <ben.hutchings@codethink.co.uk> wrote:
...
On Tue, 2019-12-17 at 23:17 +0100, Arnd Bergmann wrote:
...
--- /dev/null
+++ b/Documentation/core-api/ioctl.rst
+``include/uapi/asm-generic/ioctl.h`` provides four macros for defining
+ioctl commands that follow modern conventions: ``_IO``, ``_IOR``,
+``_IOW``, and ``_IORW``. These should be used for all new commands,
Typo: "_IORW" should be "_IOWR".
Fixed now
...
...
+with the correct parameters:

+_IO/_IOR/_IOW/_IOWR

The macro name determines whether the argument is used for passing
data into kernel (_IOW), from the kernel (_IOR), both (_IOWR) or is
not a pointer (_IO). It is possible but not recommended to pass an
integer value instead of a pointer with _IO.

I feel the explanation of _IO here could be confusing.  I think what
you meant to say was that it is possible, but not recommended, to pass
integers directly (arg is integer) rather than indirectly (arg is
pointer to integer).  I suggest the alternate wording:
The macro name specifies how the argument will be used.  It may be a
pointer to data to be passed into the kernel (_IOW), out of the kernel
(_IOR), or both (_IOWR).  The argument may also be an integer value
instead of a pointer (_IO), but this is not recommended.
That's probably better than my version, but I find that misleading as well:
it sounds like _IO() is not recommended, but having no argument with
_IO() is actually fine. This is what I have now:
The macro name specifies how the argument will be used.  It may be a
   pointer to data to be passed into the kernel (_IOW), out of the kernel
   (_IOR), or both (_IOWR).  _IO can indicate either commands with no
   argument or those passing an integer value instead of a pointer.
   It is recommended to only use _IO for commands without arguments,
   and use pointers for passing data.
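
For illustration, a minimal user-space sketch of the four macros under that recommendation. The magic number 'f', the command names and struct foo_config are hypothetical and not part of the patch; the helper only checks the direction bits that each macro encodes.

```c
#include <assert.h>
#include <linux/ioctl.h>
#include <linux/types.h>

/* Hypothetical argument structure, for illustration only. */
struct foo_config {
	__u32 mode;
	__u32 flags;
};

#define FOO_IOC_MAGIC	'f'	/* hypothetical magic number */

/* No argument at all: the recommended use of _IO. */
#define FOO_RESET	_IO(FOO_IOC_MAGIC, 0)
/* Kernel fills in the structure for user space to read. */
#define FOO_GET_CONFIG	_IOR(FOO_IOC_MAGIC, 1, struct foo_config)
/* User space passes the structure into the kernel. */
#define FOO_SET_CONFIG	_IOW(FOO_IOC_MAGIC, 2, struct foo_config)
/* Data flows in both directions through the same pointer. */
#define FOO_XCHG_CONFIG	_IOWR(FOO_IOC_MAGIC, 3, struct foo_config)

/* Check the direction and size fields encoded in each command number. */
static void foo_check_directions(void)
{
	assert(_IOC_DIR(FOO_RESET) == _IOC_NONE);
	assert(_IOC_DIR(FOO_GET_CONFIG) == _IOC_READ);
	assert(_IOC_DIR(FOO_SET_CONFIG) == _IOC_WRITE);
	assert(_IOC_DIR(FOO_XCHG_CONFIG) == (_IOC_READ | _IOC_WRITE));
	assert(_IOC_SIZE(FOO_RESET) == 0);
}
```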
...
...
+data_type

The name of the data type pointed to by the argument, the command number
encodes the ``sizeof(data_type)`` value in a 13-bit or 14-bit integer,
leading to a limit of 8191 bytes for the maximum size of the argument.
Note: do not pass sizeof(data_type) type into _IOR/IOW, as that will
lead to encoding sizeof(sizeof(data_type)), i.e. sizeof(size_t).

You left out _IOWR here.  It might also be worth mentioning that _IO
doesn't have this parameter.
Changed now.
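
A small sketch of the sizeof pitfall from the note above, assuming the user-space definitions in <linux/ioctl.h>. struct bar_args and both command names are hypothetical; the second macro shows the mistake on purpose.

```c
#include <assert.h>
#include <stddef.h>
#include <linux/ioctl.h>
#include <linux/types.h>

/* Hypothetical argument structure, 24 bytes with no implicit padding. */
struct bar_args {
	__u64 addr;
	__u64 len;
	__u32 flags;
	__u32 reserved;
};

/* Correct: pass the type name, the macro applies sizeof() itself. */
#define BAR_GET_ARGS		_IOR('b', 1, struct bar_args)
/* Wrong: this encodes sizeof(sizeof(struct bar_args)), i.e. sizeof(size_t). */
#define BAR_GET_ARGS_WRONG	_IOR('b', 1, sizeof(struct bar_args))

/* Compare the size field that ends up in each command number. */
static void bar_check_sizes(void)
{
	assert(_IOC_SIZE(BAR_GET_ARGS) == sizeof(struct bar_args));
	assert(_IOC_SIZE(BAR_GET_ARGS_WRONG) == sizeof(size_t));
}
```

The two macros produce different command numbers, so a driver and an application that disagree on this will not find each other's commands.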
...
[...]
...
+Return code
+===========

+ioctl commands can return negative error codes as documented in errno(3),
+these get turned into errno values in user space.
Use a semi-colon instead of a comma, or change "these" to "which".
done
...
...
On success, the return
+code should be zero. It is also possible but not recommended to return
+a positive 'long' value.

+When the ioctl callback is called with an unknown command number, the
+handler returns either -ENOTTY or -ENOIOCTLCMD, which also results in
+-ENOTTY being returned from the system call. Some subsystems return
+-ENOSYS or -EINVAL here for historic reasons, but this is wrong.

+Prior to Linux-5.5, compat_ioctl handlers were required to return
Space instead of hyphen.
done
...
...
+-ENOIOCTLCMD in order to use the fallback conversion into native
+commands. As all subsystems are now responsible for handling compat
+mode themselves, this is no longer needed, but it may be important to
+consider when backporting bug fixes to older kernels.

+Timestamps
+==========

+Traditionally, timestamps and timeout values are passed as ``struct
+timespec`` or ``struct timeval``, but these are problematic because of
+incompatible definitions of these structures in user space after the
+move to 64-bit time_t.

+The __kernel_timespec type can be used instead to be embedded in other
It's not a typedef, so ``struct __kernel_timespec``.
done
...
[...]
...
+32-bit compat mode
+==================

+In order to support 32-bit user space running on a 64-bit machine, each
+subsystem or driver that implements an ioctl callback handler must also
+implement the corresponding compat_ioctl handler.

+As long as all the rules for data structures are followed, this is as
+easy as setting the .compat_ioctl pointer to a helper function such as
+compat_ptr_ioctl() or blkdev_compat_ptr_ioctl().

+compat_ptr()
+------------

+On the s/390 architecture, 31-bit user space has ambiguous representations
IBM never used the name "S/390" for the 64-bit mainframe architecture,
but they have rebranded it several times.  Rather than trying to follow
what it's called this year, maybe just write "s390" to match what we
usually call it?
ok, done
...
...

has four bytes of padding between a and b on x86-64, plus another four
bytes of padding at the end, but no padding on i386, and it needs a
compat_ioctl conversion handler to translate between the two formats.

To avoid this problem, all structures should have their members
naturally aligned, or explicit reserved fields added in place of the
implicit padding.

This should explain how to check that - presumably by running pahole on
some sensible architecture.
Ok, added "The ``pahole`` tool can be used for checking the alignment.".
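
As a rough sketch of the "explicit reserved fields" option: the structure under discussion is snipped from the quote above, so the two structs below are hypothetical ones that merely match the description (padding between a and b, and at the end, on x86-64). The static assertions pin down the layout of the explicit variant, which should be the same on every ABI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout relying on implicit padding; size varies by ABI. */
struct foo_implicit {
	uint32_t a;
	uint64_t b;
	uint32_t c;
};

/* Same data with the padding spelled out as reserved fields. */
struct foo_explicit {
	uint32_t a;
	uint32_t reserved1;	/* replaces the padding between a and b */
	uint64_t b;
	uint32_t c;
	uint32_t reserved2;	/* replaces the padding at the end */
};

/* Every member is naturally aligned, so the layout is fixed. */
_Static_assert(offsetof(struct foo_explicit, b) == 8, "b must be at offset 8");
_Static_assert(offsetof(struct foo_explicit, c) == 16, "c must be at offset 16");
_Static_assert(sizeof(struct foo_explicit) == 24, "struct must be 24 bytes");
```

Running pahole on an object file containing both structs should report holes only for the implicit variant.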
...
...
+* On ARM OABI user space, 16-bit member variables have 32-bit
+  alignment, making them incompatible with modern EABI kernels.

I thought that OABI required structures as a whole to have alignment of
4, not individual members?  Which obviously does affect small
structures as members of other structures.
You are right, I clearly misunderstood that. Changed the paragraph now to
* On ARM OABI user space, structures are padded to multiples of 32-bit,
  making some structs incompatible with modern EABI kernels if they
  do not end on a 32-bit boundary.
* On the m68k architecture, struct members are not guaranteed to have an
  alignment greater than 16-bit, which is a problem when relying on
  implicit padding.
...
[...]
...
+Information leaks
+=================

+Uninitialized data must not be copied back to user space, as this can
+cause an information leak, which can be used to defeat kernel address
+space layout randomization (KASLR), helping in an attack.

+As explained for the compat mode, it is best to not avoid any implicit
Delete "not".
Done.
...
+padding in data structures, but if there is already padding in existing
+structures, the kernel driver must be careful to zero out the padding
+using memset() or similar before copying it to user space.
This sentence is rather too long.  Also it can be read as suggesting
that one should somehow identify and memset() the padding just before
copying to user-space.  I suggest an alternate wording:
For this reason (and for compat support) it is best to avoid any
implicit padding in data structures.  Where there is implicit padding
in an existing structure, kernel drivers must be careful to fully
initialize an instance of the structure before copying it to user
space.  This is usually done by calling memset() before assigning to
individual members.
Sounds good, I've taken that paragraph now.
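
A user-space sketch of the "memset() before assigning" pattern from that paragraph. struct foo_reply and foo_fill_reply() are hypothetical names; in a real driver the filled structure would then be handed to copy_to_user(). This relies on the compiler leaving padding bytes alone when individual members are assigned, which is what compilers do in practice but is not something the C standard guarantees.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical reply structure with implicit padding after 'status'. */
struct foo_reply {
	uint8_t  status;	/* followed by implicit padding */
	uint32_t value;
};

/*
 * Clear the whole object first, padding included, and only then
 * assign the individual members.
 */
static void foo_fill_reply(struct foo_reply *reply, uint8_t status,
			   uint32_t value)
{
	memset(reply, 0, sizeof(*reply));
	reply->status = status;
	reply->value = value;
}
```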
...
[...]
...
+Alternatives to ioctl
+=====================
[...]
...
+* A custom file system can provide extra flexibility with a simple
+  user interface but add a lot of complexity to the implementation.

Typo: "add" should be "adds".
Fixed
Thanks for all the good suggestions!
Arnd
