Hi,
On Sun, Mar 21, 2021 at 11:45:59AM -0700, Kees Cook wrote:
On Sun, Mar 21, 2021 at 04:01:18PM +0100, John Wood wrote:
On Wed, Mar 17, 2021 at 07:57:10PM -0700, Kees Cook wrote:
On Sun, Mar 07, 2021 at 12:30:26PM +0100, John Wood wrote:
Sorry, but I have been trying to understand how to use locking properly, without luck.
I have read (and tried to understand):
  tools/memory-model/Documentation/simple.txt
  tools/memory-model/Documentation/ordering.txt
  tools/memory-model/Documentation/recipes.txt
  Documentation/memory-barriers.txt
And I can't find the answers that I need. I'm not saying they aren't there, but I don't see them. So my questions:
If it makes sense to use locking in the above function, and it is called from the brute_task_fatal_signal hook, then all the functions called from this hook need locking (more than one process can access stats at the same time).
So, as you point out, how is it possible and safe to read jiffies and faults (and, I think, period, even though you did not mention it) using READ_ONCE() but without holding brute_stats::lock? I'm very confused.
There are, I think, 3 considerations:
- is "stats", itself, a valid allocation in kernel memory? This is the "lifetime" management of the structure: it will only stay allocated as long as there is a task still alive that is attached to it. The use of refcount_t on task creation/death should entirely solve this issue, so that in all the other places where you access "stats", the memory will be valid. AFAICT, this one is fine: you're doing all the correct lifetime management.
- changing a task's stats pointer: this is related to lifetime management, but it is, I think, entirely solved by the existing refcounting. (And it isn't helped by holding stats->lock since this is about stats itself being a valid pointer.) Again, I think this is all correct already in your existing code (due to the implicit locking of "current"). Perhaps I've missed something here, but I guess we'll see!
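The lifetime rule described in these two points (the structure stays allocated only while some task still holds a reference to it) can be pictured with a small userspace sketch. This is an illustration only, not the patch's code: it uses C11 atomics as a stand-in for the kernel's refcount_t, and the struct and function names are hypothetical.

```c
/* Userspace illustration only; all names here are hypothetical. */
#include <stdatomic.h>
#include <stdlib.h>

struct stats_ref_sketch {
	atomic_int refs;	/* stand-in for the kernel's refcount_t */
	unsigned int faults;
};

/* Allocate the structure with one reference held by the creator. */
struct stats_ref_sketch *stats_ref_alloc(void)
{
	struct stats_ref_sketch *s = malloc(sizeof(*s));

	if (!s)
		return NULL;
	atomic_init(&s->refs, 1);
	s->faults = 0;
	return s;
}

/* Take an extra reference; the caller must already hold a valid one. */
void stats_ref_get(struct stats_ref_sketch *s)
{
	atomic_fetch_add_explicit(&s->refs, 1, memory_order_relaxed);
}

/*
 * Drop a reference. Returns 1 when this call dropped the last
 * reference and freed the object, 0 otherwise.
 */
int stats_ref_put(struct stats_ref_sketch *s)
{
	if (atomic_fetch_sub_explicit(&s->refs, 1, memory_order_acq_rel) == 1) {
		free(s);
		return 1;
	}
	return 0;
}
```

In this sketch the memory is released only by whoever drops the last reference, so any code that holds its own reference can keep dereferencing the pointer safely.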
My only concern now is the following case:
One process crashes with a fatal signal. Then, its stats are updated. Then we get the exec stats (the stats of the task that calls exec). At the same time another CPU frees these same stats. Now, if the first process writes to the exec stats we get a "use after free" bug.
If this scenario is possible, we would need to protect the whole section inside the task_fatal_signal hook that deals with the exec stats. I think that a global lock is necessary here, and that the write of the pointer to the stats struct in the task_free hook must also be protected.
Moreover, I can see another scenario:
The first CPU gets the exec stats when a task fails with a fatal signal. Meanwhile, the second CPU runs execve() after execve() on the same task from which the first CPU got the exec stats. This second CPU resets the stats at the same time that the first CPU updates them. I think we also need a lock here.
Am I right? Are these paths possible?
- are the values in stats getting written by multiple writers, or read during a write, etc?
This last one is the core of what I think could be improved here:
To keep the writes serialized, you (correctly) perform locking in the writers. This is fine.
There is also locking in the readers, which I think is not needed. AFAICT, READ_ONCE() (with WRITE_ONCE() in the writers) is sufficient for the readers here.
IIUC (from reading the documentation), READ_ONCE and WRITE_ONCE only guarantee that a variable written with WRITE_ONCE can be read safely with READ_ONCE, avoiding tearing, etc. So, I see these functions as a way to guarantee atomic access to a variable.
Right -- from what I can see about how you're reading the statistics, I don't see a way to have the values get confused (assuming locked writes and READ/WRITE_ONCE()).
Another question: is it also safe to use WRITE_ONCE without holding the lock? Or is this only applicable to read operations?
No -- you'll still want the writer locked since you update multiple fields in stats during a write, so you could miss increments, or interleave count vs jiffies writes, etc. But the WRITE_ONCE() makes sure that the READ_ONCE() readers will see a stable value (as I understand it), and in the order they were written.
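The pattern discussed here (writers take the lock and use WRITE_ONCE(), readers use READ_ONCE() without the lock) can be sketched in userspace C. This is only an illustration under stated assumptions, not the brute_stats code: the macros are simplified volatile-cast stand-ins for the kernel macros of the same name, the spinlock is built on a C11 atomic_flag, and the struct layout, field types and function names are hypothetical.

```c
/* Userspace illustration only; not the kernel's brute_stats code. */
#include <stdatomic.h>

/* Simplified stand-ins for the kernel macros of the same name. */
#define READ_ONCE(x)	 (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

/* Hypothetical layout: field names follow the thread, types are assumed. */
struct stats_sketch {
	atomic_flag lock;	/* serializes writers only */
	unsigned int faults;
	unsigned long jiffies;
	unsigned long period;
};

static void sketch_lock(struct stats_sketch *s)
{
	while (atomic_flag_test_and_set_explicit(&s->lock, memory_order_acquire))
		;	/* spin until the current holder releases the lock */
}

static void sketch_unlock(struct stats_sketch *s)
{
	atomic_flag_clear_explicit(&s->lock, memory_order_release);
}

void stats_init(struct stats_sketch *s, unsigned long now)
{
	atomic_flag_clear(&s->lock);
	s->faults = 0;
	s->jiffies = now;
	s->period = 0;
}

/*
 * Writer: the lock keeps the multi-field update from interleaving with
 * another writer; WRITE_ONCE() keeps each individual store untorn for
 * the lockless readers.
 */
void stats_record_fault(struct stats_sketch *s, unsigned long now)
{
	sketch_lock(s);
	WRITE_ONCE(s->period, now - s->jiffies);
	WRITE_ONCE(s->jiffies, now);
	WRITE_ONCE(s->faults, s->faults + 1);
	sketch_unlock(s);
}

/* Reader: no lock taken; the field is loaded exactly once. */
unsigned int stats_read_faults(struct stats_sketch *s)
{
	return READ_ONCE(s->faults);
}
```

Note that in this sketch a reader that loads several fields sees each one untorn, but the group of loads is not a consistent snapshot of one locked update; a reader that needs that would still have to take the lock.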
Any light on this will help me do the best job in the next patches. If somebody can point me in the right direction it would be greatly appreciated.
Is there any documentation for newbies on this topic? I'm stuck. I have also read the documentation about spinlocks, semaphores, mutexes, etc., but nothing makes the concept clear to me.
Apologies if this question has been answered in the past, but my search of the mailing list turned up nothing.
It's a complex subject! Here are some other docs that might help:
  tools/memory-model/Documentation/explanation.txt
  Documentation/core-api/refcount-vs-atomic.rst
or they may melt your brain further! :) I know mine is always mushy after reading them.
Thanks for your time and patience.
You're welcome; and thank you for your work on this! I've wanted a robust brute force mitigation in the kernel for a long time. :)
Thank you very much for this great explanation and mentorship. Now this subject is clearer to me. It's a pleasure for me to work on this.
Again, thanks for your help.
John Wood