On Sun, Aug 3, 2025 at 7:27 PM Xu Kuohai xukuohai@huaweicloud.com wrote:
From: Xu Kuohai xukuohai@huawei.com
When the bpf ring buffer is full, new events cannot be recorded until the consumer consumes some events to free space. This may cause critical events to be discarded, for example in fault diagnosis, where recent events are more critical than older ones.
So add an overwrite mode for the bpf ring buffer. In this mode, a new event overwrites the oldest event when the buffer is full.
The scheme is as follows:
producer_pos tracks the next position to write new data. When there is enough free space, producer simply moves producer_pos forward to make space for the new event.
To avoid waiting for consumer to free space when the buffer is full, a new variable overwrite_pos is introduced for producer. overwrite_pos tracks the next event to be overwritten (the oldest event committed) in the buffer. producer moves it forward to discard the oldest events when the buffer is full.
pending_pos tracks the oldest event that is still being committed. producer ensures producer_pos never passes pending_pos when making space for new events, so multiple producers never write to the same position at the same time.
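The three positions described above can be sketched as a small single-threaded user-space model. This is only an illustration under stated assumptions, not the kernel implementation from the patch: the names `ovw_rb`, `rb_reserve`, `rb_commit`, the 64-byte buffer, the 8-byte record header and the busy bit are all hypothetical, and the model assumes every record starts with a length header so that overwrite_pos can step over whole records. Payload bytes and locking are left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define RB_SIZE  64u            /* data area size in bytes (power of two) */
#define RB_MASK  (RB_SIZE - 1)
#define HDR_SIZE 8u             /* hypothetical per-record header size */
#define BUSY_BIT (1u << 31)     /* set in the header until the record is committed */

struct ovw_rb {
    uint64_t producer_pos;      /* next position to write */
    uint64_t overwrite_pos;     /* oldest committed record still in the buffer */
    uint64_t pending_pos;       /* oldest record that is not yet committed */
    uint8_t  data[RB_SIZE];
};

/* Records are padded to 8 bytes, so a 4-byte header word never wraps. */
static uint32_t rec_size(uint32_t payload_len)
{
    return (HDR_SIZE + payload_len + 7u) & ~7u;
}

static uint32_t load_hdr(const struct ovw_rb *rb, uint64_t pos)
{
    uint32_t v;
    memcpy(&v, &rb->data[pos & RB_MASK], sizeof(v));
    return v;
}

static void store_hdr(struct ovw_rb *rb, uint64_t pos, uint32_t v)
{
    memcpy(&rb->data[pos & RB_MASK], &v, sizeof(v));
}

/* Reserve space for one record. Returns its position, or -1 on failure. */
static int64_t rb_reserve(struct ovw_rb *rb, uint32_t payload_len)
{
    uint32_t total = rec_size(payload_len);
    uint64_t pos, new_prod;

    if (total > RB_SIZE)
        return -1;

    /* Move pending_pos forward over records that are already committed. */
    while (rb->pending_pos < rb->producer_pos) {
        uint32_t h = load_hdr(rb, rb->pending_pos);
        if (h & BUSY_BIT)
            break;
        rb->pending_pos += rec_size(h);
    }

    /* producer_pos must never pass pending_pos by more than one round. */
    new_prod = rb->producer_pos + total;
    if (new_prod - rb->pending_pos > RB_SIZE)
        return -1;

    /* Discard the oldest committed records until the new one fits. */
    while (new_prod - rb->overwrite_pos > RB_SIZE) {
        uint32_t h = load_hdr(rb, rb->overwrite_pos);
        assert(!(h & BUSY_BIT));   /* overwrite_pos stays behind pending_pos */
        rb->overwrite_pos += rec_size(h);
    }

    pos = rb->producer_pos;
    store_hdr(rb, pos, payload_len | BUSY_BIT);
    rb->producer_pos = new_prod;
    return (int64_t)pos;
}

/* Commit a previously reserved record by clearing its busy bit. */
static void rb_commit(struct ovw_rb *rb, int64_t pos)
{
    uint32_t h = load_hdr(rb, (uint64_t)pos);
    store_hdr(rb, (uint64_t)pos, h & ~BUSY_BIT);
}
```

In this model, filling the buffer with four 16-byte records and reserving a fifth moves overwrite_pos from 0 to 16, and a reservation fails only when it would run into a record that has been reserved but not yet committed.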
producer wakes up consumer every half a round to give it a chance to retrieve data. However, for an overwrite-mode ring buffer, users typically only care about the ring buffer snapshot before a fault occurs. In this case, the producer should commit data with the BPF_RB_NO_WAKEUP flag to avoid unnecessary wakeups.
If I understand it correctly, the algorithm requires all events to be the same size; otherwise the first overwrite might trash the header. The producers should also use some kind of signaling to timestamp each event; otherwise it will all look out of order to the consumer.
In the end it looks inferior to the existing perf ring buffer with overwrite. Since in both cases out-of-order events need to be dealt with in post-processing, the main advantage of ring buf vs perf buf is gone.
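To make the post-processing step concrete, here is a minimal user-space sketch that reorders a snapshot of events by timestamp. The `struct event` layout and the function names are hypothetical; it assumes each producer stamps its event with a monotonic time (for example from bpf_ktime_get_ns()) before committing.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical event layout: the producer fills ts_ns before committing. */
struct event {
    uint64_t ts_ns;   /* timestamp in nanoseconds */
    uint32_t cpu;     /* CPU that produced the event */
    uint32_t id;      /* application-defined event id */
};

/* qsort comparator: ascending by timestamp, without overflow. */
static int cmp_event_ts(const void *a, const void *b)
{
    const struct event *x = a;
    const struct event *y = b;

    return (x->ts_ns > y->ts_ns) - (x->ts_ns < y->ts_ns);
}

/* Reorder a snapshot of n events copied out of the buffer. */
static void sort_events_by_ts(struct event *evs, size_t n)
{
    qsort(evs, n, sizeof(*evs), cmp_event_ts);
}
```

The same pass is needed whether the snapshot comes from a perf buffer or from an overwrite-mode ring buffer, which is the point made above.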