On Tue, Sep 02, 2025 at 04:22:40PM +0000, Manthey, Norbert wrote:
> On Tue, 2025-09-02 at 07:48 -0400, Sasha Levin wrote:
>> One note about the tool: in my experience, unless the tool can also act as an agent and investigate the relevant git repo (and attempt builds and run tests) on its own, the results tend to be very lackluster.
> I agree in general. On the other hand, we want to keep the amount of work the LLM or agent does small. For now, we only submit a bit of context and the commit messages. Validation is run by the application, independently of the agent. There is no feedback loop or anything similar yet -- that could all happen in the agent stage. We also apply a few filters and limits so that we only process commits an LLM is likely to finish successfully.
Consider a simple backport example: let's say that upstream we see a patch that does something like:
        mutex_lock(&m);
-       old_func();
+       new_func();
        mutex_unlock(&m);
But when we look at an older tree, we see:
        spin_lock(&l);
        old_func();
        spin_unlock(&l);
If you don't pass in a massive amount of context, there's no way for an LLM to know whether it's safe to simply replace old_func() with new_func() in the old code. Most LLMs I've played with will just go ahead and do it.
A human backporter (and, most likely, an AI agent) would have a lightbulb moment and go look at new_func() to see whether it's safe to call under a spinlock.
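To make the hazard concrete, here is a hypothetical sketch (the names and the specific problem are illustrative, not taken from any real patch): suppose new_func() gained a GFP_KERNEL allocation upstream. GFP_KERNEL allocations may sleep, so calling the new version under a spinlock in the older tree would introduce a sleep-in-atomic bug, even though the textual replacement applies cleanly:

	#include <linux/slab.h>
	#include <linux/spinlock.h>

	struct foo { int x; };

	/* Upstream new_func() now allocates with GFP_KERNEL, which may sleep. */
	static void new_func(void)
	{
		struct foo *f = kmalloc(sizeof(*f), GFP_KERNEL);

		kfree(f);
	}

	static DEFINE_SPINLOCK(l);

	static void older_tree_caller(void)
	{
		spin_lock(&l);
		new_func();	/* BUG: may sleep while holding a spinlock */
		spin_unlock(&l);
	}

Nothing in the diff itself hints at the problem; you only find it by reading what new_func() actually does, which is exactly the step a context-free LLM pass skips. (CONFIG_DEBUG_ATOMIC_SLEEP would catch it at runtime, but only if the right path gets exercised.)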
I guess my point is that, at this level and for this use case, LLMs don't end up being much better than something like wiggle[1].
[1] https://github.com/neilbrown/wiggle
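For reference (from memory of wiggle's man page, so the exact invocation may differ), wiggle can force-apply a rejected hunk with something like:

	$ wiggle --replace file.c file.c.rej

which performs much the same context-free textual merge.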