How Action Bias Breaks Autonomous Software Maintenance

Lightning Talks

Coding agents are increasingly trusted to resolve issues end-to-end: investigate, patch, and ship, without a human in the loop. But in real-world maintenance tasks, a large fraction of incoming bug reports describe issues that are already fixed. A competent maintainer moves on. Current agents don't. In our new benchmark FixedBench, frontier models apply unnecessary edits to already-correct code in 35–65% of cases, even with full git history and a working environment. More reasoning doesn't help. Better prompts help, but they trade one failure mode for another. Today's training rewards producing patches, not deciding whether one is needed. At scale, that quietly compounds into technical debt. The fix starts with framing inaction as a valid success state.

Co-founder & CTO

Mark Niklas Müller

LogicStar AI

Dr. Mark Niklas Müller is CTO and Co-Founder of LogicStar AI, where he is building the future of software maintenance for the agentic era. He pioneered methods for benchmarking software agent capabilities, including SWT-Bench and automated benchmark generation. Before founding LogicStar AI, he completed his PhD and conducted postdoctoral research at the Secure, Reliable, and Intelligent Systems (SRI) Lab at ETH Zürich under Prof. Martin Vechev, focusing on provable guarantees for machine learning systems and the intersection of AI and code. He's an avid hiker and road cyclist.

2026 Talks

How Action Bias Breaks Autonomous Software Maintenance

Lightning Talks

The AI Conference for Humans Who Ship