Two stories caught my attention this week, and they are two sides of the same coin: AI agents are getting better at catching bugs in code review, but the code they write may be creating a maintenance nightmare.
The good news: multi-agent reviews work
A developer named Adam Miller just open-sourced adamsreview, a Claude Code plugin that runs multi-stage PR reviews using parallel sub-agents. His claim? It catches "dramatically more real bugs" than Claude's built-in review commands, CodeRabbit, Greptile, and other popular tools.
The approach is clever: instead of one agent doing a surface-level pass, adamsreview runs validation in stages with persistent state tracking. It's the code-review equivalent of having three people check your math instead of one.
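To make the idea concrete, here is a minimal sketch of a staged review with parallel checks and state saved between stages. It is not adamsreview's actual code: the reviewer functions, the `validate` rule, and the JSON state file are all hypothetical stand-ins, chosen only to show the shape of the pattern.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# Hypothetical reviewers: each takes a diff string and returns a list of findings.
def check_todo(diff):
    return ["leftover TODO"] if "TODO" in diff else []

def check_bare_except(diff):
    return ["bare except"] if "except:" in diff else []

def validate(finding, diff):
    # Stage 2 stand-in: a real validator would re-check each finding against the code.
    # This toy rule only keeps TODO findings that are explicitly marked for removal.
    return finding != "leftover TODO" or "TODO: remove" in diff

def run_review(diff, state_path):
    state_path = Path(state_path)
    # Persistent state: resume from the last completed stage if a state file exists.
    if state_path.exists():
        state = json.loads(state_path.read_text())
    else:
        state = {"stage": 0, "findings": []}

    if state["stage"] < 1:
        # Stage 1: run all reviewers in parallel and collect raw findings.
        with ThreadPoolExecutor() as pool:
            batches = list(pool.map(lambda r: r(diff), [check_todo, check_bare_except]))
        state["findings"] = sorted(f for batch in batches for f in batch)
        state["stage"] = 1
        state_path.write_text(json.dumps(state))

    if state["stage"] < 2:
        # Stage 2: filter out findings that do not survive validation.
        state["findings"] = [f for f in state["findings"] if validate(f, diff)]
        state["stage"] = 2
        state_path.write_text(json.dumps(state))

    return state["findings"]
```

Saving state after each stage means an interrupted review can resume without repeating finished work, and a second pass over the raw findings is what filters out noise before a human sees it.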
This matters because code review is one of the few places where AI agents can add value without creating downstream problems. Finding a bug before merge costs pennies. Finding it in production costs dollars — or customers.
The bad news: nobody's measuring maintenance cost
But here's the tension: while we're building better bug-catching agents, we're also pushing developers to use AI to write all their code. One Hacker News commenter described transferring to a team at a Fortune 500 company where he was explicitly told "not to write any code by hand." Claude usage is mandatory, backed by a proprietary framework with over 100 agents.
That's not an experiment. That's policy.
James Shore, a software consultant, published a piece arguing that AI coding agents need to reduce maintenance costs, not just ship features faster. His point is simple: most of the cost of software isn't writing it the first time. It's the six months (or six years) of changes, bug fixes, and refactors that follow.
If an AI agent writes code that's hard for humans to understand, debug, or modify — even if it works perfectly on day one — you've just mortgaged your future velocity for a short-term win.
The real test: what happens in month six?
Here's the question nobody's answering yet: when that AI-generated code breaks in production six months from now, and your on-call engineer is staring at a 300-line function with no comments and variable names like result_2_final, how much does that cost you?
We don't know, because we're not measuring it. We're measuring lines of code written per day. We're measuring bugs caught in review. We're not measuring time-to-fix for AI-generated code versus human-written code. We're not tracking how often developers have to rewrite AI output because it's unmaintainable.
One commenter on a thread about AI customer support asked whether low-quality AI support will become the new normal. The answer: only if companies don't measure the cost of bad AI. If you track support ticket resolution time, escalation rates, and customer churn, you'll kill bad AI agents fast.
The same logic applies to code. If you track maintenance cost — not just feature velocity — you'll know whether your AI coding agents are helping or hurting.
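One way to start, assuming you can tag each bug fix with the origin of the code it touched and the hours it took, is to compare median time-to-fix across origins. The sketch below is illustrative only: the `median_time_to_fix` helper, the "ai" and "human" labels, and the sample numbers are all made up, not real measurements.

```python
from collections import defaultdict
from statistics import median

def median_time_to_fix(fixes):
    """Group (origin, hours_to_fix) pairs by origin and return the median hours for each."""
    by_origin = defaultdict(list)
    for origin, hours in fixes:
        by_origin[origin].append(hours)
    return {origin: median(hours) for origin, hours in by_origin.items()}

# Made-up sample data purely for illustration; not real measurements.
sample = [("ai", 5.0), ("human", 2.0), ("ai", 3.0), ("human", 4.0), ("ai", 10.0)]
report = median_time_to_fix(sample)
print(report)
```

The median is used rather than the mean so that one unusually painful incident doesn't dominate the comparison. In practice the hard part is the tagging, not the arithmetic.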
What works right now
The pattern that's emerging: AI agents work best in constrained, reversible, high-feedback loops.
- Code review: Constrained scope (one PR), reversible (you can ignore the feedback), high feedback (you see the results immediately).
- Bug detection: Same deal. The agent flags an issue, a human decides whether it's real.
- Boilerplate generation: Writing a CRUD endpoint for the tenth time? Let the agent do it. You'll review it, you'll understand it, and if it's wrong, you'll catch it fast.
What doesn't work: handing an agent a vague spec and asking it to write a feature you don't understand well enough to review. That's not automation. That's technical debt with a chatbot interface.
The question you should ask your vendor
If someone's selling you an AI coding agent, ask them this: "How do you measure maintenance cost, and what's your benchmark for AI-generated code versus human-written code six months post-deployment?"
If they don't have an answer, they're selling you a feature factory, not a business tool.
What this means for AlphaForge clients: We're building agents for tasks where the feedback loop is tight and the cost of failure is measurable — lead qualification, data extraction, workflow automation. We're not building agents that write code you can't maintain, because we'd rather you stay in business.