Building agent teams that don't hallucinate ship dates
The failure mode isn’t that agents lie. It’s that they’re trained to be helpful, and “helpful” often means producing a confident-sounding answer when the honest answer is “I don’t know.”
Ask a single agent “when will this ship?” and it will give you a date. A clean, plausible, completely made-up date. It has no access to scope clarity, existing technical debt, or team capacity — so it pattern-matches its way to a number that sounds right and tells you what you wanted to hear. That’s worse than no answer, because people plan around it.
What I built instead
Odin is a system of native agent teams, not one over-eager assistant. Work flows through specialists — planning, implementation, review — each with a narrow job and access to the signals that job actually needs. Estimation isn’t a vibe an agent emits at the end. It’s the output of a process: scan the codebase, assess complexity against what already exists, weigh it against real capacity, and surface the uncertainty instead of smoothing it over.
The structure is the point. A single model guesses. A team that’s forced to ground each step in observable signals has to show its work — and a number that has to survive review is a very different number from one a chatbot blurts out.
The results that actually mattered
Two things happened, and the order surprised me.
The headline number is speed: we cut development time by roughly half. Real features, reviewed and shipped, in half the calendar time they used to take. That alone would have justified the build.
But the result I care about more is honesty. The team now reports that over 80% of tickets are estimated correctly — and they deliver in about half the time originally quoted. Estimates you can trust changed how the whole team plans. No more padding to cover for a process that lied. No more quietly slipping dates. The agents stopped hallucinating ship dates because they were never asked to invent one in the first place — they were asked to derive one.
The lesson
Speed is the easy thing to sell and the easy thing to measure. The harder, more valuable outcome is a system that’s calibrated — one that knows what it doesn’t know and says so.
You don’t get there with a better prompt. You get there by structuring the work so the honest answer is the path of least resistance: give each agent the right signals, make it show its reasoning, and design the output to surface doubt rather than bury it. Build it that way and the speed comes too — but the trust is what compounds.