You Can’t Run Production on Probability

    I’ve been thinking about how quickly we’re wiring agents into real systems (codebases, CI pipelines, cloud environments), and I think we’re glossing over something important.

    Production infrastructure has always been built around determinism. Not because engineers are rigid, but because production is unforgiving. When something breaks, you need to answer basic questions: what changed, why did it change, and can we reproduce it?

    Over the last couple of decades, we built layers to make that possible. Gated deploys. Immutable artifacts. Scoped permissions. Audit logs. Rollback strategies. All of it exists to reduce surprise. Given the same input, the system should behave the same way. If it doesn’t, you don’t really have control.

    Agents are different by design. Their strength is that they interpret intent, adapt to context, and explore solution paths. But that also means the same request can produce different plans. A retry can take a different path. A model update can subtly shift behavior.

    That’s fine when generating ideas or drafts.

    It’s different when an agent can merge code, apply infrastructure changes, rotate credentials, or silence alerts.

    At that point, non-determinism stops being interesting and starts becoming operational risk.

    When production breaks, you page a team, not a probability distribution. You need to know what changed, why it changed, and whether it will happen again. “The model thought this was reasonable” isn’t a satisfying postmortem. You need traceability, policy boundaries, and ownership.

    I don’t think the answer is to keep agents out of production. That feels unrealistic and, honestly, like leaving leverage on the table.

    But I do think we’re making a subtle architectural mistake when we collapse reasoning and execution into the same layer.

    Let agents reason. Let them explore. Let them propose diffs and remediation plans.

    The side effects are another matter: merges, deploys, and infrastructure changes need to pass through something deterministic. A layer that enforces policy outside the model. That scopes identity. That bounds autonomy. That makes every action attributable.
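    One way to picture such a layer is a minimal sketch, not any particular product’s implementation; the agent IDs, action kinds, and policy rules here are invented for illustration. The agent proposes actions; a deterministic gate decides, records, and owns the verdict:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class ProposedAction:
    """An action an agent proposes. It carries intent, not authority."""
    agent_id: str
    kind: str    # e.g. "merge", "deploy", "rotate_credential"
    target: str  # e.g. a repo or environment name


@dataclass
class PolicyGate:
    """Deterministic layer between agent reasoning and side effects.

    Policy lives here, outside the model: the same proposal always
    gets the same verdict, and every decision is logged and attributable.
    """
    # agent_id -> set of (kind, target) pairs that agent may perform
    allowed: dict
    audit_log: list = field(default_factory=list)

    def authorize(self, action: ProposedAction) -> bool:
        verdict = (action.kind, action.target) in self.allowed.get(action.agent_id, set())
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "agent": action.agent_id,
            "action": (action.kind, action.target),
            "allowed": verdict,
        })
        return verdict


# Hypothetical scoping: this agent may deploy to staging, never production.
gate = PolicyGate(allowed={"agent-7": {("deploy", "staging")}})
print(gate.authorize(ProposedAction("agent-7", "deploy", "staging")))     # allowed
print(gate.authorize(ProposedAction("agent-7", "deploy", "production")))  # denied
print(len(gate.audit_log))  # both decisions recorded, pass or fail
```

    The point of the sketch is the shape, not the details: the model can propose anything, but the verdict comes from a policy table it cannot edit, and every decision, allowed or denied, lands in an audit log with an identity attached.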

    We’ve been here before. Early on, engineers deployed directly to production. As systems scaled, that stopped working. So we introduced pipelines, reviews, and policy enforcement. In effect, we built a control plane for people.

    Agents are no different.

    This is exactly the problem we’re working on at Guild.

    If we wire probabilistic systems directly into deterministic execution paths, we’ll get exactly what you’d expect: outages we can’t reproduce, policy violations we can’t explain, and blast radius we didn’t intend.

    The leverage of autonomy is real. So is the risk.

    If agents are going to operate in production, they need something stronger than prompts and good intentions.

    They need hard boundaries.

    They need enforcement outside the model.

    They need a control plane.