Agents built on reasoning models can produce different outputs from the same input on different runs. This is fundamental to how the models work, not a defect. The governance implication is that you need to observe agent behavior and build confidence over time, the same way you'd evaluate a human employee — not expect perfectly repeatable outputs like traditional software.

Agents Are Non-Deterministic — So Treat Them Like Your Workforce
Most companies building with AI agents right now are making the same mistake: they're treating non-deterministic software like deterministic software and wondering why things keep breaking in ways no one predicted.
James Everingham has a different lens on this problem. Before founding Guild.ai, he led Meta's developer infrastructure org — roughly 1,000 engineers building every tool that Meta's 40,000 developers touched daily. When his team started embedding reasoning models and agents deep into that infrastructure, they didn't just see productivity gains. They saw the exact governance failures that most companies haven't hit yet but will. Agents multiplied faster than anyone expected. Budgets evaporated overnight. Servers ran out. And the team realized: if you're going to unleash non-deterministic systems into corporate infrastructure, you need the same kind of governance layer you'd put around the humans using that infrastructure.
Now, on a recent episode of The Deep View Conversations with host Jason, the Guild CEO laid out why he thinks the companies ignoring that lesson are already getting burned — and what the alternative looks like.
Agents are non-deterministic — just like people
The instinct in most engineering orgs is to treat AI agents like any other piece of software: write the spec, ship it, expect consistent output. Everingham thinks that instinct is exactly wrong.
Agents powered by reasoning models are non-deterministic. You give the same agent the same input twice and you might get different outputs. That's not a bug to be fixed — it's the fundamental nature of the technology. And the former Meta Dev Infra leader argues you already have experience managing non-deterministic systems. You do it every day. They're called employees.
You do need to put some of the same guardrails around these that you put around your workforces.
The implication is uncomfortable but practical: if you wouldn't give a new hire unrestricted access to production, root credentials, and an unlimited cloud budget on day one, why would you give an agent those things? You need to observe behavior, build evals that increase your confidence the agent is operating properly, and accept that it won't always be correct — just like a person won't always be correct.
The difference is that humans operate inside organizational structures — reporting lines, access controls, review processes — that have been refined over decades. Agents operate inside... nothing. Most companies haven't built the equivalent. That's the gap Everingham sees, and it's what Guild is designed to fill.
He draws a sharp line between two requirements: you have to put security around agents, and you have to separate when an agent accesses data from when it executes something. Observation and execution are different privileges, and conflating them is how you end up with the kind of incident that makes your CISO's phone ring at 2 a.m.
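That separation of observation from execution can be made concrete. Here's a minimal Python sketch of the idea, under stated assumptions: the class and scope names (`AgentGateway`, `Privilege`) are illustrative inventions for this article, not Guild's actual API.

```python
from enum import Enum, auto

class Privilege(Enum):
    OBSERVE = auto()   # read data: logs, metrics, source code
    EXECUTE = auto()   # change state: deploys, writes, spend

class AgentGateway:
    """Grants observation and execution as separate privileges, so an
    agent cleared to read data cannot also act on it by default."""

    def __init__(self, grants):
        # e.g. {"triage-bot": {Privilege.OBSERVE}}
        self.grants = grants

    def check(self, agent: str, privilege: Privilege) -> bool:
        return privilege in self.grants.get(agent, set())

gateway = AgentGateway({"triage-bot": {Privilege.OBSERVE}})
can_read = gateway.check("triage-bot", Privilege.OBSERVE)   # True
can_act = gateway.check("triage-bot", Privilege.EXECUTE)    # False
```

The point of keeping the two privileges as distinct enum members, rather than one boolean "has access," is that granting an agent read access never implicitly grants it the ability to execute.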
What happened inside Meta when agents went viral
Before Guild existed, Everingham's team at Meta built something that served as a proving ground: a managed software center for agents — essentially an internal app store. Engineers could browse available agents, see what they did, see their impact metrics, fork them, and build on top of them.
What happened next caught even the team off guard. Engineers got inspired. They started building their own agents and publishing them to the platform. The agents multiplied rapidly, and suddenly the infrastructure team was scrambling to keep up with demand.
We were just trying to hold the wheels on the cart internally... we were scrambling around literally trying to find tens of thousands of free servers.
That viral adoption was the signal that gave the team confidence they were building something genuinely valuable — not a top-down mandate that engineers tolerated, but a platform engineers actually wanted to use. But it also surfaced the governance problem in vivid color: when agents multiply that fast, who's tracking what they're doing, what they're spending, and what they have access to?
The Meta experience became the blueprint for Guild. The team had built for 40,000 internal developers. Now the ambition was to build for 40 million.
Harness vs. control plane
One of the most clarifying distinctions in the conversation is the difference between an agent harness and a control plane — two things most people conflate.
A harness — something like Claude Code — is really just a software system controlling a model. It orchestrates the prompts, manages the context, and executes the agent's outputs. But Everingham points out that a harness is not that different from an agent itself. It's code talking to a model.
A control plane is the environment the harness runs inside. It's the layer that validates each step the agent takes, records what happened, confirms the action was permitted, and collects cost data so finance can actually understand what's being spent. He draws an analogy to operating system design: protection layers, device layers, policy layers. The harness is the application. The control plane is the OS.
The practical difference matters when things go wrong. A harness can run an agent that does something destructive. A control plane puts a circuit breaker between the agent's intent and the execution — the same way an operating system prevents a userspace application from writing directly to disk without going through the kernel.
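The control-plane responsibilities described above (validate, record, permit, meter cost) can be sketched in a few lines of Python. This is a hypothetical illustration, not Guild's implementation; `ControlPlane`, `PolicyViolation`, and the action names are assumptions made for the example.

```python
import time

class PolicyViolation(Exception):
    """Raised when an agent attempts an action the policy forbids."""
    pass

class ControlPlane:
    """Every agent action passes through validation, is appended to an
    audit log, and has its cost deducted from a tracked budget."""

    def __init__(self, allowed_actions, budget_usd):
        self.allowed = allowed_actions
        self.budget = budget_usd
        self.audit_log = []

    def run(self, agent, action, cost_usd, fn):
        if action not in self.allowed:
            raise PolicyViolation(f"{agent} not permitted: {action}")
        if cost_usd > self.budget:
            raise PolicyViolation("budget exhausted")
        result = fn()                    # the harness does the work
        self.budget -= cost_usd         # finance sees real spend
        self.audit_log.append((time.time(), agent, action, cost_usd))
        return result

cp = ControlPlane(allowed_actions={"read_spec"}, budget_usd=10.0)
text = cp.run("doc-bot", "read_spec", 0.02, lambda: "spec text")
```

Note that the permission and budget checks happen before `fn()` runs: the circuit breaker sits between intent and execution, which is exactly the kernel-style boundary the OS analogy describes.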
OpenClaw was the wake-up call
If the Meta experience showed what happens when agent adoption goes right, OpenClaw showed what happens when it goes wrong without a governance layer.
People gave agents open access to their local systems: broad permissions, no containment. Private keys ended up shared publicly. It was the kind of security failure that makes CIOs and CISOs rethink their entire approach to agent deployment, and it validated Guild's core thesis.
Everingham frames Guild as something like an enterprise-grade version of what OpenClaw was trying to do: tool integrations, agents running, real work getting done — but inside what he calls a "bubble-wrapped container" that controls access and records everything. The container doesn't stop the agent from being useful. It stops the agent from being dangerous.
The lesson from OpenClaw isn't that agents are too risky to deploy. It's that deploying them without governance infrastructure is the risk.
Three companies, three different problems
One of the more grounded parts of the conversation is when the Guild CEO describes three real customer archetypes — not abstract personas, but actual patterns his team keeps seeing.
The company that doesn't know where to start
A small, non-technical company in the Midwest. No AI expertise in-house, no idea how to begin. The Guild team showed them the platform, and within minutes agents were operating in their infrastructure. The control plane lowered the barrier to entry because the governance, security, and integrations were already handled — the company didn't have to build any of that from scratch.
The large enterprise that already lost control
A big company where agents were already everywhere — but nobody had visibility into what they were doing or what they were costing.
One developer literally blew through our entire month's budget with an agent in seven hours and no one knew.
That's not a hypothetical. It's the kind of thing that happens when agent usage scales faster than the organization's ability to monitor it. A control plane puts a circuit breaker on runaway spend and gives finance and engineering leadership the same view of what's happening.
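A spend circuit breaker of the kind described here is simple to reason about in code. The sketch below is an illustrative assumption about how such a breaker might behave, not a description of Guild's product: once cumulative spend would cross the cap, the breaker trips and refuses every further call until a human resets it.

```python
class SpendCircuitBreaker:
    """Trips when cumulative agent spend would exceed a cap; once
    tripped, all further charges are refused until reset()."""

    def __init__(self, cap_usd: float):
        self.cap = cap_usd
        self.spent = 0.0
        self.tripped = False

    def charge(self, cost_usd: float) -> bool:
        if self.tripped or self.spent + cost_usd > self.cap:
            self.tripped = True
            return False          # refuse the call; alert a human
        self.spent += cost_usd
        return True

    def reset(self):
        self.tripped = False

breaker = SpendCircuitBreaker(cap_usd=100.0)
ok1 = breaker.charge(60.0)    # True: within budget
ok2 = breaker.charge(50.0)    # False: would exceed the cap, trips
ok3 = breaker.charge(1.0)     # False: stays tripped until reset
```

The key design choice is that the breaker fails closed: after tripping, even a one-dollar charge is refused, so a runaway agent stops at the cap rather than at the end of the month.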
The regulated company that needs compliance baked in
Companies operating under GDPR, SOC 2, or similar frameworks can't just deploy agents and hope for the best. They need agents that check whether specs changed, walk the codebase, highlight what's affected, and write the necessary changes — with an audit trail that proves every step was compliant. The control plane makes that audit trail automatic rather than something a human has to manually reconstruct after the fact.
Models will commoditize like browsers did
When the conversation turns to the model landscape, Everingham pattern-matches to a history he lived through: the browser wars.
Netscape thought it was fighting a technology war. Microsoft turned it into a distribution war. The models today, he argues, are on the same trajectory — getting closer and closer to each other in raw capability, heading toward commodity status. But that doesn't mean there won't be great businesses. Microsoft was a shrink-wrap software company, then cloud services made it larger and more profitable than it had ever been.
The pattern he sees: less differentiation at the model layer, more value in the services and infrastructure built on top. If you're betting your company's AI strategy on one model provider staying permanently ahead of the pack, you're making the same mistake Netscape made. The winning bet is on the layer above — the governance, orchestration, and operational infrastructure that works regardless of which model is underneath.
Guild is building more than 60 agents on its own internal platform, with plans for a self-serve launch where companies can sign up and have agents running in minutes. The longer-term vision: tens of thousands of searchable agents, differentiated by an open-source community model. The control plane is model-agnostic by design.
How to get skeptical engineers to adopt AI
The final thread in the conversation is the one most engineering leaders are quietly struggling with: how do you get a team of smart, opinionated engineers to actually use AI tools when half of them think it's hype?
Everingham's answer is blunt: you don't mandate it.
Earned usage versus mandated usage is much more effective.
Mandates produce shallow adoption. Engineers are smart; they have tools they've spent years getting proficient with, and they don't respond well to being told to use something because leadership said so. Instead, he argues for putting order-of-magnitude challenges in front of the org — not "improve revenue 10%" (that's evolutionary thinking) but "10x revenue in six months" (that forces revolutionary thinking). Challenges big enough that AI becomes the natural tool to reach for.
Then celebrate the engineers who pull it off. Turn them into advocates. Let the adoption spread bottom-up, from people who actually built something that worked, rather than top-down from a slide deck.
He also shares his own practice for validating AI output:
If you can't explain it, you can't ship it.
Expert validation is non-negotiable. Use the tools aggressively — he names Claude Code and GPT Pro Research specifically — but never ship something you can't personally walk through and defend. The AI accelerates the work; it doesn't replace the judgment.
Your agents need *governance* — not just guardrails.
Guild is the control plane that gives your agents the same observability, security, and cost controls you already put around your workforce. If one runaway agent can blow your monthly budget in seven hours, it's time to stop treating agent management as optional.
Frequently asked questions
What's the difference between an agent harness and a control plane?
A harness orchestrates a model — it manages prompts, context, and execution. A control plane is the layer the harness runs inside. It validates each step, records what happened, enforces permissions, and tracks cost. The analogy is operating system design: the harness is the application, the control plane is the OS with its protection and policy layers.
What went wrong with OpenClaw?
People gave agents open access to local systems with broad permissions and no containment. Private keys ended up shared publicly. The incident became a wake-up call for CIOs and CISOs, validating the argument that agents need to run inside governed environments — not with unrestricted access to everything on the machine.
How did a single developer blow through an entire monthly budget?
At a large enterprise customer, a single developer ran an agent without any spending controls or visibility layer in place. The agent consumed the company's entire monthly Anthropic budget in seven hours, and nobody was aware until after the fact. A control plane prevents this by putting circuit breakers on agent spend and surfacing cost data in real time.
Should engineering leaders mandate AI tool adoption?
No. Mandated usage produces shallow, grudging adoption. Earned usage — building tools that genuinely add value and letting engineers discover that value themselves — is far more effective. Put order-of-magnitude challenges in front of the org and let the people who succeed with AI become natural advocates.
Will AI models become commodities?
Everingham sees models following the same trajectory as browsers in the late 1990s — raw capability converging, with the real differentiation happening in the services and infrastructure built on top. That doesn't mean model companies won't be profitable, but it does mean the long-term value accrues to the governance and orchestration layer above the models.
How should engineers validate AI-generated work?
The principle is simple: if you can't explain it, you can't ship it. Use AI tools aggressively to accelerate the work, but always require expert validation before anything reaches production. The AI doesn't replace engineering judgment — it gives engineers more leverage to apply that judgment at scale.