DevMate was a managed framework for building, sharing, and running AI agents across Meta's engineering organization. It evolved from an earlier, loosely architected system called Confucius. Engineers could browse available agents, fork them, extend them, and deploy them — agents were invoked not just by humans but by tools like CI/CD systems. By Meta's most recent earnings report, over 50% of Meta's code was being written by agentic frameworks managed through this system.

DevMate Went Viral Across 40,000 Engineers — Here's What It Taught Guild's CEO About Agent Infrastructure
Forty thousand engineers, sixteen thousand internal tools, and a source control system so large that nothing off-the-shelf could handle it. That was the environment where James Everingham first saw agentic AI go from interesting experiment to organizational wildfire — and where one AI agent accidentally outed itself by pushing code to a public repository with its own name in the comments.
In a fireside chat at Theory Ventures Office Hours, hosted by Tomasz Tunguz, the Guild.ai CEO told the most detailed version of the DevMate origin story published anywhere. Not the sanitized press version. The version where a loose framework called Confucius caught fire before anyone planned for it, where engineers became internal celebrities for their agents, and where the team realized that the real product wasn't the agents themselves — it was the infrastructure to manage them.
Tunguz, founder and general partner at Theory Ventures (an early-stage VC firm managing roughly $800 million, with half the team being AI engineers), pushed Everingham on defensibility, on whether junior engineers are losing foundational skills, and on what founders should actually do with these tools right now. The answers cut against several popular narratives.
"You had to earn the usage"
Meta's developer infrastructure org was roughly a thousand engineers building every tool that 40,000 to 45,000 internal developers used daily. When Everingham came in to run the group, the mandate was familiar: increase developer productivity with AI. The first instinct was familiar too — build an autocomplete tool and measure the results.
They built Code Compose, an internal VS Code environment with Cursor-like autocomplete capabilities. It got results. But the results were okay, not transformative. Nobody was seeing 10x.
The harder problem was skepticism. Senior engineers doing complex, large-scale work weren't impressed by autocomplete suggestions. Junior engineers embraced the tools naturally, but the people writing kernels and working at Meta-scale infrastructure had good reasons to be cautious — vibe coding a kernel running at that scale could be a very costly mistake.
Tool usage needs to be earned, not mandated.
So the team tried something different. Instead of telling engineers to use AI, they put out business challenges. Can you build a self-healing fabric? Can you eliminate the code freeze during the holidays? Can you optimize compiler output using an LLM? The challenges forced engineers to think differently about where AI could actually apply, rather than treating it as a general productivity overlay.
The approach worked because of a measurement advantage most companies don't have. Meta owned its entire toolchain — every piece of it, custom-built because nothing else could handle billions of lines of code. That ownership meant they could track code provenance down to the character level. When other CEOs were guessing what percentage of their code was AI-written, Everingham's team had real data. They knew exactly what was written by AI, what was influenced by AI, and what was purely human.
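Meta's actual provenance system is internal and undocumented publicly, but the character-level idea can be illustrated with a minimal sketch: attribute spans of a file to an origin and aggregate the split. The `Span` type, origin labels, and function name here are all invented for the example.

```python
from dataclasses import dataclass
from collections import Counter

@dataclass
class Span:
    start: int    # character offset where the span begins
    end: int      # character offset where the span ends (exclusive)
    origin: str   # "human", "ai_suggested", or "ai_written" (illustrative labels)

def provenance_split(spans):
    """Return the fraction of characters attributed to each origin."""
    counts = Counter()
    for s in spans:
        counts[s.origin] += s.end - s.start
    total = sum(counts.values())
    return {origin: n / total for origin, n in counts.items()}

# A 280-character file: 120 human-typed, 80 agent-written, 80 AI-assisted.
spans = [Span(0, 120, "human"), Span(120, 200, "ai_written"),
         Span(200, 280, "ai_suggested")]
print(provenance_split(spans))
```

Because every editor, review tool, and source-control hook was Meta's own, this kind of tagging could happen at write time rather than being estimated after the fact.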
"DevMate even leaked itself"
The agentic wave inside Meta didn't start with a top-down initiative. It started organically, in the way things tend to start inside a company that deliberately chose the problems of too little bureaucracy over the problems of too much.
The first framework was called Confucius. It wasn't architected for scale — it was more of a communal surface where engineers could share agents they'd built. Someone would create an agent for documentation, put it into Confucius, and others could find it, run it, copy it, extend it. Alongside this, engineers had an internal prompt-sharing system where they could learn by seeing how colleagues were prompting — directly inspired, as Everingham told Tunguz, by the way Discord's open channels taught Midjourney users to get better results by watching each other work.
Confucius caught fire. The team decided to formalize it. DevMate was the result — a framework with better architecture for scale, centralized management, and deeper integration with Meta's infrastructure. Engineers could browse available agents, see what impact they were having, fork them, and build on top of them.
The critical design decision: agents weren't just invoked by humans. They were invoked by tools. An agent could fire in the middle of a pull request from the CI/CD system, or trigger when someone checked in code, or just run autonomously across the codebase in the background. The team called those autonomous agents "Roombas" — sweeping through the code doing maintenance work without anyone pressing a button.
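The invocation model described above can be sketched as an event registry: agents bind to events emitted by tooling, while "Roombas" run with no trigger at all. The class, event names, and agent callables below are hypothetical, not DevMate's actual API.

```python
from collections import defaultdict

class AgentRegistry:
    """Toy registry where tools, not just humans, invoke agents."""

    def __init__(self):
        self._by_event = defaultdict(list)
        self.roombas = []  # autonomous agents that sweep the codebase unprompted

    def on(self, event, agent):
        """Register an agent to fire when tooling emits `event`."""
        self._by_event[event].append(agent)

    def register_roomba(self, agent):
        """Register a background agent with no trigger; a scheduler would run it."""
        self.roombas.append(agent)

    def dispatch(self, event, payload):
        """Called by the CI/CD system or a source-control hook, not a person."""
        return [agent(payload) for agent in self._by_event[event]]

registry = AgentRegistry()
registry.on("pull_request.opened", lambda pr: f"reviewing {pr}")
registry.on("commit.pushed", lambda c: f"linting {c}")
registry.register_roomba(lambda repo: f"sweeping {repo} for dead code")

print(registry.dispatch("pull_request.opened", "D12345"))
```

The design choice worth noticing: once invocation is event-driven, a human in the loop becomes just one possible event source among many.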
Our damn AI agent even leaked itself by making a modification to something that was in an open-source component.
DevMate pushed code to an external repo with a comment identifying itself. The press picked it up. It was embarrassing and, as the Guild CEO put it, kind of funny. But it was also proof that the system was running at a scale where agents were operating with real autonomy.
The social dynamics mattered as much as the technical ones. Engineers who built effective agents became local celebrities. They made elaborate videos showing off their work, posted them to internal groups, and built reputations. Meta's internal culture — a version of Facebook used as the corporate intranet — amplified this. Agent-building became a rewarded, visible activity, not a side project.
As of Meta's most recent earnings report, the numbers tell the scale story: over 50% of Meta's code is now written by agentic frameworks, with 200 to 300 agents running throughout the infrastructure — all managed through the system DevMate created.
"You can't have one code review agent"
Tunguz asked a straightforward question: how many agents per software engineer should an enterprise expect? Everingham's answer has shifted dramatically, and recently.
Six months ago, he would have said dozens. Now he thinks thousands — and even that might be an embarrassingly small number. He compared it to the mainframe-to-microservices transition. Early computing started with one mainframe that everyone shared, then moved to racks of servers running specialized services. Agents are following the same arc.
The code review problem at Meta is the clearest illustration. Meta's codebase is unimaginably large. You can't fit it into a context window. The code spans dozens of specialized domains — compliance, graphics, esoteric internal languages like Hack (which has no meaningful public training data, requiring fine-tuned models). One general-purpose code review agent can't handle that breadth.
One code review agent can't solve that. You need hundreds of different code review agents with different specialties, different models.
The implication extends beyond code review. Any sufficiently complex workflow — security auditing, testing, deployment — will fragment into many specialized agents rather than consolidating into one powerful one. The context window limitation is real, but it's not the primary reason. Debugability and troubleshooting matter more. When a monolithic agent fails, finding the failure is hard. When a specialized agent fails, the blast radius is contained and the diagnosis is obvious.
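The fragmentation argument reduces to an explicit routing step in front of many narrow agents. The sketch below is illustrative only: the domains, file-matching rules, and agent stubs are made up, and a real router might use code ownership or a classifier rather than path heuristics.

```python
# A few specialist reviewers, keyed by domain. Each is a stub standing in
# for a full agent (potentially backed by a different, fine-tuned model).
REVIEW_AGENTS = {
    "hack":       lambda diff: f"hack-review: {diff['file']}",
    "graphics":   lambda diff: f"graphics-review: {diff['file']}",
    "compliance": lambda diff: f"compliance-review: {diff['file']}",
}

def domain_of(diff):
    """Crude routing rule; real systems would use ownership data or a classifier."""
    if diff["file"].endswith(".hack"):
        return "hack"
    if "/gfx/" in diff["file"]:
        return "graphics"
    return "compliance"

def review(diff):
    # Only the matching specialist runs, so a failure is contained to one
    # narrow agent -- the blast-radius point made above.
    agent = REVIEW_AGENTS[domain_of(diff)]
    return agent(diff)

print(review({"file": "www/payments/audit.php"}))
```

The payoff is diagnostic: when a review goes wrong, the routing table tells you exactly which specialist to inspect, instead of debugging one monolith's opaque behavior.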
Agents can also collaborate. Everingham described a pattern where a code review agent doesn't know about a particular repository, so it asks an onboarding agent to brief it — the same way a new engineer would ask a colleague for context before reviewing unfamiliar code. That kind of multi-agent coordination is where the architecture is heading, and it's why the number of agents per organization will keep climbing.
"Community is more defensible than technology"
When Tunguz pushed on the moat question — what keeps someone from just building the same thing — Everingham gave an answer that surprised a VC audience more used to hearing about proprietary models or data flywheels.
Community is more defensible than technology.
The defensibility thesis for Guild isn't a proprietary model or a locked-in data layer. It starts with being vendor-neutral: bring your own keys, bring your own model. That positioning provides defensibility against single-vendor lock-in, which Everingham sees as a real concern for enterprises evaluating agent platforms. Companies don't want to bet their infrastructure on one model provider when the landscape changes every quarter.
On top of the control plane, Guild is building a managed software center for agents — a surface where engineers inside a company can browse, fork, and extend agents (the DevMate model), plus a public layer where anyone can publish agents for others to discover and pull into their own infrastructure. The analogy is explicitly GitHub: a community marketplace where the agents themselves are the repositories.
The conversation also touched on data platform risk. Tunguz raised the example of platforms like Slack rate-limiting their APIs as they realize the data flowing through them has become strategically valuable. Everingham acknowledged the problem directly — governance needs to control what data agents can access, what's communicated back to LLMs, and what shouldn't leave the organization. That's not a theoretical concern. It's one of the reasons centralized agent governance exists as a product category.
"Before you make the hire, find the tool"
Tunguz asked what advice Everingham would give founders and heads of engineering trying to drive AI adoption. The answer was blunt and practical, aimed squarely at the operator mindset.
Before you make the hire, find the tool.
The specific example: you're about to hire an initial designer to build mockups that will translate to code. Before posting the job, try Figma-to-code tools. Use GPT Pro to help build a product plan. Get scrappy with AI tools first, and only hire when you've confirmed the tool can't do it.
The second piece of advice was about persistence. AI tools are changing so fast that evaluations have a short shelf life. Something that didn't work six months ago might be genuinely capable now. The people most at risk aren't the ones who tried AI and found it lacking — they're the ones who tried it once, dismissed it, and stopped checking back.
Everingham connected this to the Jevons Paradox: making engineers more efficient doesn't mean you need fewer of them. Historically, making a resource more efficient increases total demand for that resource. Compute got cheaper; we used astronomically more of it. If AI makes each engineer dramatically more productive, the economic incentive is to hire more engineers and capture even more output — not to cut headcount.
The "lost generation" fear — that junior engineers are losing foundational skills because AI handles the basics — got a counterintuitive response. At Meta, the opposite happened. Juniors embraced AI tools while seniors resisted. The nature of what a junior engineer is will change, the former Meta Dev Infra leader argued. They'll need to understand the fundamentals and be able to troubleshoot, but their core competency will shift toward prompting effectively and debugging agent outputs quickly, not writing every line by hand.
"I described what I wanted and the system configured itself"
The conversation closed on a question about human cognitive limits. If thousands of agents are running in your infrastructure, how does anyone keep track of what they're doing? The answer is that you don't — and you shouldn't have to.
Everingham told a story from Guild's own internal use. He went into the system's prompt and typed: "I want to make it so that duplicate bugs can't exist." The system found the right agent — an issue deduplication agent — installed it, set it up, hooked it into Git issues, and it was running. He didn't configure anything. He didn't browse an agent catalog. He described an outcome and the infrastructure handled the rest.
I described what I wanted and the system configured itself to satisfy my need.
Tunguz named it: intent-based programming. You describe what you want, not how to achieve it, and the system configures itself to satisfy the need. It's the same trajectory that took computing from assembly to high-level languages to natural language — each step removing a layer of manual specification.
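A toy version of that flow: match a natural-language outcome against an agent catalog and pick the best fit. Everything here is an assumption for illustration — the catalog entries, the naive word-overlap scoring, and the function name; Guild's actual matching is not public and is surely far more capable.

```python
# Hypothetical agent catalog: name -> plain-language description.
CATALOG = {
    "issue-dedup": "detect and merge duplicate bugs in the issue tracker",
    "doc-writer":  "generate documentation for undocumented modules",
    "test-roomba": "continuously backfill missing unit tests",
}

def match_intent(intent):
    """Pick the catalog agent whose description shares the most words with the intent."""
    words = set(intent.lower().split())

    def score(name):
        return len(words & set(CATALOG[name].split()))

    return max(CATALOG, key=score)

# The user states an outcome, not a configuration.
print(match_intent("I want to make it so that duplicate bugs can't exist"))
```

A production system would then do what the anecdote describes: install the matched agent, wire it to the issue tracker, and start it running, all downstream of a single sentence of intent.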
The parallel to Everingham's early career at Borland is hard to miss. He opened the conversation describing the first time he used Turbo Pascal — a compiler that wrote tighter code than his hand-optimized assembly, with visual debugging that didn't exist anywhere else. That was the moment, he said, when developer tools felt like magic. The current AI moment has the same quality: the tools are doing things that feel impossible until you see them work, and then you can't imagine going back.
The difference is scale. Turbo Pascal made one developer more productive. An agent control plane with thousands of specialized agents, a community marketplace, and intent-based orchestration makes an entire engineering organization operate differently. The infrastructure that enables that shift — the versioning, governance, observability, and cost controls — is what Everingham built DevMate to be at Meta, and what Guild is building for everyone else.
Your agents need *infrastructure* — not just a harness.
Guild is the agent control plane built by the team that scaled DevMate across 40,000 engineers at Meta. Governance, observability, cost controls, and a community marketplace for agents — the infrastructure layer enterprises need before the next agent leaks itself.
Frequently asked questions
How much of Meta's code is written by AI agents?
Over 50% of Meta's code is now written by agentic frameworks, with 200 to 300 agents running in the infrastructure. Meta had a measurement advantage most companies lack — because they owned the entire toolchain, they could track code provenance down to the character level rather than relying on rough estimates.
Why does an enterprise need thousands of specialized agents rather than one?
The code review problem at Meta illustrates this clearly. The codebase spans compliance, graphics, specialized languages like Hack, and countless other domains. No single agent can cover all of that effectively. The pattern mirrors the mainframe-to-microservices shift — specialized agents are easier to debug, cheaper to fine-tune, and more reliable within their domain than monolithic alternatives.
What is an agent control plane?
It's an infrastructure layer for managing enterprise agents at scale — versioning, rollback, release management, observability, security access controls, governance, compliance, and cost management. One customer had an engineer burn through their entire monthly budget in twelve hours with a single agent and no visibility into the spend. The control plane puts a circuit breaker on that kind of failure. Check out Guild.ai for more details.
What is intent-based programming?
Instead of configuring infrastructure manually, you describe the outcome you want in natural language. When Everingham typed "I want to make it so that duplicate bugs can't exist" into Guild's system, it found the right agent, installed it, connected it to Git issues, and started running — no manual configuration required. The system figured out the workflow from the intent.
Will AI agents shrink engineering headcount?
The Jevons Paradox suggests the opposite. Historically, making a resource more efficient increases total demand for it. If AI makes each engineer dramatically more productive, the economic incentive is to hire more and capture the increased output. The nature of engineering work will change — junior engineers may shift toward prompting and debugging agent outputs — but the role itself isn't disappearing.
What should founders do before hiring?
Build the muscle of trying AI tools before adding headcount. If you were going to hire a designer, try Figma-to-code first. If you tried a tool six months ago and it didn't work, try it again — the capabilities are changing fast enough that monthly re-evaluation is warranted. Tool usage needs to be earned, not mandated, so demonstrate value rather than issuing directives.
How do you drive AI adoption without mandating it?
Instead of mandating tool usage, the team put out specific business challenges — build self-healing fabric, eliminate the holiday code freeze, optimize compiler output with LLMs. The challenges forced engineers to discover where AI actually applied. DevMate's public surface let skeptics see real impact metrics from other engineers' agents, which built trust organically rather than through top-down pressure.