Product | Jun 26, 2026 | 10 min read

Two Questions Every Agent Platform Eventually Has to Answer

Cory Waddingham

Article Index

Usage: spend you can actually attribute
Cache-aware costing
How the Usage pipeline works
Audit Logs: who changed what
What Insights is not
Availability

There's a moment in every team's agent journey when the novelty wears off and the operational questions start. Usually it begins with someone from finance. The monthly LLM bill arrives, it's bigger than last month, and nobody can say why. Which workspace drove it? Which agent? Was it a model change, a traffic spike, or that one integration somebody wired up on a Friday?

Then comes the second question, often from security: who connected that credential? When? And what else have they changed lately?

These are the two questions Insights exists to answer: what did we spend, and who changed what? It's a new section in Guild for organization admins, and it has exactly two tabs: Usage and Audit Logs.

Usage: spend you can actually attribute

The Usage tab is a cost and consumption dashboard, but the interesting part is the attribution model. Most LLM cost tooling stops at "tokens per API key." That tells you almost nothing in an agent platform, where one key might serve a dozen workspaces and a hundred agents. Insights attributes every dollar to the workspace, agent, model, and provider that incurred it.

The dashboard opens on a 30-day window (7 and 90 are a click away) with a KPI strip across the top: total spend in USD, total tokens, distinct sessions, cache hit rate, output ratio, and a comparison against the previous equal-length period. That last one matters more than it sounds. "Spend is up 40% versus the prior 30 days" is a sentence you can take into a budget meeting. A raw token count isn't.
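To make the period comparison concrete, here is a minimal sketch of how a "previous equal-length period" and its percent change could be derived. The function names and the dollar figures are illustrative assumptions, not Guild's implementation.

```python
from datetime import datetime, timedelta, timezone


def previous_window(start, end):
    """Return the equal-length window that ends where [start, end) begins."""
    length = end - start
    return start - length, start


def percent_change(current, previous):
    """Relative change in percent, or None when there is no baseline to compare to."""
    if previous == 0:
        return None
    return (current - previous) * 100.0 / previous


# Hypothetical 30-day window and made-up spend totals.
end = datetime(2026, 6, 26, tzinfo=timezone.utc)
start = end - timedelta(days=30)
prev_start, prev_end = previous_window(start, end)
change = percent_change(1400.0, 1000.0)
print(f"Spend changed {change:+.0f}% versus the prior 30 days")
```

Returning None for an empty baseline is one way to avoid reporting a meaningless percentage for a brand-new workspace or agent.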

Below the KPIs sits a time-series chart you can toggle between tokens and spend, and four breakdown tables: workspaces, agents, providers, and models. Each row shows spend, tokens, sessions, and percentage of org total. Click a workspace or agent and you drill into a scoped view of just that slice, filters intact, with a breadcrumb back to the org-wide picture. Finding your most expensive agent takes two clicks.

Cache-aware costing

One detail worth a closer look: the spend math respects prompt caching.

If you've looked closely at an Anthropic or OpenAI invoice, you know cached tokens are billed differently from fresh input. A lot of internal dashboards ignore this and just multiply total input tokens by the list rate, which overstates spend for any workload with decent cache hit rates. Insights computes billable input per LLM call as input tokens minus cache reads, then prices each token type at its own rate:

    billable_input = max(input_tokens - cache_read_tokens, 0)

    spend = billable_input × input_rate
          + output_tokens × output_rate
          + cache_read_tokens × cache_read_rate
          + cache_write_tokens × cache_write_rate

The Models table makes this transparent. An info icon on each model shows the per-million-token rates and how the spend splits across input, output, cache reads, and cache writes. If your cache rate KPI is high and your spend is lower than naive math would suggest, you can see exactly why.
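As a worked example, the formula above translates into a few lines of Python. This is a sketch under stated assumptions: the `Rates` type, the function names, and every rate below are made up for illustration, and rates are expressed per million tokens to match the Models table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """USD per million tokens. The values used below are invented, not real pricing."""
    input: float
    output: float
    cache_read: float
    cache_write: float


def call_spend(input_tokens, output_tokens, cache_read_tokens, cache_write_tokens, rates):
    """Cache-aware spend for one LLM call, following the formula in the text."""
    billable_input = max(input_tokens - cache_read_tokens, 0)
    total_per_million = (
        billable_input * rates.input
        + output_tokens * rates.output
        + cache_read_tokens * rates.cache_read
        + cache_write_tokens * rates.cache_write
    )
    return total_per_million / 1_000_000


def naive_spend(input_tokens, output_tokens, rates):
    """The shortcut the text warns about: all input tokens priced at the list input rate."""
    return (input_tokens * rates.input + output_tokens * rates.output) / 1_000_000


rates = Rates(input=2.0, output=10.0, cache_read=0.5, cache_write=2.5)
cache_aware = call_spend(10_000, 1_000, 8_000, 0, rates)
naive = naive_spend(10_000, 1_000, rates)
print(f"cache-aware: ${cache_aware:.4f}  naive: ${naive:.4f}")
```

With 8,000 of the 10,000 input tokens served from cache, the naive figure comes out well above the cache-aware one, which is the overstatement described above.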

One honest caveat: these figures use Guild's list pricing for each model, which may differ from your contract rate. They're observability, not your invoice. Per-org pricing overrides are on the roadmap.

How the Usage pipeline works

Under the hood, this is a write-path and read-path story.

The write path. Every LLM call that flows through Guild's proxy logs its usage asynchronously. The runtime fires a Celery task that records a row per call in Postgres: provider, model, input tokens, output tokens, cache reads, cache writes, plus timing metadata. A daily per-account rollup updates alongside it, which also powers quota checks and token balance accounting. The hot path never waits on analytics bookkeeping.
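The shape of that write can be sketched without the real infrastructure. The example below swaps in an in-memory SQLite database for Postgres and a plain function for the Celery task, so it runs anywhere. The table names, column names, and the choice to roll up input plus output tokens are all assumptions made for illustration.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: one row per LLM call, plus a daily per-account rollup.
SCHEMA = """
CREATE TABLE llm_call_usage (
    account_id TEXT, provider TEXT, model TEXT,
    input_tokens INTEGER, output_tokens INTEGER,
    cache_read_tokens INTEGER, cache_write_tokens INTEGER,
    called_at TEXT
);
CREATE TABLE daily_rollup (
    account_id TEXT, day TEXT, total_tokens INTEGER,
    PRIMARY KEY (account_id, day)
);
"""


def record_usage(conn, account_id, provider, model, input_tokens, output_tokens,
                 cache_read_tokens, cache_write_tokens, called_at):
    """Body of what would be the background task: write the row, bump the rollup."""
    day = called_at.date().isoformat()
    total = input_tokens + output_tokens
    with conn:  # one transaction, so the row and the rollup stay consistent
        conn.execute(
            "INSERT INTO llm_call_usage VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
            (account_id, provider, model, input_tokens, output_tokens,
             cache_read_tokens, cache_write_tokens, called_at.isoformat()),
        )
        cur = conn.execute(
            "UPDATE daily_rollup SET total_tokens = total_tokens + ? "
            "WHERE account_id = ? AND day = ?",
            (total, account_id, day),
        )
        if cur.rowcount == 0:  # first call of the day for this account
            conn.execute("INSERT INTO daily_rollup VALUES (?, ?, ?)",
                         (account_id, day, total))


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
when = datetime(2026, 6, 26, 12, 0, tzinfo=timezone.utc)
record_usage(conn, "acct-1", "provider-a", "model-x", 1000, 200, 0, 0, when)
record_usage(conn, "acct-1", "provider-a", "model-x", 500, 100, 400, 0, when)
```

In the setup the post describes, the request handler would only enqueue this work and return, which is what keeps the hot path from waiting on it.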

The read path. Knowing what you spent should never slow down the system that's spending it. So reporting gets its own path: the Usage dashboard never touches production Postgres. Datastream replicates the tables into BigQuery, and the dashboard queries the replica, where 90 days of aggregation across every task, session, and agent can run without competing with live traffic. A Redis cache in front (five-minute TTL on summaries) absorbs most of the exploration, so toggling filters and drilling into workspaces rarely reaches BigQuery at all.
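The caching layer is the easiest piece to illustrate. Below is a minimal in-process stand-in for the Redis cache, with a five-minute TTL and an injectable clock so the expiry can be demonstrated without waiting. The class name, cache key shape, and query function are hypothetical. The real system uses Redis and BigQuery, neither of which appears here.

```python
import time


class TTLCache:
    """Tiny time-to-live cache: recompute a value only after its entry expires."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh, so skip the expensive query
        value = compute()
        self._entries[key] = (now + self._ttl, value)
        return value


# Demo with a fake clock and a counter standing in for warehouse queries.
fake_now = [0.0]
warehouse_queries = []


def run_summary_query():
    warehouse_queries.append("summary")
    return {"spend_usd": 12.5}


cache = TTLCache(ttl_seconds=300, clock=lambda: fake_now[0])
key = ("org-1", "30d", "summary")
cache.get_or_compute(key, run_summary_query)   # miss: runs the query
fake_now[0] = 120.0
cache.get_or_compute(key, run_summary_query)   # hit: served from cache
fake_now[0] = 301.0
result = cache.get_or_compute(key, run_summary_query)  # expired: runs again
```

Three dashboard requests result in two warehouse queries here. With many admins toggling the same filters inside one TTL window, the ratio improves further.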

A few smaller choices add up to numbers you can trust:

Spend lands when it's incurred. Windows are computed on the timestamp of the LLM call itself, not when the task was created. A long-running session that spans a month boundary gets split correctly.
Tool tasks count. Agent tasks aren't the only thing making LLM calls; tool invocations do too. The queries union both, so integration-driven usage doesn't silently vanish from the totals.
Half-open windows. Time ranges are [start, end), so a call landing exactly on a boundary is counted once, in one period.
Draft usage attributes correctly. Usage from ephemeral agent versions (testing an agent before publishing) rolls up to the same agent as its committed versions, so iteration cost is visible alongside production cost.
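Two of these choices, half-open windows and the union of agent and tool calls, fit in a short sketch. The record shape and the spend values are invented for illustration. The point is only that a call stamped exactly at midnight on July 1 belongs to July and nowhere else.

```python
from datetime import datetime, timezone

UTC = timezone.utc


def in_window(ts, start, end):
    """Half-open membership test: start is included, end is excluded."""
    return start <= ts < end


def window_spend(calls, start, end):
    """Sum spend by the timestamp of each LLM call, not by task creation time."""
    return sum(c["spend_usd"] for c in calls if in_window(c["called_at"], start, end))


june = (datetime(2026, 6, 1, tzinfo=UTC), datetime(2026, 7, 1, tzinfo=UTC))
july = (datetime(2026, 7, 1, tzinfo=UTC), datetime(2026, 8, 1, tzinfo=UTC))

# One long session whose calls straddle the month boundary (made-up numbers).
agent_calls = [
    {"called_at": datetime(2026, 6, 30, 23, 59, tzinfo=UTC), "spend_usd": 0.25},
    {"called_at": datetime(2026, 7, 1, 0, 0, tzinfo=UTC), "spend_usd": 0.5},
]
tool_calls = [
    {"called_at": datetime(2026, 7, 1, 0, 5, tzinfo=UTC), "spend_usd": 0.125},
]

all_calls = agent_calls + tool_calls  # union, so tool-driven usage is not dropped
june_spend = window_spend(all_calls, *june)
july_spend = window_spend(all_calls, *july)
```

Because the two windows share a boundary but never overlap, the monthly totals always add up to the total across both months.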

None of these is glamorous. All of them are the difference between a dashboard people trust and one they argue with.

Audit Logs: who changed what

The second tab answers the security question. Every mutating API call against the platform (POST, PATCH, PUT, DELETE) that succeeds gets recorded: who made it, what they hit, from where, and what was in the request body.

The mechanism is deliberately boring. A single after-request hook in the API layer inspects every response. If the method mutates state, the request was authenticated, and the response wasn't a client error, an audit row is written. There's no per-route instrumentation to forget, which is the failure mode of most homegrown audit systems. Coverage comes from the middleware, not from developer discipline.
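The decision that hook makes is small enough to write as a single predicate. This sketch takes the sentence above literally: mutating method, authenticated request, and a response that is not a 4xx. The function name is hypothetical, and how the real hook treats server errors is not something this post spells out, so that part is an assumption.

```python
MUTATING_METHODS = frozenset({"POST", "PATCH", "PUT", "DELETE"})


def should_audit(method, authenticated, status_code):
    """Return True when a finished request should produce an audit row."""
    if method.upper() not in MUTATING_METHODS:
        return False  # reads are out of scope
    if not authenticated:
        return False  # no actor to attribute the change to
    # Assumption: only 4xx client errors are excluded, as the text states.
    return not (400 <= status_code <= 499)


checks = {
    "create": should_audit("POST", True, 201),
    "read": should_audit("GET", True, 200),
    "anonymous": should_audit("DELETE", False, 204),
    "rejected": should_audit("PATCH", True, 403),
}
```

Keeping the rule in one place is what makes coverage a property of the middleware: a new route is audited the day it ships, with no extra code.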

Three properties matter for anyone evaluating this as a compliance surface:

Sensitive fields are redacted before persistence. Request bodies are stored, but any key whose name suggests a secret (token, password, key, auth, and friends) has its value replaced with *** before the row is written. The original secret never touches disk. Bodies are also depth- and size-limited, so a pathological payload can't bloat the log.
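A redaction pass along these lines can be sketched as a recursive walk over the parsed body. The hint list contains only the four names mentioned above, matched as case-insensitive substrings. The matching rule, the depth limit of 5, and the truncation marker are assumptions for illustration, and the size limit is left out.

```python
SECRET_HINTS = ("token", "password", "key", "auth")  # the names listed in the text
MASK = "***"
MAX_DEPTH = 5                # hypothetical limit
TRUNCATED = "<truncated>"    # hypothetical marker for overly deep structures


def looks_secret(name):
    """Case-insensitive substring match against the secret-sounding hints."""
    lowered = str(name).lower()
    return any(hint in lowered for hint in SECRET_HINTS)


def redact(value, depth=0):
    """Return a copy of value with secret-looking keys masked and depth capped."""
    if isinstance(value, (dict, list)) and depth >= MAX_DEPTH:
        return TRUNCATED
    if isinstance(value, dict):
        return {
            k: MASK if looks_secret(k) else redact(v, depth + 1)
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [redact(item, depth + 1) for item in value]
    return value


body = {
    "name": "github",
    "api_key": "sk-live-123",
    "config": {"Authorization": "Bearer abc", "retries": 3},
    "tags": [{"refresh_token": "r1"}],
}
clean = redact(body)
```

Substring matching errs on the side of over-redaction: a harmless field such as `author` would be masked too. For an audit log, that is usually the safer direction to be wrong in.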

The log is append-only. At the schema level, updates and deletes against audit rows are denied for everyone. Reads are limited to org admins and the actor themselves. An audit trail you can edit is a liability with extra steps.
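To show what "append-only at the schema level" can look like, here is a small demonstration using SQLite triggers from Python. This is a swapped-in technique chosen because it runs anywhere. The post does not say how Guild enforces the rule, and a Postgres deployment would more likely rely on its own permission or policy features. The table and trigger names are hypothetical, and the read restrictions are not modeled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE audit_log (
    id INTEGER PRIMARY KEY,
    actor TEXT NOT NULL,
    method TEXT NOT NULL,
    path TEXT NOT NULL
);
-- Reject every UPDATE and DELETE, whoever issues it.
CREATE TRIGGER audit_log_no_update BEFORE UPDATE ON audit_log
BEGIN
    SELECT RAISE(ABORT, 'audit_log is append-only');
END;
CREATE TRIGGER audit_log_no_delete BEFORE DELETE ON audit_log
BEGIN
    SELECT RAISE(ABORT, 'audit_log is append-only');
END;
""")

conn.execute(
    "INSERT INTO audit_log (actor, method, path) VALUES (?, ?, ?)",
    ("alice", "POST", "/api/credentials/api-key"),
)


def is_rejected(statement):
    """Run a statement and report whether the database refused it."""
    try:
        conn.execute(statement)
    except sqlite3.DatabaseError:
        return True
    return False


update_rejected = is_rejected("UPDATE audit_log SET actor = 'mallory'")
delete_rejected = is_rejected("DELETE FROM audit_log")
remaining = conn.execute("SELECT actor FROM audit_log").fetchall()
```

Inserts succeed, while the update and the delete are both refused, so the original row survives unchanged. Enforcing this in the database rather than in application code means a buggy or compromised service cannot quietly rewrite history.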

Events are human-readable. Rather than maintaining a hand-curated label for every route, descriptions are derived from REST conventions: POST /api/credentials/api-key becomes "Connected credential," a publish endpoint becomes "Published version." The viewer shows these labels with method badges, actor, and relative time, and each row expands to the full detail: path, status code, IP, user agent, and the redacted body.

The search box is built for investigation rather than exact-match grep. It matches across paths, view names, methods, actors, and body text, and it expands verbs: searching "disabled" finds disable-shaped endpoints even if no field literally contains that word. In practice, answering "who connected that credential" takes a search, an expand, and a timestamp.

One scoping note: reads aren't logged. The audit trail is mutation-focused by design. It tells you who changed the system, not who looked at it.

What Insights is not

Two clarifications, because adjacent surfaces exist.

Guild has an older usage page under Settings, backed by live Postgres with short time windows and rough cost estimates. Insights Usage is the successor: longer windows, real per-model pricing, cache-aware math, and drill-ins. The Settings page still exists today.

And Insights is the human-facing surface. The same underlying usage data is available programmatically through Guild's API, which means agents themselves (including The Smith) can inspect spend and audit history as part of governance workflows. That's a different post.

Availability

Audit Logs are live now for all organization admins. The Usage tab is rolling out in early access. Both are admin-only by design: cost data and audit trails are operator concerns, and members don't see the Insights section at all. CSV export for audit logs is visible in the UI but not yet enabled; it's coming.

Running agents at scale means eventually answering to finance and to security. Insights is where both conversations start.