Tool Use (AI Agents)
"Function calling" and "tool use" refer to the same capability. "Function calling" was the original term introduced by OpenAI in June 2023; "tool use" or "tool calling" is the broader, vendor-neutral term now used across the industry. Both describe the mechanism by which an LLM outputs structured data to invoke an external function.
Key Takeaways
- Tool use is the mechanism that transforms LLMs from text generators into agents that act on real systems — calling APIs, querying databases, executing code, and modifying external state.
- The core loop is Reason → Select → Call → Observe: the model interprets a request, selects the right tool from defined schemas, executes it via structured JSON output, and incorporates the result into its next decision.
- Tool definitions consume tokens and directly affect cost, latency, and accuracy — Anthropic's internal testing showed 58 tools can consume ~55k tokens in a single prompt.
- The Model Context Protocol (MCP) is emerging as the universal standard for agent-tool connectivity, adopted by OpenAI, Google DeepMind, and Microsoft, and now governed by the Linux Foundation.
- Security is the critical constraint: tool-calling agents inherit LLM vulnerabilities (prompt injection, hallucination) while adding new attack surfaces through autonomous access to production systems.
- Every tool invocation should be treated as a high-risk event requiring scoped permissions, audit logging, and runtime monitoring.
What Is Tool Use (AI Agents)?
Tool use in AI agents is the capability that allows a language model to invoke external functions, APIs, databases, or services to complete tasks it cannot perform with its training data alone. It is the bridge between reasoning and action — the mechanism that turns a chatbot into an agent.
This capability is what transforms a basic LLM from a text generator into an agent that can interact with the real world. Without tool use, an LLM can only generate text based on patterns learned during training: it cannot check a deployment status, query a database, send a Slack message, or create a Jira ticket. Tools supply the missing capabilities, such as coding, browsing, calculations, and file operations, that let the model act rather than merely describe.
Think of tool use like giving an engineer access to their terminal. The engineer's knowledge matters, but without the ability to run commands, push commits, or query logs, they are limited to talking about code instead of shipping it. Tool use gives agents the same progression: from describing actions to executing them.
As Hugging Face's Agents Course explains, a tool is fundamentally a function given to the LLM, where each function serves a clear objective. OpenAI's documentation frames it the same way: function calling (also known as tool calling) provides a flexible way for models to interface with external systems and access data outside their training data.
How Tool Use Works
Tool use follows a structured loop that repeats until the agent has enough information to respond. As described in the Prompt Engineering Guide, LLM-based agents rely on two key capabilities to solve complex tasks: tool calling and reasoning.
Tool Definitions
Every tool is described to the model via a structured schema — typically JSON — that includes the tool's name, description, and parameters. Tool definitions are arguably the most critical component of function calling. They are the only way the LLM knows what tools are available and when to use them. A poorly written description leads to wrong tool selection. A vague parameter schema leads to incorrect arguments.
For example, a deployment-status tool might include a function name like `get_deployment_status`, a description stating "Returns the current deployment status for a given service in a given environment," and parameters for `service_name` and `environment`. The model uses this schema to decide when and how to call the tool.
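The hypothetical `get_deployment_status` tool above might be declared as follows, using the JSON-schema style that most providers accept (field names follow OpenAI's function-calling convention; the tool itself is illustrative, not a real API):

```python
# A tool definition in the JSON-schema style most providers accept.
# The function name and parameters are the hypothetical example from the text.
get_deployment_status_tool = {
    "type": "function",
    "function": {
        "name": "get_deployment_status",
        "description": (
            "Returns the current deployment status for a given service "
            "in a given environment."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "service_name": {
                    "type": "string",
                    "description": "Name of the service, e.g. 'billing-api'.",
                },
                "environment": {
                    "type": "string",
                    "enum": ["staging", "production"],
                    "description": "Target environment to check.",
                },
            },
            "required": ["service_name", "environment"],
        },
    },
}
```

Note that everything here is prompt material: the model never executes this schema, it only reads it to decide when the tool applies and which arguments to emit.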
The Agent Loop
The loop consists of repeated cycles of:
- Action: the agent decides to take an action (call a tool).
- Environment response: the external tool or API returns a result.
- Observation: the agent receives and processes the result.
- Decision: the agent decides whether to take another action or respond to the user.
In a ReAct-style pattern, as Google Cloud documents, the model frames its thought processes and actions as a sequence of natural language interactions, operating in an iterative loop of thought, action, and observation until an exit condition is met.
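The loop above can be sketched in a few lines. This is a minimal, self-contained illustration, not a real agent: `fake_model` stands in for an LLM call that can return either a tool call or a final answer, and the tool is a stub.

```python
# Minimal sketch of the Reason -> Select -> Call -> Observe loop.
# `fake_model` stands in for a chat-completion request; all names are
# illustrative, not a real provider API.

def get_deployment_status(service_name: str, environment: str) -> str:
    return f"{service_name} is healthy in {environment}"  # stubbed tool

TOOLS = {"get_deployment_status": get_deployment_status}

def fake_model(messages):
    # First turn: decide to call a tool. Second turn: answer from the observation.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "get_deployment_status",
                "args": {"service_name": "billing-api", "environment": "production"}}
    observation = next(m for m in messages if m["role"] == "tool")["content"]
    return {"answer": f"Deployment check complete: {observation}"}

def run_agent(user_request: str, max_steps: int = 5) -> str:
    messages = [{"role": "user", "content": user_request}]
    for _ in range(max_steps):
        decision = fake_model(messages)            # Reason + Select
        if "answer" in decision:                   # exit condition reached
            return decision["answer"]
        result = TOOLS[decision["tool"]](**decision["args"])  # Call
        messages.append({"role": "tool", "content": result})  # Observe
    return "Step limit reached without a final answer."

print(run_agent("Is billing-api healthy in production?"))
```

The `max_steps` guard is the exit condition of last resort: without it, a confused model can loop on tool calls indefinitely.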
Tool Execution and Context Management
Under the hood, functions are injected into the system message in a syntax the model has been trained on. This means functions count against the model's context limit and are billed as input tokens. This has direct cost implications. As Composio's engineering guide reports, internal testing by Anthropic showed that 58 tools could consume ~55k tokens. As the number of tool options increases, the model's ability to select the correct one decreases.
OpenAI's own documentation recommends aiming for fewer than 20 functions at any one time. Beyond that, accuracy degrades and costs spike.
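A rough sanity check of those numbers, using the common (and approximate) four-characters-per-token heuristic; a real tokenizer such as tiktoken should be used for accurate counts:

```python
import json

# Back-of-the-envelope estimate of the prompt-token cost of tool
# definitions, using the rough ~4-characters-per-token heuristic.

def estimate_tokens(tool_defs: list[dict]) -> int:
    return len(json.dumps(tool_defs)) // 4

# A single ~950-token definition repeated 58 times lands near the
# ~55k figure from Anthropic's internal testing cited above.
one_tool = {"name": "example_tool", "description": "x" * 3780}
print(estimate_tokens([one_tool] * 58))  # prints a number in the ~55k range
```

The arithmetic is the point: at roughly 950 tokens per definition, 58 tools cost more input tokens than many entire conversations.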
The Role of MCP
The Model Context Protocol (MCP) is an open standard and open-source framework introduced by Anthropic in November 2024 to standardize the way AI systems like large language models integrate and share data with external tools, systems, and data sources. In December 2025, Anthropic donated MCP to the Agentic AI Foundation (AAIF), a directed fund under the Linux Foundation.
Before MCP, every tool integration was custom-built per model provider. Developers often had to build custom connectors for each data source or tool, resulting in what Anthropic described as an "N×M" data integration problem. MCP standardizes this into a single protocol, as described on the MCP official site — much like USB-C standardized device connectivity.
Why Tool Use Matters
Tool use is not a nice-to-have feature. It is the architectural decision that separates a prompt wrapper from a production agent.
From Text to Action
The shift from "chatbots" to "AI agents" hinges on a single technical capability: tool calling. An incident response agent that can only summarize log patterns is useful for context. One that can also query PagerDuty, check recent deployments in GitHub, and open a Jira ticket is useful for resolution. The difference is tool use.
Real Engineering Impact
Consider an IT support agent that needs access to Jira, GitHub, PagerDuty, Slack, and AWS. Without tool use, every step requires a human to copy-paste between systems. With tool use, the agent chains reads and writes across those systems in a single workflow — collecting the on-call schedule, correlating it with the alert timeline, and drafting a summary for the incident channel.
In software development, function calling enables coding assistants that interact with development tools: searching documentation, executing code, testing functions, and reading files. They go beyond generating code suggestions to actually running tests, checking results, and iterating on feedback.
Tool Use in Practice
CI/CD Pipeline Automation
A deployment agent receives a webhook from a merged PR. It calls a `run_tests` tool to execute the test suite, checks results via a `get_test_results` tool, and if all pass, triggers `deploy_to_staging`. If tests fail, it calls `create_jira_ticket` with the failure details and notifies the owning team via `send_slack_message`. Each step is a discrete tool call, chained by the agent's reasoning.
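The workflow above can be sketched with stubbed tools. In a real agent the LLM would choose each step from the tool schemas; here the chaining is written out explicitly so the shape of the pipeline is visible, and every tool implementation is a hypothetical stand-in:

```python
# Sketch of the deployment workflow with stubbed tools. All tool
# implementations are hypothetical stand-ins for real API calls.

def run_tests(pr_number: int) -> str:
    return "run-1234"  # returns a test-run id

def get_test_results(run_id: str) -> dict:
    return {"passed": False, "failures": ["test_checkout_flow"]}

def deploy_to_staging(pr_number: int) -> str:
    return f"PR #{pr_number} deployed to staging"

def create_jira_ticket(summary: str) -> str:
    return "OPS-101"  # returns the new ticket key

def send_slack_message(channel: str, text: str) -> None:
    print(f"[{channel}] {text}")

def handle_merged_pr(pr_number: int) -> str:
    run_id = run_tests(pr_number)
    results = get_test_results(run_id)
    if results["passed"]:
        return deploy_to_staging(pr_number)
    ticket = create_jira_ticket(
        f"Tests failed on PR #{pr_number}: {results['failures']}")
    send_slack_message("#team-checkout", f"Deploy blocked, see {ticket}")
    return f"Deploy blocked by {ticket}"

print(handle_merged_pr(42))  # prints the Slack message, then the blocked status
```

What the agent adds over this hard-coded version is the decision-making: the same tools, but the model chooses the sequence based on what each call returns.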
Customer Service with Multi-Tool Orchestration
In customer service scenarios, an agent can retrieve customer data, amend orders, or prepare a refund. The agent recognizes the intent, selects the right tool (order lookup vs. refund processing vs. address update), calls it with the correct parameters, and incorporates the result into the conversation — all without hard-coded branching logic.
Document Processing and Code Review
In document-driven processes an agent can analyse files, extract relevant parts, and generate draft documents, with tools providing access to document storage, OCR, and workflow systems. A code review agent might call a `get_diff` tool on a pull request, run a `static_analysis` tool against changed files, query a `style_guide` knowledge base, and post structured review comments — each action a separate tool invocation orchestrated by the model.
Key Considerations
Tool use is powerful precisely because it gives agents write access to real systems. That power demands engineering discipline.
Security: The Expanded Attack Surface
Agentic applications inherit the vulnerabilities of both LLMs and external tools while expanding the attack surface through complex workflows, autonomous decision-making, and dynamic tool invocation. As Palo Alto Networks' Unit 42 research documents, attacks can escalate from information leakage to full infrastructure takeover.
In April 2025, security researchers released an analysis concluding that MCP has multiple outstanding security issues, including prompt injection, tool permissions that allow combining tools to exfiltrate data, and lookalike tools that can silently replace trusted ones.
Permission Scoping and Least Privilege
An agent given access to production environments can cause real damage — deleting files, modifying databases, or executing transactions that are difficult or impossible to reverse. As IAPP's analysis emphasizes, use scoped API keys that only grant the specific required permissions — for example, read-only database credentials rather than full administrative access. An agent designed to check deployment status should never hold credentials to roll back production.
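One way to enforce least privilege is at the tool-dispatch layer: each agent carries an explicit allowlist, and any call outside it is refused before the underlying credential is ever used. A minimal sketch, with illustrative agent and tool names:

```python
# Sketch of per-agent permission scoping at the dispatch layer.
# Agent and tool names are illustrative.

TOOL_REGISTRY = {
    "get_deployment_status": lambda service: f"{service}: healthy",
    "rollback_production":   lambda service: f"{service}: rolled back",
}

AGENT_SCOPES = {
    # The status-checking agent never receives rollback permission.
    "status-agent": {"get_deployment_status"},
}

def call_tool(agent: str, tool: str, *args):
    if tool not in AGENT_SCOPES.get(agent, set()):
        raise PermissionError(f"{agent} is not permitted to call {tool!r}")
    return TOOL_REGISTRY[tool](*args)

print(call_tool("status-agent", "get_deployment_status", "billing-api"))
# call_tool("status-agent", "rollback_production", "billing-api") raises PermissionError
```

The same principle applies one layer down: the credential the tool uses should itself be read-only, so a dispatch-layer bug cannot escalate into a write.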
Tool Selection Accuracy Degrades at Scale
Aim for fewer than 20 functions at any one time. Past that threshold, models increasingly pick wrong tools or hallucinate parameters. This is not a theoretical concern — it is a latency, cost, and reliability problem that compounds in multi-step workflows. Patterns like dynamic tool discovery and on-demand loading, as Anthropic's engineering team describes, help address this.
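A sketch of on-demand tool loading: instead of sending all definitions with every request, select a small relevant subset before building the prompt. The keyword overlap here is a deliberately naive stand-in for the embedding-based retrieval a production system would use; all tool names are illustrative.

```python
# Sketch of dynamic tool discovery: pick a small, relevant subset of
# tool definitions per task instead of sending all of them. The keyword
# match is a naive stand-in for embedding-based retrieval.

ALL_TOOLS = [
    {"name": "get_deployment_status", "description": "Check service deployment status"},
    {"name": "create_jira_ticket",    "description": "Open a ticket in Jira"},
    {"name": "send_slack_message",    "description": "Post a message to a Slack channel"},
    {"name": "query_pagerduty",       "description": "Look up on-call and incident data"},
]

def discover_tools(task: str, limit: int = 3) -> list[dict]:
    words = set(task.lower().split())
    scored = [(len(words & set(t["description"].lower().split())), t)
              for t in ALL_TOOLS]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [t for score, t in scored[:limit] if score > 0]

print([t["name"] for t in
       discover_tools("check the deployment status of the billing service")])
# -> ['get_deployment_status']
```

Only the surviving definitions are serialized into the prompt, which keeps both token cost and the model's selection space small.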
Observability and Audit Trails
Evaluation looks not only at the content of answers but also at tool usage, the order of steps, and adherence to predefined policies. Every tool call should be logged with its inputs, outputs, latency, and the reasoning chain that triggered it. Without this, debugging a misbehaving agent is like debugging a distributed system with no tracing — possible, but painful.
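The logging discipline described above can be implemented as a thin wrapper around every tool. A sketch, with an in-memory list standing in for the structured-logging or tracing backend a production system would use:

```python
import json
import time
import uuid

# Sketch of an audit-logging wrapper: every tool call is recorded with
# its inputs, output (or error), latency, and a correlation id, whether
# or not it succeeds. AUDIT_LOG stands in for a real log sink.

AUDIT_LOG: list[dict] = []

def audited(tool_name, fn):
    def wrapper(**kwargs):
        entry = {"id": str(uuid.uuid4()), "tool": tool_name, "inputs": kwargs}
        start = time.monotonic()
        try:
            result = fn(**kwargs)
            entry.update(status="ok", output=result)
            return result
        except Exception as exc:
            entry.update(status="error", error=repr(exc))
            raise
        finally:
            entry["latency_ms"] = round((time.monotonic() - start) * 1000, 2)
            AUDIT_LOG.append(entry)  # logged on success and failure alike
    return wrapper

get_status = audited("get_deployment_status",
                     lambda service_name: f"{service_name}: healthy")
get_status(service_name="billing-api")
print(json.dumps(AUDIT_LOG[0], indent=2))
```

Because the wrapper logs in a `finally` block, failed calls leave the same trail as successful ones, which is exactly when you need it.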
Error Handling Is Non-Negotiable
Since tool calling requires correct tool inputs from your AI agent as well as a functioning third-party API, errors can and will happen. There are generally two types:
- Tool input errors: the agent uses the right tool but the wrong inputs.
- Tool execution errors: the agent uses the right tool and the right inputs, but the underlying function or request fails due to API errors.
Production agents need retry logic, fallback behavior, and clear error propagation. A tool call that silently fails is worse than one that visibly errors: the agent may proceed with stale or missing data.
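The two error classes call for different handling: input errors should surface back to the model immediately (retrying the same bad arguments would fail again), while transient execution errors are worth retrying with backoff. A sketch, with a simulated flaky API and illustrative names:

```python
import time

# Sketch of error handling that distinguishes input errors (surfaced to
# the model at once) from transient execution errors (retried with
# exponential backoff). The flaky API is simulated deterministically.

class ToolInputError(Exception): pass
class ToolExecutionError(Exception): pass

_calls = {"n": 0}

def get_order(order_id: str) -> str:
    if not order_id.startswith("ORD-"):
        raise ToolInputError(f"order_id must look like 'ORD-123', got {order_id!r}")
    _calls["n"] += 1
    if _calls["n"] < 3:          # first two calls simulate a transient 503
        raise ToolExecutionError("upstream API returned 503")
    return f"{order_id}: shipped"

def call_with_retries(fn, *args, attempts: int = 4, base_delay: float = 0.01):
    for attempt in range(attempts):
        try:
            return fn(*args)
        except ToolInputError:
            raise                # the model must fix its arguments first
        except ToolExecutionError:
            if attempt == attempts - 1:
                raise            # out of retries: propagate, never swallow
            time.sleep(base_delay * 2 ** attempt)

print(call_with_retries(get_order, "ORD-42"))  # -> ORD-42: shipped
```

Either way, the failure is visible: the loop re-raises rather than returning a default, so the agent never proceeds on silently missing data.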
The Future We're Building at Guild
Tool use is what makes agents useful. But ungoverned tool use — agents calling APIs without scoped permissions, audit trails, or cost controls — is what makes agents dangerous. Guild.ai provides the runtime and control plane that treats every tool invocation as a first-class, observable, permissioned event. Because agents that can act on production systems need infrastructure that ensures they act correctly.
Join the Guild.ai waitlist to see how we're building governed agent infrastructure for engineering teams.
FAQs
How many tools can an agent reliably handle?
Aim for fewer than 20 functions at any one time. Beyond that, selection accuracy drops and token costs increase. For agents requiring access to 50+ tools, dynamic tool discovery patterns — loading only relevant tool definitions per task — are essential.
What is the Model Context Protocol (MCP)?
MCP defines a standardized framework for integrating AI systems with external data sources and tools. It replaces custom per-provider integrations with a universal protocol, adopted by OpenAI, Google DeepMind, and Microsoft. Think of it as the standard interface layer between an agent and the tools it calls.
What are the biggest security risks of tool-calling agents?
The top three risks according to the OWASP Agentic Security Initiative (ASI) are memory poisoning, tool misuse (agents being tricked into abusing system access), and privilege compromise (agents exploited to escalate access). Prompt injection attacks can manipulate agents into calling tools with malicious parameters, making runtime monitoring and scoped permissions essential.
How is tool use different from ordinary API integration?
Technically, any software can call an API. What makes LLM-based tool use distinct is that the model decides which tool to call, with what parameters, based on natural language reasoning rather than hard-coded logic. This flexibility is the value — and the risk.
How do you test agents that use tools?
Quality assurance for agent-based systems requires a combination of classical software testing methods and new evaluation techniques for LLM behaviour. Unit and integration tests are supplemented with scenario-based evaluations, where the agent is confronted with realistic dialogues and edge cases. Test tool selection accuracy, parameter correctness, error handling, and multi-step chaining independently before deploying to production.