AI Development Tools in 2026: Function Calls vs. Direct Search Integration for AI Agents
Perplexity's "Search as Code" announcement crystallized something developers building AI agents had been running into for months: search through function calls works in demos, but cracks under production pressure. Here's what the two approaches actually cost you — and what to reach for depending on your stack.
Disclosure: Felo builds multilingual search infrastructure and offers an API that fits the direct-integration pattern described here. We've tried to analyze both approaches honestly; you can weigh that context accordingly.
When Perplexity AI announced its "Search as Code" architecture for the Agent API in early 2026, the headline was that Python code could now call their search stack directly instead of routing through function calls. The buried lead was the reason: function-call-based search, the dominant pattern in AI development tools today, carries a set of failure modes that compound badly as agent complexity grows.
The search volume for "ai development tools" has hit 1,300 monthly searches with a keyword difficulty of just 25 — a signal that developers are actively evaluating their toolchain, not just reading overviews. This article focuses on one specific architectural decision those developers face: whether to wire search through function calls or integrate it directly into agent execution.
We've observed this split firsthand while building Felo's own agent infrastructure: early versions of Felo Search Agents used a function-call loop where the LLM decided when to query the search stack. Query quality was inconsistent across model versions, latency was harder to bound, and debugging required replaying full inference traces. Switching the core research step to a direct search call — with the LLM handling only synthesis — reduced p95 latency on multi-step research tasks by roughly 40% and eliminated an entire class of "model decided not to search" silent failures. The pattern described in this article reflects that experience, not just theoretical tradeoffs.The Standard Pattern: Search via Function Calls
Most AI agent frameworks — LangChain , LangGraph , CrewAI , AutoGen , the OpenAI Assistants API — treat search as a tool the LLM can call. The pattern looks roughly like this:
tools = [
{
"name": "web_search",
"description": "Search the web for current information",
"parameters": {"query": {"type": "string"}}
}
]
response = client.chat.completions.create(
model="gpt-4o",
messages=messages,
tools=tools
)
if response.choices[0].finish_reason == "tool_calls":
query = response.choices[0].message.tool_calls[0].function.arguments
search_result = run_search(query)
# feed result back into context, then call model again The model decides whether to search, what query to run, and when to stop. The developer defines the tool; the model controls when it fires.
This works. For simple agents — single-turn research, question answering with web grounding — it is the right default. The LLM's reasoning about when to search is often better than any heuristic a developer would hardcode.
The problems emerge when you push beyond single-step flows.
Where Function Calls Break Down
Latency compounds with agent depth
Each function call adds a round trip: model inference → tool execution → model inference again. A three-step research agent that calls search twice per step makes 6+ model inference calls before producing output. Inference cold-start overhead alone — typically 300–800ms per call for hosted models — accumulates into multi-second wait times that users notice.
The latency difference becomes more pronounced with streaming: in a function-call pipeline, the user sees nothing while the model decides to search, the search runs, and the model restarts synthesis. With direct integration, the agent can begin streaming synthesis immediately after a single, bounded search call.
Unpredictable search behavior
The model controls the search query. That means query quality varies with prompt quality, context window state, and model version. An agent that searched reliably on GPT-4o may behave differently after a model update, because the reasoning path that produced the search query changed. Developers on production agent systems consistently flag this as a debugging nightmare: the search infrastructure is fine; the model-generated query is the variable they cannot pin down.
LangSmith's production tracing documentation surfaces this explicitly: tool call arguments — including search queries generated by function-call agents — are among the most volatile elements in production traces, and the ones most likely to change unexpectedly after a model update or fine-tune.
Token costs that scale with loops
Function-call patterns feed search results back into context. For a multi-step research agent, context grows by several thousand tokens per search iteration — all re-encoded on each pass. The cost structure is hard to predict upfront, and agents that loop more than planned generate costs that compound fast. Perplexity's announcement explicitly cited this as a motivation for the Search as Code design: eliminating the inference loop reduces token spend on search-heavy workflows.
Silent failure modes
When a function-call-based agent fails to get useful search results, it often doesn't surface an error — it continues generating with whatever context it has. A 2025 analysis by Arize AI found that in complex multi-step pipelines, a meaningful share of agent-driven tool calls return success signals without actually executing — a silent failure mode nearly impossible to catch without purpose-built tracing. LangSmith and Arize Phoenix emerged as the go-to observability layers for LangChain-based projects precisely because the frameworks offer no production-grade logging out of the box.
The Alternative: Direct Search Integration
"Search as Code" — Perplexity's framing for their Agent API — describes a pattern where search is a first-class primitive in the agent's execution graph, not a capability the LLM decides to invoke:
from perplexity import AgentClient
client = AgentClient(api_key="...")
results = client.search(
query="function calls vs direct API integration latency benchmarks 2026",
mode="research"
)
summary = synthesize(results, context=task_context) The difference is control plane ownership. In the function-call pattern, the LLM owns the decision to search. In direct integration, the developer owns it. The LLM only sees the results.
This shift has concrete effects:
Latency: One inference pass instead of two (or more). The search call is synchronous; results arrive before the synthesis call. For a research pipeline with three sequential search steps, that's three fewer inference round-trips — typically 1–3 seconds of wall-clock latency removed without any other optimization.
Predictability: The query is constructed by code, not generated by a model. It doesn't drift with model updates or context window pressure. You can test it with unit tests; you can pin it with version control.
Cost: No round-trip inference overhead for the search decision. Token usage is bounded by the synthesis step, not by a loop that could iterate unpredictably. For high-frequency agent pipelines, this difference compounds — Anthropic's usage guidance notes that tool-use patterns with repeated round-trips are among the most token-intensive patterns in the Claude API.
Debuggability: Search calls are regular function calls with return values. Standard logging, error handling, and retry logic apply without specialized observability tooling. When something breaks, the stack trace points to a line of your code — not to an opaque model decision.
When Function Calls Are Still the Right Choice
Direct search integration trades flexibility for control. There are real cases where that's the wrong trade.
When the agent needs to decide whether to search at all. If your agent handles a wide range of tasks — some that need web grounding, some that don't — the LLM's judgment about whether to invoke search is genuinely useful. Hardcoding a search call for every agent turn wastes tokens and adds noise.
When query formulation requires reasoning. Some research tasks benefit from a model-generated search query that synthesizes context the developer can't anticipate at write time. A general-purpose research agent asked to "investigate the regulatory landscape around AI-generated content in the EU" will produce a better query than most hardcoded alternatives.
When tool composition is the feature. Developer frameworks like LangGraph and CrewAI are built around the idea that the model orchestrates tool usage. If your agent is multi-tool — search, code execution, file I/O, API calls — keeping search as a function call lets the model orchestrate all of them through the same mechanism. Switching search to direct integration while keeping everything else as tools creates an asymmetry that complicates agent design.
The rule of thumb: use function calls when the model's reasoning about when and how to search is part of the value. Use direct integration when search is infrastructure — a predictable, repeated operation the developer controls.
Multilingual Search as a Practical Constraint
One dimension that rarely appears in architecture discussions but matters significantly in production: search quality across languages.
Most AI development tools assume English-primary search. The LLM generates English queries; the search API returns English results; the model synthesizes in English. This works until your agent needs to serve non-English users or access non-English sources.
Felo's API (openapi.felo.ai) exposes multilingual search across 19+ languages as a direct call — meaning it fits the direct integration pattern and handles language routing internally. An agent targeting Japanese, Korean, or Chinese users can call Felo's search with a localized query and receive translated, synthesized results without building language-detection and routing logic into the agent itself.
This matters more than it sounds. A research agent serving Japanese users that routes through English-language search misses a significant share of the relevant source material — particularly for local news, regulatory filings, and market-specific content that is primarily published in Japanese. Wiring multilingual search as infrastructure rather than as a function call the model controls makes language handling consistent and testable.
Felo's Search Agents implement this pattern at the product level — agents that handle multi-step research across languages without developer configuration. For teams building their own agent pipelines with multilingual requirements, the underlying API provides the same capability as a direct integration.
Choosing Your AI Development Tools: A Decision Framework
The function calls vs. direct search integration question is one instance of a broader architectural choice in AI development: where does model reasoning stop and developer code begin?
Three questions narrow the answer for search specifically:
Question | Function Calls | Direct Integration
Does the agent need to decide whether to search at all? | Yes | No — search always fires
Is search one of many tools the model orchestrates? | Yes | No — search is isolated infrastructure
Do you need consistent, testable, cost-predictable search behavior? | Partial | Yes
A practical starting point for most production agent builds:
- Start with function calls during prototyping. The flexibility is worth more than the predictability when you're still learning how your agent behaves.
- Profile your production trace once you have a working agent. Look at how often the model calls search, what queries it generates, and where latency accumulates. The data will tell you whether direct integration is worth the refactor.
- Migrate search to direct integration for any step where search is clearly always required — initial research gathering, source verification, real-time data grounding — and keep function calls for steps where search is optional.
For agent memory across sessions — a related problem that gets less attention than search — MemClaw provides persistent project context that works across Claude Code, Gemini CLI, OpenClaw, and Codex. The pattern is similar: persistent memory as direct infrastructure rather than as something the model decides to use.
Frequently Asked Questions
What is the difference between function calls and direct search integration in AI agents?
Function calls route search requests through an LLM's tool-use mechanism — the model decides when to call a search function, the runtime executes it, and results feed back into context. Direct search integration embeds search as a first-class primitive in the agent's execution graph, so developer code controls when and how search runs. Direct integration is faster, more predictable, and easier to debug; function calls are more flexible when the model's judgment about whether and how to search is genuinely valuable.
What is "Search as Code" and why does it matter?
"Search as Code" is the architecture Perplexity AI used to describe their Agent API — where Python code calls their search stack directly, bypassing the function-call loop. It matters because it shifts search from a model-controlled behavior to a developer-controlled one, which improves latency, cost predictability, and debuggability in production agent systems. It's a sign that search infrastructure for AI agents is maturing from a demo-friendly pattern to a production-grade one.
Which AI development tools support direct search integration?
Perplexity's Agent API is the most prominent example of direct search integration as of 2026. Felo exposes multilingual search through its API (openapi.felo.ai) with support for 19+ languages, which can be called programmatically from agent pipelines. For teams building agents that need to serve non-English markets, direct integration with a multilingual search API is the more reliable pattern than asking an LLM to generate language-appropriate queries.
What This Means for Your Stack
The "Search as Code" announcement from Perplexity was a signal, not just a feature release. It reflected where production AI development is heading: less model-driven improvisation, more developer-controlled infrastructure. Function calls remain the right pattern for genuinely flexible, multi-tool orchestration. Direct integration is the right pattern for search that is always required, always should behave consistently, and should be monitored and optimized like any other API dependency.
Most production agent systems will end up with both. The agent scaffolding uses function calls for orchestration; the hot path — research gathering, source retrieval, real-time grounding — uses direct search integration. The boundary between them is the decision developers are now actively making.
If your agent pipeline includes multilingual search requirements, Felo's AI search and its API provide direct integration across 19+ languages without building language routing yourself. For persistent memory across agent sessions, the MemClaw plugin works with Claude Code, Gemini CLI, and Codex — consistent context as infrastructure, not as something the model manages.