With websearch, our agents had opened a window to the world. But an expert researcher doesn't just read: they analyze data, perform calculations, interact with other systems and, when necessary, consult other experts. To elevate our agents from simple "information gatherers" to true "digital analysts," we needed to drastically expand their toolbox.
The OpenAI Agents SDK classifies tools into three main categories, and our journey led us to implement them and understand their respective strengths and weaknesses.
The first category, the Function Tool, is the most common and powerful form of tool. It allows you to transform any Python function into a capability that the agent can invoke. The SDK magically takes care of analyzing the function signature, argument types, and even the docstring to generate a schema that the LLM can understand.
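In its simplest form, this is just a decorated function. Here is a minimal sketch of that native pattern, assuming the function_tool decorator and Agent class exported by the agents package; the get_city_population function is purely illustrative:

```python
from agents import Agent, function_tool

@function_tool
def get_city_population(city: str) -> int:
    """Return the most recently known population for the given city."""
    # Illustrative stub: a real tool would query an external data source.
    known = {"Rome": 2_749_000, "Milan": 1_371_000}
    return known.get(city, 0)

analyst = Agent(
    name="DemographicsAnalyst",
    instructions="Answer population questions using your tools.",
    tools=[get_city_population],
)
```

The decorator reads the signature and docstring shown above to build the schema the LLM sees; no manual JSON schema is needed.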
The Architectural Decision: A Central "Tool Registry" and Decorators
To keep our code clean and modular (Pillar #14), we implemented a central ToolRegistry. Any function anywhere in our codebase can be transformed into a tool simply by adding a decorator.
Reference code: backend/tools/registry.py and backend/tools/web_search_tool.py
# Example of a Function Tool
from .registry import tool_registry

@tool_registry.register("websearch")
class WebSearchTool:
    """
    Performs a web search using the DuckDuckGo API to get updated information.
    Essential for tasks that require real-time data.
    """
    async def execute(self, query: str) -> str:
        # Logic to call a search API...
        return "Search results..."
The SDK allowed us to cleanly define not only the action (execute), but also its "advertisement" to the AI through the docstring, which becomes the tool's description.
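The registry behind that decorator doesn't need to be complicated. A minimal sketch of what backend/tools/registry.py might contain (names and structure are assumptions, not the actual implementation):

```python
# backend/tools/registry.py -- illustrative sketch, not the actual implementation
from typing import Callable, Dict, Type


class ToolRegistry:
    """Central catalogue mapping tool names to tool classes."""

    def __init__(self) -> None:
        self._tools: Dict[str, Type] = {}

    def register(self, name: str) -> Callable[[Type], Type]:
        """Class decorator that adds a tool class to the registry under `name`."""
        def decorator(tool_cls: Type) -> Type:
            self._tools[name] = tool_cls
            return tool_cls
        return decorator

    def get(self, name: str):
        """Instantiate and return the tool registered under `name`."""
        return self._tools[name]()

    def list_tools(self) -> list[str]:
        return sorted(self._tools)


# Module-level singleton imported by the @tool_registry.register(...) decorators.
tool_registry = ToolRegistry()
```

The key design choice is that agents discover tools through the registry rather than importing them directly, which keeps tools decoupled from any specific agent.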
Some tools are so complex and require such specific infrastructure that it doesn't make sense to implement them ourselves. These are "Hosted Tools": services run directly on OpenAI's servers. The most important one for us was the CodeInterpreterTool.
The Challenge: The code_interpreter – A Sandboxed Analysis Laboratory
Many tasks required complex quantitative analysis. The solution was to give the AI the ability to write and execute Python code.
Reference code: backend/tools/code_interpreter_tool.py (integration logic)
As mentioned, our first encounter with the code_interpreter was traumatic. An agent generated dangerous code (rm -rf /*), teaching us the fundamental lesson about security.
The Lesson Learned: "Zero Trust Execution"
Code generated by an LLM must be treated as the most hostile input possible. Our security architecture is based on three levels:
| Security Level | Implementation | Purpose |
|---|---|---|
| 1. Sandboxing | Execution of all code in an ephemeral Docker container with minimal permissions (no access to network or host file system). | Completely isolate execution, making even the most dangerous commands harmless. |
| 2. Static Analysis | A pre-execution validator that looks for obviously malicious code patterns (os.system, subprocess). | A quick first filter to block the most obvious abuse attempts. |
| 3. Guardrail (Human-in-the-Loop) | An SDK Guardrail that intercepts code. If it attempts critical operations, it pauses execution and requests human approval. | The final safety net, applying Pillar #8 to tool security as well. |
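A condensed sketch of levels 1 and 2, assuming the docker Python SDK and a deliberately simple pattern blacklist (the real validator and container configuration are more thorough):

```python
# Illustrative sketch of sandboxing + static analysis; assumes the `docker` Python SDK.
import re
import docker

FORBIDDEN_PATTERNS = [r"\bos\.system\b", r"\bsubprocess\b", r"\brm\s+-rf\b"]


def static_check(code: str) -> None:
    """Level 2: reject code containing obviously malicious patterns."""
    for pattern in FORBIDDEN_PATTERNS:
        if re.search(pattern, code):
            raise ValueError(f"Blocked by static analysis: {pattern}")


def run_in_sandbox(code: str) -> str:
    """Level 1: execute code in an ephemeral container with no network access."""
    static_check(code)
    client = docker.from_env()
    output = client.containers.run(
        image="python:3.12-slim",
        command=["python", "-c", code],
        network_disabled=True,   # no network access
        read_only=True,          # no writes to the container filesystem
        mem_limit="256m",        # minimal resources
        remove=True,             # ephemeral: container is deleted afterwards
    )
    return output.decode("utf-8")
```

Level 3, the Guardrail with human approval, sits on top of this and is described in the dedicated chapter on Guardrails.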
The third category, using agents themselves as tools, is the most advanced technique and the one that truly transformed our system into a digital organization. Sometimes, the best "tool" for a task isn't a function, but another agent.
We realized that our MarketingStrategist shouldn't try to do financial analysis. It should consult the FinancialAnalyst.
The "Agent-as-Tools" Pattern:
The SDK makes this pattern incredibly elegant with the .as_tool() method.
Reference code: Conceptual logic in director.py and specialist.py
# Definition of specialist agents
financial_analyst_agent = Agent(name="Financial Analyst", instructions="...")
market_researcher_agent = Agent(name="Market Researcher", instructions="...")

# Creation of the orchestrator agent
strategy_agent = Agent(
    name="StrategicPlanner",
    instructions="Analyze the problem and delegate to your specialists using tools.",
    tools=[
        financial_analyst_agent.as_tool(
            tool_name="consult_financial_analyst",
            tool_description="Ask a specific financial analysis question.",
        ),
        market_researcher_agent.as_tool(
            tool_name="get_market_data",
            tool_description="Request updated market data.",
        ),
    ],
)
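Running the orchestrator is then a single call. A usage sketch that continues the block above, assuming the SDK's Runner entry point:

```python
import asyncio
from agents import Runner


async def main() -> None:
    result = await Runner.run(
        strategy_agent,
        "Evaluate whether we should enter the Brazilian market next quarter.",
    )
    # The planner decides when to call its specialist "tools" and aggregates their answers.
    print(result.final_output)


asyncio.run(main())
```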
This unlocked hierarchical collaboration. Our system was no longer a "flat" team, but a true organization where agents could delegate sub-tasks, request consultations, and aggregate results, just like in a real company.
✓ Choose the Right Tool Class: Not all tools are equal. Use Function Tools for custom capabilities, Hosted Tools for complex infrastructure (like the code_interpreter), and Agents as Tools for delegation and collaboration.
✓ Security is Not Optional: If you use powerful tools like code execution, you must design a multi-layered security architecture based on the "Zero Trust" principle.
✓ Delegation is a Superior Form of Intelligence: The most advanced agent systems aren't those where every agent knows how to do everything, but those where every agent knows who to ask for help.
Chapter Conclusion
With a rich and secure toolbox, our agents were now able to tackle a much broader range of complex problems. They could analyze data, create visualizations, and collaborate at a much deeper level.
This, however, made the role of our quality system even more critical. With such powerful agents, how could we be sure that their outputs, now much more sophisticated, were still high quality and aligned with business objectives? This brings us back to our Quality Gate, but with a new and deeper understanding of what "quality" means.
We wrote a test to verify that the tools were working.
Reference code: tests/test_tools.py
The test was simple: give an agent a task that clearly required a web search (e.g., "Who is the current CEO of OpenAI?") and verify that the websearch tool was called.
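The test itself isn't reproduced here; a sketch of the behavioral check, assuming pytest-asyncio and a run_agent fixture that pushes a task through the agent system (both are illustrative assumptions):

```python
# tests/test_tools.py -- illustrative sketch, not the actual test
import pytest

from backend.tools.web_search_tool import WebSearchTool


@pytest.mark.asyncio
async def test_agent_uses_websearch_for_current_facts(monkeypatch, run_agent):
    """The agent should call the websearch tool when a question needs fresh data."""
    calls: list[str] = []
    original_execute = WebSearchTool.execute

    async def spying_execute(self, query: str) -> str:
        calls.append(query)                      # record every invocation
        return await original_execute(self, query)

    monkeypatch.setattr(WebSearchTool, "execute", spying_execute)

    # `run_agent` is an assumed fixture that runs a task through the agent system.
    await run_agent("Who is the current CEO of OpenAI?")

    assert calls, "Web search tool was not called"
```

Note that this is a behavioral test: it doesn't check the search results, only that the agent decided to use the tool at all.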
The first results were disconcerting: the test failed 50% of the time.
Disaster Logbook (July 27):
ASSERTION FAILED: Web search tool was not called.
AI Response: "As of my last update in early 2023, the CEO of OpenAI was Sam Altman."
The Problem: The LLM was "lazy." Instead of admitting it didn't have updated information and using the tool we had provided, it preferred to give an answer based on its internal knowledge, even if obsolete. It was choosing the easy way out, at the expense of quality and truthfulness.
The Lesson Learned: You Must Force Tool Usage
It's not enough to give a tool to an agent. You must create an environment and instructions that incentivize (or force) it to use it.
The solution was a refinement of our prompt engineering: tool usage became an explicit, non-negotiable requirement in the agent's instructions rather than a polite suggestion.
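The exact wording we settled on isn't reproduced here, but the refinement amounted to two moves: state the tool mandate bluntly in the instructions, and (where supported) force the model to call a tool. A sketch, assuming the SDK exposes a ModelSettings object with a tool_choice option:

```python
from agents import Agent, ModelSettings, function_tool


@function_tool
async def websearch(query: str) -> str:
    """Search the web for up-to-date information."""
    return "Search results..."  # stub; the real tool calls a search API


researcher = Agent(
    name="Researcher",
    instructions=(
        "You do NOT have reliable knowledge of current events. "
        "For ANY question about recent or time-sensitive facts you MUST call "
        "the websearch tool before answering. Never answer from memory alone."
    ),
    tools=[websearch],
    # Assumption: tool_choice="required" forces the model to call at least one tool.
    model_settings=ModelSettings(tool_choice="required"),
)
```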
These changes increased tool usage from 50% to over 95%, solving the "laziness" problem and ensuring our agents actively sought real data.
✓ Agents Need Tools: An AI system without access to external tools is a limited system destined to become obsolete.
✓ Centralize Tools in a Registry: Don't tie tools to specific agents. A modular registry is more scalable and maintainable.
✓ AI Can Be "Lazy": Don't assume an agent will use the tools you provide. You must explicitly instruct and incentivize it to do so.
✓ Test Behavior, Not Just Output: Tool tests shouldn't just verify that the tool works, but that the agent decides to use it when strategically correct.
Chapter Conclusion
With the introduction of tools, our agents finally had a way to produce reality-based results. But this opened a new Pandora's box: quality.
Now that agents could produce data-rich content, how could we be sure this content was high quality, consistent, and, most importantly, of real business value? It was time to build our Quality Gate.
Now that you understand specialist agent architecture, it's time to build the perfect toolkit to orchestrate them effectively.