We had a dynamic team and an intelligent orchestrator. But our agents, however well-designed, were still "digital philosophers." They could reason, plan, and write, but they couldn't act on the external world. Their knowledge was limited to what was intrinsic to the LLM model: a snapshot of the past, devoid of real-time data.
An AI system that cannot access updated information is destined to produce generic, outdated, and ultimately useless content. To respect our Pillar #11 (Concrete and Actionable Deliverables), we had to give our agents the ability to "see" and "interact" with the external world. We had to give them Tools.
Our first decision was not to associate tools directly with individual agents in the code. This would have created tight coupling and made management difficult. Instead, we created a centralized Tool Registry.
Reference code: backend/tools/registry.py (hypothetical, based on our logic)
This registry is a simple dictionary that maps a tool name (e.g., "websearch") to an executable tool instance.
# tools/registry.py
class ToolRegistry:
    def __init__(self):
        # Maps tool names to ready-to-use tool instances
        self._tools = {}

    def register(self, tool_name):
        # Class decorator: instantiate the tool and store it under its name
        def decorator(tool_class):
            self._tools[tool_name] = tool_class()
            return tool_class
        return decorator

    def get_tool(self, tool_name):
        return self._tools.get(tool_name)

# A single shared registry for the whole backend
tool_registry = ToolRegistry()

# tools/web_search_tool.py
from .registry import tool_registry

@tool_registry.register("websearch")
class WebSearchTool:
    async def execute(self, query: str):
        # Logic to call a search API like DuckDuckGo
        ...
This approach gave us incredible flexibility:
Adding a new tool (e.g., ImageGenerator) simply means creating a new file and registering it, without touching the logic of agents or the orchestrator.
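For example, a hypothetical ImageGenerator tool would just be one more file that registers itself; the class name and body below are illustrative, not taken from the actual codebase:

# tools/image_generator_tool.py (hypothetical example)
from .registry import tool_registry

@tool_registry.register("image_generator")
class ImageGeneratorTool:
    async def execute(self, prompt: str):
        # Call an image-generation API here and return the resulting asset URL
        ...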
websearch: The Window to the World
The first and most important tool we implemented was websearch. This single instrument transformed our agents from "students in a library" to "field researchers."
When an agent needs to execute a task, the OpenAI SDK allows it to autonomously decide whether it needs a tool. If the agent "thinks" it needs to search the web, the SDK formats a tool execution request. Our Executor intercepts this request, calls our implementation of the WebSearchTool, and returns the result to the agent, which can then use it to complete its work.
Tool Execution Flow:
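In code terms, that round trip might look roughly like the sketch below. The handle_tool_call function and its argument names are hypothetical; the real logic lives in our Executor.

# Hypothetical sketch of the Executor's tool dispatch (names are illustrative)
from tools.registry import tool_registry  # assuming the package layout shown above

async def handle_tool_call(tool_name: str, arguments: dict) -> str:
    # The SDK has produced a tool execution request: a tool name plus arguments
    tool = tool_registry.get_tool(tool_name)
    if tool is None:
        return f"Error: unknown tool '{tool_name}'"
    # Run our implementation and hand the result back to the agent
    result = await tool.execute(**arguments)
    return str(result)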
We wrote a test to verify that the tools were working.
Reference code: tests/test_tools.py
The test was simple: give an agent a task that clearly required a web search (e.g., "Who is the current CEO of OpenAI?") and verify that the websearch tool was called.
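A stripped-down version of that behavioral test might look like the sketch below; run_task is assumed to be a project fixture (hypothetical), and tools_used an assumed attribute on its result.

# tests/test_tools.py -- simplified sketch (requires pytest-asyncio)
import pytest

@pytest.mark.asyncio
async def test_agent_uses_websearch_for_fresh_facts(run_task):
    # `run_task` is assumed to be a fixture that drives one agent task
    # end-to-end and reports which tools were invoked (hypothetical).
    result = await run_task("Who is the current CEO of OpenAI?")
    # Assert on behavior (the tool was called), not on the answer's wording
    assert "websearch" in result.tools_used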
The first results were disconcerting: the test failed 50% of the time.
Disaster Logbook (July 27):
ASSERTION FAILED: Web search tool was not called.
AI Response: "As of my last update in early 2023, the CEO of OpenAI was Sam Altman."
The Problem: The LLM was "lazy." Instead of admitting it didn't have updated information and using the tool we had provided, it preferred to give an answer based on its internal knowledge, even if obsolete. It was choosing the easy way out, at the expense of quality and truthfulness.
The Lesson Learned: You Must Force Tool Usage
It's not enough to give a tool to an agent. You must create an environment and instructions that incentivize (or force) it to use it.
The solution was a refinement of our prompt engineering: instructions that make tool usage an explicit requirement rather than an option.
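The exact production wording isn't reproduced here, but the kind of directive this refinement involves looks roughly like this (illustrative phrasing only, not the production prompt):

# Illustrative addition to an agent's system prompt (hypothetical phrasing)
TOOL_USAGE_DIRECTIVE = """
You have access to a `websearch` tool.
For any fact that may have changed after your training data (people, prices,
dates, current events), you MUST call `websearch` before answering.
Never answer from memory alone when a tool can verify the information.
"""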
These changes increased tool usage from 50% to over 95%, solving the "laziness" problem and ensuring our agents actively sought real data.
✅ Agents Need Tools: An AI system without access to external tools is a limited system destined to become obsolete.
✅ Centralize Tools in a Registry: Don't tie tools to specific agents. A modular registry is more scalable and maintainable.
✅ AI Can Be "Lazy": Don't assume an agent will use the tools you provide. You must explicitly instruct and incentivize it to do so.
✅ Test Behavior, Not Just Output: Tool tests shouldn't just verify that the tool works, but that the agent decides to use it when strategically correct.
Chapter Conclusion
With the introduction of tools, our agents finally had a way to produce reality-based results. But this opened a new Pandora's box: quality.
Now that agents could produce data-rich content, how could we be sure this content was high quality, consistent, and, most importantly, of real business value? It was time to build our Quality Gate.
To realize this vision, the Director's prompt had to be incredibly detailed.
Reference code: backend/director.py (_generate_team_proposal_with_ai logic)
prompt = f"""
You are a Director of a world-class AI talent agency. Your task is to analyze a new project's objective and assemble the perfect AI agent team to ensure its success, treating each agent as a human professional.
**Project Objective:**
"{workspace_goal}"
**Available Budget:** {budget} EUR
**Expected Timeline:** {timeline}
**Required Analysis:**
1. **Functional Decomposition:** Break down the objective into its main functional areas (e.g., "Data Research", "Creative Writing", "Technical Analysis", "Project Management").
2. **Role-Skills Mapping:** For each functional area, define the necessary specialized role and the 3-5 essential key competencies (hard skills).
3. **Soft Skills Definition:** For each role, identify 2-3 crucial soft skills (e.g., "Problem Solving" for an analyst, "Empathy" for a designer).
4. **Optimal Team Composition:** Assemble a team of 3-5 agents, balancing skills to cover all areas without unnecessary overlaps. Assign seniority (Junior, Mid, Senior) to each role based on complexity.
5. **Budget Optimization:** Ensure the total estimated team cost doesn't exceed the budget. Prioritize efficiency: a smaller, senior team is often better than a large, junior one.
6. **Complete Profile Generation:** For each agent, create a realistic name, personality, and brief background story that justifies their competencies.
**Output Format (JSON only):**
{{
  "team_proposal": [
    {{
      "name": "Agent Name",
      "role": "Specialized Role",
      "seniority": "Senior",
      "hard_skills": ["skill 1", "skill 2"],
      "soft_skills": ["skill 1", "skill 2"],
      "personality": "Pragmatic and data-driven.",
      "background_story": "A brief story that contextualizes their competencies.",
      "estimated_cost_eur": 5000
    }}
  ],
  "total_estimated_cost": 15000,
  "strategic_reasoning": "The logic behind this team's composition..."
}}
"""
The first tests were a comic disaster. For a simple project to "write 5 emails", the Director proposed a team of 8 people, including an "AI Ethicist" and a "Digital Anthropologist". It had interpreted our desire for quality too literally, creating perfect but economically unsustainable teams.
Disaster Logbook (July 27):
PROPOSAL: Team of 8 agents. Estimated cost: €25,000. Budget: €5,000.
REASONING: "To ensure maximum ethical and cultural quality..."
The Lesson Learned: Autonomy Needs Clear Constraints.
An AI without constraints will tend to "over-optimize" the request. We learned that we needed to be explicit about constraints, not just objectives. The solution was to add two critical elements to the prompt and logic:
1. Explicit constraints in the prompt: the Available Budget and Expected Timeline sections shown above.
2. A hard validation rule in code: if proposal.total_cost > budget: raise ValueError("Proposal over budget.") (see the sketch below).
This experience reinforced Pillar #5 (Goal-Driven with Automatic Tracking). An objective is not just a "what", but also a "how much" (budget) and a "when" (timeline).
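A minimal sketch of how those two elements fit together, assuming the official openai Python client; propose_team and the model choice are assumptions, and only the budget check mirrors the rule quoted above:

# Hypothetical sketch: ask the Director model for a proposal, then enforce the budget in code
import json
from openai import AsyncOpenAI

client = AsyncOpenAI()

async def propose_team(prompt: str, budget: float) -> dict:
    response = await client.chat.completions.create(
        model="gpt-4o",  # assumption: any JSON-capable chat model would do
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},  # the prompt asks for JSON only
    )
    proposal = json.loads(response.choices[0].message.content)
    # AI handles the creative part; code enforces the non-negotiable rule
    if proposal.get("total_estimated_cost", 0) > budget:
        raise ValueError("Proposal over budget.")
    return proposal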
✅ Treat Agents as Colleagues: Design your agents with rich profiles (hard/soft skills, personality). This improves task matching and makes the system more intuitive.
✅ Delegate Team Composition to AI: Don't hard-code roles. Let AI analyze the project and propose the most suitable team.
✅ Autonomy Requires Constraints: To get realistic results, you must provide AI not only with objectives, but also constraints (budget, time, resources).
✅ Use AI for Creativity, Code for Rules: AI is excellent at generating creative profiles. Code is perfect for applying rigid, non-negotiable rules (like budget compliance).
Chapter Conclusion
With the Director, our system had reached a new level of autonomy. Now it could not only execute a plan, but also create the right team to execute it. We had a system that dynamically adapted to the nature of each new project.
But a team, however well composed, needs tools to work with. Our next challenge was understanding how to provide agents with the right "tools" for each trade, anchoring their intellectual capabilities to concrete actions in the real world.