🎹 Movement 3 of 4 📖 Chapter 27 of 42 ⏱️ ~10 min read 📊 Level: Advanced

The Technology Stack – Foundations

An architecture, no matter how brilliant, remains an abstract idea until it's built with concrete tools. The choice of these tools is never just a matter of technical preference; it's a statement of intent. Every technology we've chosen for this project was selected not only for its features, but for how it aligned with our philosophy of rapid, scalable, and AI-first development.

This chapter unveils the "building blocks" of our cathedral: the technology stack that made this architecture possible, and the strategic "why" behind every choice.

The Backend: FastAPI – The Mandatory Choice for Asynchronous AI

When building a system that must orchestrate dozens of calls to slow external services like LLMs, asynchronous programming isn't an option; it's a necessity. Choosing a synchronous framework (like Flask or Django in their classic configurations) would have meant creating an inherently slow and inefficient system, where every AI call blocks the entire process.

FastAPI was the natural choice and, in our opinion, the only truly sensible one for an AI-driven backend.

| Why FastAPI? | Strategic Benefit | Reference Pillar |
|---|---|---|
| Native Asynchronous (async/await) | Allows our Executor to manage hundreds of agents in parallel without blocking, maximizing efficiency and throughput. | #4 (Scalable), #15 (Performance) |
| Pydantic Integration | Data validation through Pydantic is integrated into the framework's core. This made creating our "data contracts" (see Chapter 4) simple and robust. | #10 (Production-Ready) |
| Automatic Documentation (Swagger) | FastAPI automatically generates interactive API documentation, accelerating frontend development and integration testing. | #10 (Production-Ready) |
| Python Ecosystem | Allowed us to remain in the Python ecosystem, leveraging fundamental libraries like the OpenAI Agents SDK, which is primarily designed for this environment. | #1 (Native SDK) |
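
To make the asynchrony argument concrete, here is a minimal sketch (not our production code) of the pattern FastAPI enables: a Pydantic "data contract" plus an endpoint that fans out several slow agent calls concurrently with asyncio.gather. The endpoint path and the run_agent helper are hypothetical placeholders.

```python
import asyncio
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class TaskRequest(BaseModel):
    # A minimal "data contract": invalid payloads are rejected before our logic runs.
    workspace_id: str
    goals: list[str]

class TaskResult(BaseModel):
    workspace_id: str
    outputs: list[str]

async def run_agent(goal: str) -> str:
    # Hypothetical stand-in for a slow LLM/agent call (network-bound, not CPU-bound).
    await asyncio.sleep(1.0)
    return f"result for: {goal}"

@app.post("/tasks", response_model=TaskResult)
async def execute_tasks(req: TaskRequest) -> TaskResult:
    # The event loop is never blocked: all agent calls run concurrently,
    # so total latency is roughly that of the slowest call, not the sum of all calls.
    outputs = await asyncio.gather(*(run_agent(goal) for goal in req.goals))
    return TaskResult(workspace_id=req.workspace_id, outputs=list(outputs))
```

With a synchronous framework, the same endpoint would process the goals one after another, and total latency would grow linearly with the number of agents.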

The Frontend: Next.js – Separation of Concerns for Agility and UX

We could have served the frontend directly from FastAPI, but we made a deliberate strategic choice: completely separate the backend from the frontend.

Next.js (a React-based framework) allowed us to create an independent frontend application that communicates with the backend only through APIs.

| Why a Separate Frontend with Next.js? | Strategic Benefit | Reference Pillar |
|---|---|---|
| Parallel Development | Frontend and backend teams can work in parallel without blocking each other. The only dependency is the "contract" defined by the APIs. | #4 (Scalable) |
| Superior User Experience | Next.js is optimized for creating fast, responsive, and modern user interfaces, fundamental for managing the real-time nature of our system (see Chapter 21 on "Deep Reasoning"). | #9 (Minimal UI/UX) |
| Skill Specialization | Allows developers to specialize: Python experts on the backend, TypeScript/React experts on the frontend. | #4 (Scalable) |
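
Because the Next.js app lives on a different origin and talks to the backend only over HTTP, the backend has to state that contract explicitly. A minimal sketch of how this decoupling looks from the FastAPI side (the origin URL and endpoint are placeholders):

```python
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

# Allow the separately deployed Next.js frontend (placeholder origin) to call the API.
app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:3000"],  # Next.js dev server; swap for the production domain
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

@app.get("/api/health")
async def health() -> dict[str, str]:
    # The frontend only ever sees JSON over HTTP: this is the entire coupling surface.
    return {"status": "ok"}
```

From that point on, the automatically generated OpenAPI/Swagger schema becomes the shared contract both teams develop against.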

The Database: Supabase – A "Backend-as-a-Service" for Speed

In an AI project, complexity is already sky-high. We wanted to minimize infrastructural complexity. Instead of managing our own PostgreSQL database, an authentication system, and a data API, we chose Supabase.

Supabase gave us the superpowers of a complete backend with the configuration effort of a simple database.

Why Supabase? Strategic Benefit Reference Pillar
Managed PostgreSQL Gave us all the power and reliability of a relational SQL database without the burden of management, backup, and scaling. #15 (Robustness)
Automatic Data API Supabase automatically exposes a RESTful API for every table, allowing us to do rapid prototyping and debugging directly from the browser or scripts. #10 (Production-Ready)
Integrated Authentication Provided a complete user management system from day one, allowing us to focus on AI logic rather than reimplementing authentication. #4 (Scalable)
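
As an illustration, here is a hedged sketch of what talking to Supabase looks like from Python with the supabase client library; the table name, columns, and environment variable names are hypothetical, not our actual schema.

```python
import os
from supabase import create_client, Client

# Credentials come from the project dashboard; the variable names here are conventional, not prescribed.
supabase: Client = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_KEY"])

# Insert a row into a hypothetical "workspaces" table...
supabase.table("workspaces").insert({"name": "demo", "status": "active"}).execute()

# ...and read it back with a filtered query: no SQL connections or pooling to manage ourselves.
response = supabase.table("workspaces").select("*").eq("status", "active").execute()
print(response.data)
```

The same operations are also exposed through Supabase's auto-generated REST API, which is what makes prototyping and debugging from the browser so quick.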

Vector Databases: The Brain Extension for AI Systems

Vector databases are a crucial component for the effectiveness of Large Language Model (LLM)-based systems, because they mitigate the problem of the model's limited context window.

What it is and why it's useful

A vector database is a type of database specialized in storing, indexing, and searching embeddings. Embeddings are numerical representations (vectors) of objects, such as text, images, audio, or other data, that capture their semantic meaning. Two similar objects will have nearby vectors in space, while two very different objects will have distant vectors.

Their role is fundamental for allowing LLMs to access external information not contained in their training set. Instead of having to "remember" everything, the LLM can query the vector database to find the most relevant information based on the user's query. This process, called Retrieval-Augmented Generation (RAG), works like this:

  1. The user's query is converted into an embedding (a vector).
  2. The vector database searches for the most similar vectors (and therefore the semantically most relevant documents) to the query's vector.
  3. The retrieved documents are provided to the LLM along with the original query, enriching its context and allowing it to generate a more precise and up-to-date response.
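
Below is a minimal sketch of that three-step loop in Python with the OpenAI SDK. A plain in-memory list stands in for the vector database, the cosine-similarity helper is ours, and the example documents, query, and model names are illustrative; a real deployment would delegate steps 1 and 2 to the vector store.

```python
import math
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed(text: str) -> list[float]:
    # Step 1: turn text into a vector (the embedding model is one of several options).
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return resp.data[0].embedding

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Tiny stand-in corpus; a vector database would hold these pre-computed embeddings.
documents = ["Our refund policy allows returns within 30 days.",
             "Support is available Monday to Friday, 9am-6pm CET."]
index = [(doc, embed(doc)) for doc in documents]

query = "Can I return a product after two weeks?"
query_vec = embed(query)

# Step 2: retrieve the most semantically similar document(s).
top_docs = sorted(index, key=lambda pair: cosine(query_vec, pair[1]), reverse=True)[:1]
context = "\n".join(doc for doc, _ in top_docs)

# Step 3: let the LLM answer with the retrieved context injected into the prompt.
answer = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "system", "content": f"Answer using this context:\n{context}"},
              {"role": "user", "content": query}],
)
print(answer.choices[0].message.content)
```

A production setup changes where the index lives and how it scales, but the three-step loop itself stays the same.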

When to use one solution or another

In our case, we're using OpenAI's native vector store. It's a practical and fast choice, especially if you're already using the OpenAI SDK: setup is minimal and everything stays inside a single ecosystem, which suits rapid prototyping and moderate document volumes.

However, dedicated solutions like Pinecone are worth considering in the future. They are often preferable when you need fine-grained control over indexing, very large document volumes, or independence from a single provider.
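
For comparison, here is a sketch of the same retrieval step against a dedicated store such as Pinecone. The index name, vector dimension, and metadata fields are hypothetical, and the placeholder vectors would in practice come from the same embedding model as in the previous example.

```python
from pinecone import Pinecone

# Hypothetical placeholder vectors; in practice these come from an embedding model.
doc_embedding = [0.1] * 1536
query_embedding = [0.1] * 1536

pc = Pinecone(api_key="YOUR_API_KEY")   # a dedicated service, managed separately from the LLM provider
index = pc.Index("project-knowledge")   # hypothetical, pre-created index with a matching dimension

# Upsert a document as (id, vector, metadata); the store handles indexing and scaling.
index.upsert(vectors=[{"id": "doc-1", "values": doc_embedding,
                       "metadata": {"text": "Refund policy: returns within 30 days."}}])

# Query with a raw vector and get back the closest matches plus their metadata.
results = index.query(vector=query_embedding, top_k=3, include_metadata=True)
for match in results.matches:
    print(match.id, match.score, match.metadata["text"])
```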

Coder CLI: Overcoming Context Limitations

Coder CLIs (command-line coding agents) represent a significant evolution in LLM usage, transforming models from simple text generators into autonomous agents capable of action.

How it works and why it's effective

The main problem with LLMs is restricted context: they can only process a limited amount of text in a single input. Coder CLIs circumvent this limitation with an iterative and goal-based approach. Instead of receiving a single complex instruction, the CLI:

  1. Receives a general objective (e.g., "Fix bug X").
  2. Breaks down the objective into a series of smaller steps, creating a todo list.
  3. Executes one command at a time in a controlled environment (e.g., a shell/bash).
  4. Analyzes the output of each command to decide the next step.

This process of cascading reasoning allows the LLM to maintain focus, overcoming the limited context problem and tackling complex tasks that require multiple steps. Because the CLI can execute any shell/bash command, it can read and modify files, run scripts and tests, and call external tools; a simplified sketch of this loop follows.
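
Here is a deliberately simplified, hypothetical sketch of that plan-execute-observe cycle in Python. The propose_next_command function stands in for the LLM call a real coder CLI would make; everything else (the example command, the step limit) is illustrative.

```python
import subprocess

def propose_next_command(objective: str, history: list[dict]) -> str | None:
    # Hypothetical stand-in for the LLM: given the goal and what happened so far,
    # it returns the next shell command, or None when it considers the goal reached.
    if not history:
        return "python -m pytest -x"   # e.g. first step: reproduce the failing test
    return None                        # stop after one step in this toy version

def run(objective: str, max_steps: int = 10) -> list[dict]:
    history: list[dict] = []
    for _ in range(max_steps):
        command = propose_next_command(objective, history)
        if command is None:
            break
        # Execute one command at a time in a controlled shell and capture its output.
        result = subprocess.run(command, shell=True, capture_output=True, text=True)
        # The output becomes context for deciding the next step, instead of one giant prompt.
        history.append({"command": command, "stdout": result.stdout,
                        "stderr": result.stderr, "returncode": result.returncode})
    return history

if __name__ == "__main__":
    for step in run("Fix bug X"):
        print(step["command"], "->", step["returncode"])
```

A real coder CLI adds safety rails (sandboxing, confirmation prompts) and a persistent todo list, but the core cycle is this same plan-execute-observe loop.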

Potential and current limitations

Potential:

The CLI creates todo lists and executes them systematically, maintaining persistent context across multiple command executions. This allows complex multi-step operations that would be impossible with traditional LLM interactions. The ability to execute any shell command means the CLI can write Python scripts to interact with databases, make API requests, perform file operations, and even manage entire deployment processes.

Limitations and how we've worked around them:

From our experience, while architectural reasoning capabilities are still developing, a targeted prompting approach using structural frameworks (like our 15 pillars) significantly improves the quality of architectural decisions and helps maintain a holistic view of system design.

Development Tools: Claude CLI and Gemini CLI – Human-AI Co-Creation

Finally, it's essential to mention how this manual itself and much of the code were developed. We didn't use a traditional IDE in isolation. We adopted an approach of "pair programming" with command-line AI assistants.

This isn't just a technical detail, but a true development methodology that shaped the product.

| Tool | Role in Our Development | Why It's Strategic |
|---|---|---|
| Claude CLI | The Specialized Executor. We used it for specific and targeted tasks: "Write a Python function that does X", "Fix this code block", "Optimize this SQL query". | Excellent for generating high-quality code and refactoring specific blocks. |
| Gemini CLI | The Strategic Architect. We used it for higher-level questions: "What are the pros and cons of this architectural pattern?", "Help me structure this chapter's narrative", "Analyze this codebase and identify potential 'code smells'". | Its ability to analyze the entire codebase and reason about abstract concepts was fundamental for making the architectural decisions discussed in this book. |

This "AI-assisted" development approach allowed us to move at a speed unthinkable just a few years ago. We used AI not only as the object of our development, but as a partner in the creation process.

📈 Market Trend: The Shift Towards Specialized B2B Models

Our model-agnostic architecture arrives at the right time. Tomasz Tunguz, in his article "A Shift in LLM Marketing: The Rise of the B2B Model" (2024), highlights a fundamental trend: we're witnessing the shift from "one-size-fits-all" models to LLMs specialized for enterprise.

Concrete examples: Snowflake launched Arctic as "the best LLM for enterprise AI", optimized for SQL and code completion. Databricks, with DBRX and Mistral, focuses on training and inference efficiency. The key point: performance on general knowledge is saturating; what matters now is optimizing for specific use cases.

Our architecture's advantage: Thanks to the modular design, we can assign each agent the most suitable model for its role: an AnalystAgent could use an LLM specialized for research and data, while a CopywriterAgent could use one optimized for natural language. As Tunguz notes, smaller, specialized models (like Llama 3 8B) can perform as well as their "bigger brothers" at a fraction of the cost.

Our philosophy of "digital specialists" with defined roles perfectly aligns with this market evolution: specialization beats generalization, both in agents and underlying models.

📈 Validation from DeepMind: The Scaling Laws (Chinchilla paper) show that there's an optimal model size for every computational budget. Beyond a certain size, scaling parameters gives diminishing returns. This supports our philosophy: better specialized agents with targeted models than a single generalist "super-model".

📝 Chapter Key Takeaways:

The Stack is a Strategic Choice: Every technology you choose should support and reinforce your architectural principles.

Asynchronous is Mandatory for AI: Choose a backend framework (like FastAPI) that treats asynchrony as a first-class citizen.

Decouple Frontend and Backend: It will give you agility, scalability, and allow you to build a better User Experience.

Embrace "AI-Assisted" Development: Use command-line AI tools not just to write code, but to reason about architecture and accelerate the entire development lifecycle.

Chapter Conclusion

With this overview of our cathedral's "building blocks", the picture is complete. We've explored not only the abstract architecture, but also the concrete technologies and development methodologies that made it possible.

We're now ready for final reflections, to distill the most important lessons from this journey and look at what the future holds for us.