
The Parsing Drama and Birth of the "AI Contract"

We had a testable agent and a robust test environment. We were ready to start building real business functionality. Our first goal was simple: have an agent, given an objective, decompose it into a list of structured tasks.

It seemed easy. The prompt was clear, the agent responded. But when we tried to use the output, the system started failing in unpredictable and frustrating ways. Welcome to the Parsing Drama.

# The Problem: The Illusion of Structure

Asking an LLM to respond in JSON format is common practice. The problem is that an LLM doesn't generate JSON; it generates text that looks like JSON. This subtle difference is the source of countless bugs and sleepless nights.

Real Examples of JSON Parsing Errors from Our Logs

Our logs revealed a handful of recurring parsing issues: JSON blocks buried in conversational preamble, trailing commas, and responses with no extractable JSON at all.
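To make the failure mode concrete, here is a hypothetical response of the kind we routinely received (reconstructed for illustration, not a verbatim log entry), and what happens when it is fed straight to json.loads:

import json

# A typical "JSON-looking" reply: conversational preamble plus a trailing comma.
raw_output = """Sure! Here is the breakdown you asked for:
{
    "tasks": [
        {"task_name": "Analyze market", "description": "...", "priority": "high"},
    ],
    "reasoning": "Market analysis comes first."
}"""

try:
    json.loads(raw_output)
except json.JSONDecodeError as e:
    print(f"Parsing failed: {e}")  # fails immediately on the preamble text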

These weren't isolated cases; they were the norm. We realized we couldn't build a reliable system if our communication layer with the AI was so fragile.

# The Architectural Solution: An "Immune System" for AI Input

We stopped considering these errors as bugs to fix one by one. We saw them as a systemic problem that required an architectural solution: an "Anti-Corruption Layer" to protect our system from AI unpredictability.

This solution is based on two components working in tandem:

Phase 1: The Output "Sanitizer" (IntelligentJsonParser)

We created a dedicated service not just to parse, but to isolate, clean, and correct the raw LLM output.

Reference code: backend/utils/json_parser.py (hypothetical)

import re
import json
import logging

logger = logging.getLogger(__name__)

class IntelligentJsonParser:
    
    def extract_and_parse(self, raw_text: str) -> dict:
        """
        Extracts, cleans, and parses a JSON block from a text string.
        """
        try:
            # 1. Extraction: Find the JSON block, ignoring surrounding text.
            json_match = re.search(r'\{.*\}|\[.*\]', raw_text, re.DOTALL)
            if not json_match:
                raise ValueError("No JSON block found in text.")
            
            json_string = json_match.group(0)
            
            # 2. Cleaning: Remove common errors like trailing commas.
            # (This is a simplification; the real logic is more complex)
            json_string = re.sub(r',\s*([\}\]])', r'\1', json_string)
            
            # 3. Parsing: Convert the clean string to a Python object.
            return json.loads(json_string)
            
        except Exception as e:
            logger.error(f"Parsing failed: {e}")
            # Here could start a "retry" logic
            raise
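The retry hook mentioned in the comment above is beyond the scope of this chapter, but a minimal sketch could look like the following. Here llm_call and max_retries are hypothetical placeholders, not part of our actual implementation:

def parse_with_retry(llm_call, prompt: str, max_retries: int = 2) -> dict:
    """Hypothetical sketch: re-ask the model when parsing fails."""
    parser = IntelligentJsonParser()
    last_error = None
    for attempt in range(max_retries + 1):
        raw_text = llm_call(prompt)
        try:
            return parser.extract_and_parse(raw_text)
        except Exception as e:
            last_error = e
            # Tell the model what went wrong before asking again.
            prompt = f"{prompt}\n\nYour previous reply was not valid JSON ({e}). Reply with JSON only."
    raise last_error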

Phase 2: The Pydantic "Data Contract"

Once we obtained syntactically valid JSON, we needed to guarantee its semantic validity: were the structure and data types correct? For this, we used Pydantic as a strict, non-negotiable "contract".

Reference code: backend/models.py

from pydantic import BaseModel, Field
from typing import List, Literal

class SubTask(BaseModel):
    task_name: str = Field(..., description="The name of the sub-task.")
    description: str
    priority: Literal["low", "medium", "high"]

class TaskDecomposition(BaseModel):
    tasks: List[SubTask]
    reasoning: str

Any JSON that didn't conform exactly to this structure was rejected, producing a controlled error instead of an unpredictable crash further downstream.
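Putting the two phases together, the happy path and the controlled failure look roughly like this (a simplified sketch that reuses the IntelligentJsonParser and TaskDecomposition defined above; the decompose_objective name is illustrative):

import logging
from pydantic import ValidationError

logger = logging.getLogger(__name__)

def decompose_objective(raw_llm_output: str) -> TaskDecomposition:
    # Phase 1: syntactic recovery - extract a dict from the raw text.
    data = IntelligentJsonParser().extract_and_parse(raw_llm_output)
    # Phase 2: semantic validation - enforce the contract or fail loudly.
    try:
        return TaskDecomposition(**data)
    except ValidationError as e:
        logger.error(f"Contract violation: {e}")
        raise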

Complete Validation Flow:

System Architecture

graph TD
    A[Raw LLM Output] --> B{Phase 1: Sanitizer}
    B -- Regex to extract JSON --> C[Clean JSON String]
    C --> D{Phase 2: Pydantic Contract}
    D -- Validated data --> E[Safe TaskDecomposition Object]
    B -- Extraction Failure --> F{Managed Error}
    D -- Invalid data --> F
    F --> G[Log Error / Trigger Retry]
    E --> H[System Usage]

# The Lesson Learned: AI is a Collaborator, not a Compiler

This experience radically changed the way we interact with LLMs and reinforced several of our pillars.

We learned to treat AI as an incredibly talented but sometimes distracted collaborator. Our job as engineers isn't just to "ask", but also to "verify, validate, and, if necessary, correct" its work.

📝 Chapter Key Takeaways:

Never trust LLM output. Always treat it as unreliable user input.

Separate parsing from validation. First get syntactically correct JSON, then validate its structure and types with a model (like Pydantic).

Centralize parsing logic. Create a dedicated service instead of repeating error handling logic throughout the codebase.

A robust system allows greater AI delegation. The stronger your guardrails, the more complex the work you can safely entrust to the AI.

Chapter Conclusion

With a reliable parsing and validation system, we finally had a way to give complex instructions to AI and receive structured data we could rely on in return. We had transformed AI output from a source of bugs into a reliable resource.

But having reliable communication with individual agents wasn't enough. We needed to understand how to design agents themselves, with clear roles, responsibilities, and boundaries. This brought us to our next challenge: architecting our first Specialist Agent.
