Schema-First LLM Pipelines: An Alternative to Agent Loops

15 min read

The dominant pattern for “agentic AI” looks like this:

LLM: "I should search for information about X"
→ tool call: search(X)
→ results
LLM: "Now I should search for more context"
→ tool call: search(Y)
→ results
LLM: "Let me fetch that article"
→ tool call: fetch(URL)
→ results
LLM: "Now I have enough to write..."
→ output

Four LLM calls. Unpredictable loops. The agent “reasons” about which tools to use, when to stop searching, whether it has enough context. Every run is different. Testing is guesswork.

We’ve been building something different.

The Problem With Agent Loops

Agent loops sound elegant in theory. Give the LLM tools, let it figure out what to do. Emergent behavior. Intelligence.

In practice:

Unpredictable execution. The same input produces different tool call sequences. Sometimes it searches twice, sometimes five times. Sometimes it fetches URLs, sometimes it doesn’t. You can’t write tests for behavior you can’t predict.

Expensive. Each “reasoning step” is an LLM call. The agent spends tokens deciding whether to search before spending tokens on the actual work. Four to eight LLM calls for a task that needs one.

Slow. Sequential by nature. The LLM must complete each reasoning step before deciding on the next. No parallelism. A 30-second task becomes two minutes.

Untestable. How do you test an agent? Run it ten times and hope it does roughly the same thing? Mock the LLM responses? The nondeterminism is the feature, but it’s also the bug.

Unobservable. Why did the agent search five times instead of three? You can read the trace, but the “reasoning” is post-hoc narrative, not predictable logic.

Schema-First: A Different Model

What if the schema declared everything?

Z.object({
  // Input enrichment - runs BEFORE the LLM
  searchResults: Z.string()
    .describe("Web search results")
    .rag$webSearch(),

  // Output fields - the LLM fills these
  script: Z.string()
    .describe("Editorial script, 600-1000 words"),

  // Output processing - runs AFTER the LLM
  videoCommand: Z.string()
    .describe("Shell command")
    .job$video()
})

A note on syntax: this is Zontax, an in-house schema language we built. It’s a superset of Zod that lets you embed metadata directly in schema definitions via namespace extensions (the rag$ and job$ prefixes). The parser extracts both the validation schema AND the extension metadata, giving you a single source of truth for “what shape is this data” and “what should happen with it.” We use it for UI generation, LLM pipelines, and anywhere else schemas drive behavior.
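
To make that concrete, here’s roughly what the parser hands back for the schema above; the exact record shape is an assumption for illustration, not the real Zontax output:

// Illustrative only: alongside a plain Zod schema (extensions stripped, used for
// validation), the parser yields metadata records like these
const extensions = [
  { field: "searchResults", namespace: "rag", name: "webSearch" },
  { field: "videoCommand",  namespace: "job", name: "video" },
];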

The rag$webSearch() extension doesn’t ask the LLM whether to search. It searches. Before the LLM runs. The results are injected into context.

The job$video() extension doesn’t ask the LLM what to do with the output. It creates a video job. After the LLM runs. Deterministically.

The LLM’s job becomes simple: given this enriched context, fill in these fields. One call. No loops. No reasoning about tools.

The Execution Model

Trigger arrives (payload: topic, searchQueries)
         ↓
    RAG PHASE (parallel)
    ├─ rag$webSearch() → Tavily search
    └─ rag$fetchUrls() → SilkText extraction
         ↓
    Context assembled
         ↓
    ONE LLM CALL
    "Here's your research. Fill in these fields."
         ↓
    OUTPUT PHASE
    ├─ job$video() → Create video job
    └─ emit$always() → Fire downstream trigger
         ↓
    Done.

RAG runs first. All the data gathering happens before the LLM sees anything. Searches run in parallel. URL fetches run in parallel. The LLM gets a complete context, not a drip-feed.

One LLM call. The schema defines the output structure. The LLM fills it in. No “let me think about what to do next.” Just: here’s the task, here’s the context, go.

Output processing runs last. The extensions handle side effects. Create jobs, fire triggers, run commands. The LLM doesn’t need to decide whether to do these things. The schema declared them.
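
A minimal sketch of that control flow, assuming hypothetical names throughout (ExtensionMeta, runRag, callLLM, and runOutput are illustrations, not our actual internals):

// Sketch of the control flow only; all names and types here are illustrative
type ExtensionMeta = { field: string; namespace: string; name: string };

type StepDeps = {
  runRag: (ext: ExtensionMeta[], payload: Record<string, unknown>) => Promise<Record<string, unknown>>;
  callLLM: (schema: unknown, context: Record<string, unknown>) => Promise<Record<string, unknown>>;
  runOutput: (ext: ExtensionMeta[], output: Record<string, unknown>) => Promise<void>;
};

async function runStep(
  step: { schema: unknown; extensions: ExtensionMeta[] },
  payload: Record<string, unknown>,
  deps: StepDeps
) {
  // RAG phase: input extensions run in parallel, before the LLM sees anything
  const enriched = await deps.runRag(
    step.extensions.filter((e) => e.namespace === "rag"),
    payload
  );

  // One LLM call: enriched context in, schema-shaped output out
  const output = await deps.callLLM(step.schema, { ...payload, ...enriched });

  // Output phase: deterministic side effects (jobs, downstream triggers)
  await deps.runOutput(
    step.extensions.filter((e) => e.namespace !== "rag"),
    output
  );

  return output;
}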

Extensions Are Declarations, Not Tools

The difference matters.

Tool call (imperative):

  • LLM decides whether to call
  • LLM decides parameters
  • LLM decides when to stop
  • Multiple LLM calls to coordinate

Extension (declarative):

  • Schema declares that it happens
  • Parameters come from payload or prior fields
  • Runs exactly once
  • Zero LLM calls to coordinate

// Tool-calling agent
// LLM: "I think I should search for 'Larian AI controversy'"
// LLM: "The results mention Bloomberg, let me search for that too"
// LLM: "I should fetch the full article..."
// (3+ LLM calls just to gather context)

// Schema extension
searchQueries: Z.array(Z.string())
  .describe("Search queries")
  .rag$webSearch()
// (0 LLM calls, searches run from payload)

The trigger payload contains searchQueries: ["Larian AI controversy", "Larian concept art AI"]. The extension runs both searches. The LLM never “decides” to search—it receives the results.
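
Here’s a rough sketch of what such a handler can look like; searchTavily is a stand-in for the real search client, and joining the results into one block is an assumption:

// Illustrative handler: queries come from the payload, never from the LLM
async function webSearchExtension(
  payload: { searchQueries: string[] },
  searchTavily: (query: string) => Promise<string> // stand-in for the real client
): Promise<string> {
  // All searches run in parallel; results become one context block for the LLM
  const results = await Promise.all(payload.searchQueries.map(searchTavily));
  return results.join("\n\n");
}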

Composition Via Bus, Not Node Flow

Traditional workflow tools use node flows. Visual programming. Connect boxes with arrows. The flow is defined statically before execution.

We use a trigger bus instead.

Steps subscribe to triggers:

subscribesTo: ["generate_editorial"]
emits: ["editorial_complete"]

Steps don’t know about each other. The news scanner emits generate_video. It doesn’t know that three different scriptwriters subscribe to that trigger. It doesn’t need to.

Flow emerges from subscriptions:

content_update → news-scanner
                    ├→ generate_video → hype-beast → job
                    ├→ generate_video → mystery-box → job
                    └→ generate_video → emotional-hook → job

But you still get visualization. Every trigger carries a correlation ID. After execution, you can reconstruct exactly what happened. Render it as a node flow if you want. But the flow is observed, not prescribed.

This matters for evolution. Adding a new scriptwriter? Subscribe to generate_video. Done. No rewiring. No touching existing steps. The new behavior composes automatically.
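
In outline, that new scriptwriter is one more declaration; the slug below is invented for illustration, and the subscription is the only wiring involved:

// Hypothetical new step: no changes to the news scanner or the other scriptwriters
const newScriptwriter = {
  slug: "scriptwriter-deadpan-editorial",  // invented for illustration
  subscribesTo: ["generate_video"],        // same trigger the existing three receive
  emits: ["editorial_complete"],
};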

Performance: The Trifecta

We built an editorial research pipeline this way. Topic comes in, web search runs, LLM writes a 600-word script with citations.

Traditional agent approach (estimated):

  • 4-8 LLM calls (reasoning + tool coordination)
  • 30-60+ seconds
  • Unpredictable (sometimes more calls, sometimes fewer)

Schema-first approach (measured):

  • 1 LLM call
  • 12 seconds total (including 2 Tavily searches)
  • Identical execution path every time

The RAG extensions run in parallel. The LLM runs once. The output extensions run deterministically. No reasoning overhead. No coordination tax.

Performance, cost, AND predictability. You don’t have to choose.

Testing Becomes Possible

We built a test harness:

npx convex run workflowSteps:testStep '{
  "slug": "scriptwriter-longform-editorial",
  "payload": {
    "topic": "Larian Studios AI controversy",
    "searchQueries": ["Larian AI December 2025"]
  }
}'

This runs one step in isolation. Same code path as production. Deterministic. You can iterate on prompts and schemas without flooding downstream systems.

Try testing an agent loop this way. “Run the agent but stop after 3 tool calls”? “Mock the LLM’s decision to search”? The nondeterminism makes it nearly impossible.

Schema-first steps are functions. Input → processing → output. Testable like functions.
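
A sketch of what that means in code, with an illustrative EditorialStep type shaped after the harness payload above:

// Hypothetical: once its dependencies are fixed, a step is just an async function
type EditorialStep = (payload: {
  topic: string;
  searchQueries: string[];
}) => Promise<{ script: string; sources: string[] }>;

async function exerciseStep(step: EditorialStep) {
  const result = await step({
    topic: "Larian Studios AI controversy",
    searchQueries: ["Larian AI December 2025"],
  });
  // The schema guarantees the output shape, so assertions stay simple
  console.assert(result.script.length > 0);
  console.assert(Array.isArray(result.sources));
}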

The Schema Is The Interface

Here’s where it gets interesting.

The same Zontax schema could compile to:

A TypeScript function:

async function editorialResearch(input: {
  topic: string;
  searchQueries: string[];
}): Promise<{
  script: string;
  sources: string[];
}>

A cloud workflow step (what we have now)

A web form (we use Zontax for dynamic UIs in another project)

An API endpoint

The schema declares the contract. The runtime decides how to fulfill it. Same declaration, multiple targets.

This is the direction. Not “a better agent framework.” A different paradigm. Schema-first, not tool-first. Declarative, not imperative. Predictable, not emergent.

What We’re Building Toward

The foundation exists:

  • Zontax parser with namespace extensions
  • Step runner (RAG → LLM → Output)
  • Trigger bus for composition
  • Correlation ID tracing
  • Test harness for iteration

What’s next:

  • TypeScript compilation target (steps as typed functions)
  • More extension types (embedding, vector search, external APIs)
  • Visual trace explorer (correlation ID → flow diagram)
  • Multi-model routing (different models for different steps)

The agent loop had its moment. For some problems, it’s still the right tool. But for structured workflows where you know what data you need and what you’re producing?

Schema-first is faster. Cheaper. Testable. Observable.

And it actually works.


This post is part of the 6digit technical series. We build AI-powered tools for game development and content creation. The schema-first pipeline described here powers our editorial research system.