
The Hybrid Approach: Structured Output Inside Orchestration

6 min read
ObjectWeaver Team
AI Infrastructure Platform

We've shown that structured output wins for coherent extraction and orchestration wins for deep analysis. The question was whether you could use both in the same definition. You can now.

The Problem

Structured output is cheap and coherent but compresses output when the schema is large. Orchestration produces rich analysis but repeats the entire input document in every LLM call — hundreds of times for a complex schema.

A legal document pipeline with eight fields illustrates this:

  • Extraction fields (parties, dates, citations, transactions) — mechanical. Find the named things, return a list. The model doesn't need thousands of tokens per item.
  • Reasoning fields (risk assessment, strategic brief) — require synthesis. Weigh evidence, identify vulnerabilities, produce nuanced multi-paragraph analysis.

Forcing all eight through structured output compresses the reasoning fields. Forcing all eight through orchestration wastes tokens on the extraction fields. The hybrid approach lets you choose per field.

How It Works

Add structuredOutput: true to any Object or Array node in your definition. That subtree collapses into a single LLM call with a JSON Schema response_format instead of decomposing into per-field calls.

{
  "type": "object",
  "properties": {
    "parties": {
      "type": "array",
      "structuredOutput": true,
      "instruction": "Extract all parties mentioned...",
      "items": {
        "type": "object",
        "properties": {
          "name": { "type": "string", "instruction": "Full legal name" },
          "role": { "type": "string", "instruction": "Role in the matter" },
          "represented_by": { "type": "string", "instruction": "Legal counsel" }
        }
      }
    },
    "risk_assessment": {
      "type": "object",
      "instruction": "Analyse litigation risk...",
      "properties": {
        "vulnerabilities": { "type": "string", "instruction": "..." },
        "recommendation": {
          "type": "string",
          "instruction": "...",
          "selectFields": ["vulnerabilities"]
        }
      }
    }
  }
}

parties gets one API call. The LLM returns a JSON array matching the schema. risk_assessment uses normal OW decomposition — vulnerabilities gets its own call, then recommendation gets a separate call with the vulnerability analysis injected via selectFields.
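The split can be sketched as a call-planning walk over the definition. This is a hypothetical illustration, not OW's actual code — a structuredOutput subtree costs one call, while everything else decomposes down to its leaf fields:

```python
# Hypothetical sketch: estimate how many LLM calls a definition produces.
# A structuredOutput subtree collapses to one call; otherwise each leaf
# field gets its own call. (In real orchestration an array without
# structuredOutput spawns one call per item found at runtime; here we
# count its item subtree once, for illustration.)
def count_calls(node: dict) -> int:
    if node.get("structuredOutput"):
        return 1
    if node.get("type") == "object":
        return sum(count_calls(c) for c in node.get("properties", {}).values())
    if node.get("type") == "array":
        return count_calls(node["items"])
    return 1  # leaf field: one call

definition = {
    "type": "object",
    "properties": {
        "parties": {
            "type": "array",
            "structuredOutput": True,
            "items": {"type": "object", "properties": {
                "name": {"type": "string"},
                "role": {"type": "string"},
            }},
        },
        "risk_assessment": {"type": "object", "properties": {
            "vulnerabilities": {"type": "string"},
            "recommendation": {"type": "string"},
        }},
    },
}
print(count_calls(definition))  # parties: 1, risk_assessment: 2 → 3
```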

The Mechanics

When OW encounters structuredOutput: true, the StructuredOutputProcessor does three things:

  1. Schema conversion — walks the Definition subtree and produces a JSON Schema map. This becomes the response_format in the API request.
  2. Single call — sends one prompt with the field's instruction and schema constraint. The LLM returns structured JSON that gets stored as the field's value.
  3. Fallback — if the call fails (context too large, schema too complex), OW falls back to normal per-field decomposition automatically.

The result is stored in the execution context like any other generated value. Downstream fields can reference it via selectFields.
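The schema-conversion step (step 1) can be sketched roughly like this. It is a minimal hypothetical version, assuming a dict-based definition like the example above; OW-specific keys such as structuredOutput and selectFields are simply not copied into the schema, and instructions become descriptions:

```python
# Hypothetical sketch: convert an OW-style definition subtree into a
# JSON Schema dict usable as an API response_format. The field's
# instruction becomes the schema "description"; OW-only keys are dropped.
def to_json_schema(node: dict) -> dict:
    schema = {"type": node["type"]}
    if "instruction" in node:
        schema["description"] = node["instruction"]
    if node["type"] == "object":
        schema["properties"] = {
            name: to_json_schema(child)
            for name, child in node.get("properties", {}).items()
        }
    elif node["type"] == "array":
        schema["items"] = to_json_schema(node["items"])
    return schema

parties = {
    "type": "array",
    "structuredOutput": True,
    "instruction": "Extract all parties mentioned...",
    "items": {
        "type": "object",
        "properties": {
            "name": {"type": "string", "instruction": "Full legal name"},
        },
    },
}
print(to_json_schema(parties))
```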

Benchmark: Document Analysis With Hybrid

We reran the legal brief benchmark with structuredOutput: true on the five extraction fields (metadata, parties, key_dates, legal_citations, financial_transactions) and normal decomposition on the two reasoning fields (risk_assessment, strategic_brief).

Performance

| Metric | Structured API | OW (no hybrid) | OW Hybrid |
| --- | --- | --- | --- |
| Duration | 54s | 15s | 31s |
| API calls | 1 | ~228 | ~25 |
| Estimated input tokens | 3,755 | ~1,010,000 | ~35,000 |

The hybrid used 97% fewer input tokens than pure orchestration while keeping orchestration's depth advantage on the reasoning fields.

Why is hybrid slower than pure OW? Pure OW fires all 228 calls concurrently. Hybrid's structured calls carry larger payloads, and each structured field resolves in a single call rather than many parallel ones. The trade-off is latency for cost.

Output Quality

| Field | Structured API | OW Hybrid |
| --- | --- | --- |
| parties | 8 items / 176 words | 15 items / 454 words |
| key_dates | 17 / 311 words | 17 / 1,526 words |
| legal_citations | 10 / 471 words | 11 / 905 words |
| risk_assessment | 5 fields / 457 words | 5 fields / 971 words |
| vulnerabilities found | 4 | 7 |

Extraction fields matched or exceeded the pure structured API — each field still gets the model's full output window rather than sharing a budget across eight fields. Reasoning fields held orchestration's depth advantage: 2.1× more risk analysis and nearly twice the vulnerabilities identified.

The Cost Arithmetic

The cost difference comes down to how many times you repeat the input document.

Pure orchestration: every leaf field is a separate call, full input text each time. Input tokens ≈ N(fields) × T(document). For our benchmark: 228 × 4,400 ≈ 1,003,200 input tokens.

Hybrid: extraction fields collapse into ~5 structured calls, reasoning fields into ~20 OW calls. Input tokens ≈ (N(structured) + N(decomposed)) × T(document). For our benchmark: 25 calls × ~1,400 avg tokens ≈ 35,000 input tokens.
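The arithmetic above, as a back-of-the-envelope calculation using the benchmark's own numbers:

```python
# Back-of-the-envelope input-token arithmetic from the benchmark.
DOC_TOKENS = 4_400  # legal brief length in tokens

# Pure orchestration: every leaf field repeats the full document.
pure_calls = 228
pure_tokens = pure_calls * DOC_TOKENS
print(f"pure orchestration: {pure_tokens:,} input tokens")  # 1,003,200

# Hybrid: ~5 structured calls + ~20 decomposed calls, averaging
# ~1,400 input tokens each (the benchmark's measured average).
hybrid_calls = 25
hybrid_avg_tokens = 1_400
hybrid_tokens = hybrid_calls * hybrid_avg_tokens
print(f"hybrid: {hybrid_tokens:,} input tokens")  # 35,000

print(f"reduction: {1 - hybrid_tokens / pure_tokens:.1%}")  # 96.5%
```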

| Approach | Input tokens | Input cost | Reduction |
| --- | --- | --- | --- |
| Pure orchestration | ~1,010,000 | $0.0757 | baseline |
| Hybrid | ~35,000 | $0.0026 | 96.5% |
| Pure structured | ~3,755 | $0.0003 | 99.6% |

The hybrid is 29× cheaper than pure orchestration. It costs more than pure structured output, but produces meaningfully richer analysis on the reasoning fields — 2.1× more content, 7 vs 4 vulnerabilities.

At scale, the difference compounds. Processing 1,000 documents:

| Approach | Total input cost | Quality |
| --- | --- | --- |
| Pure orchestration | $75.70 | Deep analysis on all fields |
| Hybrid | $2.60 | Deep analysis on reasoning fields, structured extraction elsewhere |
| Pure structured | $0.28 | Compressed analysis across all fields |

Prompt Prefix Caching

A second cost optimisation works alongside hybrid mode. LLM providers (Gemini, OpenAI) automatically cache prompt prefixes — if consecutive requests share the same opening text, the cached portion is billed at a discount (75% off for Gemini).

OW's prompt template previously put field-specific instructions before the input text:

Task: return info about "parties"...     ← varies per field
Direct Instruction: ... ← varies per field
User information: {3,000 tokens} ← same every call

Every call had a different prefix, so no caching occurred. We reordered the template:

User information: {3,000 tokens}         ← same every call (cached)
Task: return info about "parties"... ← varies (small suffix)
Direct Instruction: ... ← varies (small suffix)

Now sibling fields share an identical prefix. For the ~20 remaining decomposition calls in hybrid mode, Gemini caches ~3,000 tokens of the ~3,400-token prompt. Calls 2–20 pay only 25% on the cached portion.

Cache saving on the decomposed reasoning fields: 19 calls × 3,000 cached tokens × 75% discount ≈ 42,750 tokens effectively free — a further ~15% reduction on top of the hybrid savings.
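That saving works out as a one-liner (numbers from the benchmark; the 75% discount is Gemini's cached-token rate quoted above):

```python
# Prefix-cache saving on the ~20 decomposed reasoning-field calls.
# The first call primes the cache; the remaining 19 hit it.
cached_calls = 19
cached_prefix_tokens = 3_000  # shared "User information" prefix
discount = 0.75               # cached tokens billed at 25% of normal rate

saved_tokens = cached_calls * cached_prefix_tokens * discount
print(f"{saved_tokens:,.0f} tokens effectively free")  # 42,750
```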

When to Use Hybrid

Mark as structuredOutput: true: extraction arrays (entities, dates, citations, transactions), simple objects (metadata, classification, tagging), and fields with many properties where decomposition would generate 10+ calls for no quality gain.

Leave as normal OW decomposition: reasoning fields where depth matters, fields with selectFields chains, and fields you want to route to different models or temperatures.

The rule of thumb: if a field is about finding things → structured output. If it's about thinking about things → orchestration.

What Shipped

Three things:

structuredOutput flag on Definition — any Object or Array node can opt in. The processor converts the subtree to a JSON Schema and sends one constrained API call.

Automatic fallback — if the structured call fails (context window, unsupported schema), OW falls back to normal decomposition silently.

Prompt prefix reordering — the shared input text now leads the prompt, enabling automatic prefix caching across sibling field calls. This benefits all OW definitions, not just hybrid ones.

Existing definitions without structuredOutput behave exactly as before.