The Hybrid Approach: Structured Output Inside Orchestration
We've shown that structured output wins for coherent extraction and orchestration wins for deep analysis. The question was whether you could use both in the same definition. You can now.
The Problem
Structured output is cheap and coherent but compresses output when the schema is large. Orchestration produces rich analysis but repeats the entire input document in every LLM call — hundreds of times for a complex schema.
A legal document pipeline with eight fields illustrates this:
- Extraction fields (parties, dates, citations, transactions) — mechanical. Find the named things, return a list. The model doesn't need thousands of tokens per item.
- Reasoning fields (risk assessment, strategic brief) — require synthesis. Weigh evidence, identify vulnerabilities, produce nuanced multi-paragraph analysis.
Forcing all eight through structured output compresses the reasoning fields. Forcing all eight through orchestration wastes tokens on the extraction fields. The hybrid approach lets you choose per field.
How It Works
Add structuredOutput: true to any Object or Array node in your definition. That subtree collapses into a single LLM call with a JSON Schema response_format instead of decomposing into per-field calls.
```json
{
  "type": "object",
  "properties": {
    "parties": {
      "type": "array",
      "structuredOutput": true,
      "instruction": "Extract all parties mentioned...",
      "items": {
        "type": "object",
        "properties": {
          "name": { "type": "string", "instruction": "Full legal name" },
          "role": { "type": "string", "instruction": "Role in the matter" },
          "represented_by": { "type": "string", "instruction": "Legal counsel" }
        }
      }
    },
    "risk_assessment": {
      "type": "object",
      "instruction": "Analyse litigation risk...",
      "properties": {
        "vulnerabilities": { "type": "string", "instruction": "..." },
        "recommendation": {
          "type": "string",
          "instruction": "...",
          "selectFields": ["vulnerabilities"]
        }
      }
    }
  }
}
```
parties gets one API call. The LLM returns a JSON array matching the schema. risk_assessment uses normal OW decomposition — vulnerabilities gets its own call, then recommendation gets a separate call with the vulnerability analysis injected via selectFields.
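To make the collapse concrete, here is a toy converter (hypothetical, not OW's implementation) that maps a Definition subtree like parties to the schema sent as the response_format:

```python
def definition_to_schema(node: dict) -> dict:
    """Hypothetical converter: Definition subtree -> JSON Schema fragment.

    Assumes each node carries the type/instruction/properties/items keys
    shown in the definition above; instructions become schema descriptions.
    Not OW's actual code.
    """
    schema = {"type": node["type"]}
    if "instruction" in node:
        schema["description"] = node["instruction"]
    if node["type"] == "object":
        schema["properties"] = {
            name: definition_to_schema(child)
            for name, child in node.get("properties", {}).items()
        }
    elif node["type"] == "array":
        schema["items"] = definition_to_schema(node["items"])
    return schema

# The parties subtree from the definition above, trimmed for brevity.
parties = {
    "type": "array",
    "instruction": "Extract all parties mentioned...",
    "items": {
        "type": "object",
        "properties": {
            "name": {"type": "string", "instruction": "Full legal name"},
            "role": {"type": "string", "instruction": "Role in the matter"},
        },
    },
}
```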
The Mechanics
When OW encounters structuredOutput: true, the StructuredOutputProcessor does three things:
- Schema conversion — walks the Definition subtree and produces a JSON Schema map. This becomes the response_format in the API request.
- Single call — sends one prompt with the field's instruction and schema constraint. The LLM returns structured JSON that gets stored as the field's value.
- Fallback — if the call fails (context too large, schema too complex), OW falls back to normal per-field decomposition automatically.
The result is stored in the execution context like any other generated value. Downstream fields can reference it via selectFields.
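A minimal sketch of that flow, with assumed callables standing in for OW's internals (call_llm and decompose are hypothetical, as is the prompt format):

```python
import json

def run_structured_field(definition, schema, document, call_llm, decompose):
    """Sketch of the structured-output path with automatic fallback.

    call_llm sends one request constrained by the JSON Schema;
    decompose is the normal per-field decomposition path.
    """
    prompt = f"User information: {document}\n\nTask: {definition['instruction']}"
    try:
        # Single call: the schema is passed as the response_format.
        raw = call_llm(prompt, response_format=schema)
        return json.loads(raw)  # stored as the field's value
    except Exception:
        # Fallback: context too large or schema unsupported.
        return decompose(definition, document)
```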
Benchmark: Document Analysis With Hybrid
We reran the legal brief benchmark with structuredOutput: true on the five extraction fields (metadata, parties, key_dates, legal_citations, financial_transactions) and normal decomposition on the two reasoning fields (risk_assessment, strategic_brief).
Performance
| Metric | Structured API | OW (no hybrid) | OW Hybrid |
|---|---|---|---|
| Duration | 54s | 15s | 31s |
| API calls | 1 | ~228 | ~25 |
| Estimated input tokens | 3,755 | ~1,010,000 | ~35,000 |
The hybrid used 97% fewer input tokens than pure orchestration while keeping orchestration's depth advantage on the reasoning fields.
Why is hybrid slower than pure OW? Pure OW fires all 228 calls concurrently; hybrid's structured calls are larger and run sequentially per field. You trade latency for cost.
Output Quality
| Field | Structured API | OW Hybrid |
|---|---|---|
| parties | 8 items / 176 words | 15 items / 454 words |
| key_dates | 17 / 311 words | 17 / 1,526 words |
| legal_citations | 10 / 471 words | 11 / 905 words |
| risk_assessment | 5 fields / 457 words | 5 fields / 971 words |
| vulnerabilities found | 4 | 7 |
Extraction fields matched or exceeded the pure structured API — each field still gets the model's full output window rather than sharing a budget across eight fields. Reasoning fields held orchestration's depth advantage: 2.1× more risk analysis and nearly twice the vulnerabilities identified.
The Cost Arithmetic
The cost difference comes down to how many times you repeat the input document.
Pure orchestration: every leaf field is a separate call, full input text each time. Input tokens ≈ N(fields) × T(document). For our benchmark: 228 × 4,400 ≈ 1,003,200 input tokens.
Hybrid: extraction fields collapse into ~5 structured calls, reasoning fields into ~20 OW calls. Input tokens ≈ N(calls) × T(average prompt). For our benchmark: 25 calls × ~1,400 average tokens ≈ 35,000 input tokens.
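The arithmetic can be checked directly (figures from the benchmark above):

```python
# Pure orchestration: 228 leaf-field calls, each repeating the ~4,400-token document.
pure_input = 228 * 4_400
print(pure_input)    # 1003200 input tokens

# Hybrid: ~25 calls at ~1,400 average input tokens each.
hybrid_input = 25 * 1_400
print(hybrid_input)  # 35000 input tokens

print(round(pure_input / hybrid_input))  # ~29x fewer input tokens
```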
| Approach | Input tokens | Input cost | Reduction |
|---|---|---|---|
| Pure orchestration | ~1,010,000 | $0.0757 | — |
| Hybrid | ~35,000 | $0.0026 | 96.5% |
| Pure structured | ~3,755 | $0.0003 | 99.6% |
The hybrid is 29× cheaper than pure orchestration. It costs more than pure structured output, but produces meaningfully richer analysis on the reasoning fields — 2.1× more content, 7 vs 4 vulnerabilities.
At scale, the difference compounds. Processing 1,000 documents:
| Approach | Total input cost | Quality |
|---|---|---|
| Pure orchestration | $75.70 | Deep analysis on all fields |
| Hybrid | $2.60 | Deep analysis on reasoning fields, structured extraction elsewhere |
| Pure structured | $0.28 | Compressed analysis across all fields |
Prompt Prefix Caching
A second cost optimisation works alongside hybrid mode. LLM providers (Gemini, OpenAI) automatically cache prompt prefixes — if consecutive requests share the same opening text, the cached portion is billed at a discount (75% off for Gemini).
OW's prompt template previously put field-specific instructions before the input text:
```text
Task: return info about "parties"...   ← varies per field
Direct Instruction: ...                ← varies per field
User information: {3,000 tokens}       ← same every call
```
Every call had a different prefix, so no caching occurred. We reordered the template:
```text
User information: {3,000 tokens}       ← same every call (cached)
Task: return info about "parties"...   ← varies (small suffix)
Direct Instruction: ...                ← varies (small suffix)
```
Now sibling fields share an identical prefix. For the ~20 remaining decomposition calls in hybrid mode, Gemini caches ~3,000 tokens of the ~3,400-token prompt. Calls 2–20 pay only 25% on the cached portion.
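A hypothetical prompt builder makes the reordering concrete (the names and exact format are illustrative, not OW's actual template):

```python
def build_prompt(document: str, field: str, instruction: str) -> str:
    """Hypothetical template: the shared document leads the prompt so
    sibling field calls present an identical, cacheable prefix."""
    return (
        f"User information: {document}\n"       # same every call (cached)
        f'Task: return info about "{field}"\n'  # varies (small suffix)
        f"Direct Instruction: {instruction}\n"  # varies (small suffix)
    )

doc = "CONTRACT TEXT " * 200
p1 = build_prompt(doc, "parties", "Extract all parties")
p2 = build_prompt(doc, "risk_assessment", "Analyse litigation risk")
# Both prompts open with the same long prefix, so the provider can cache it.
assert p1.startswith("User information: " + doc)
assert p2.startswith("User information: " + doc)
```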
Cache saving on the decomposed reasoning fields: 19 calls × 3,000 cached tokens × 75% discount ≈ 42,750 tokens effectively free — a further ~15% reduction on top of the hybrid savings.
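That saving works out as:

```python
calls = 20             # decomposition calls sharing the prefix
cached_tokens = 3_000  # shared "User information" prefix per call
discount = 0.75        # Gemini's cached-input discount

# The first call primes the cache at full price; calls 2-20 hit it.
saved = (calls - 1) * cached_tokens * discount
print(int(saved))  # 42750 tokens effectively free
```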
When to Use Hybrid
Mark as structuredOutput: true: extraction arrays (entities, dates, citations, transactions), simple objects (metadata, classification, tagging), and fields with many properties where decomposition would generate 10+ calls for no quality gain.
Leave as normal OW decomposition: reasoning fields where depth matters, fields with selectFields chains, and fields you want to route to different models or temperatures.
The rule of thumb: if a field is about finding things → structured output. If it's about thinking about things → orchestration.
What Shipped
Three things:
- structuredOutput flag on Definition — any Object or Array node can opt in. The processor converts the subtree to a JSON Schema and sends one constrained API call.
- Automatic fallback — if the structured call fails (context window, unsupported schema), OW falls back to normal decomposition silently.
- Prompt prefix reordering — the shared input text now leads the prompt, enabling automatic prefix caching across sibling field calls. This benefits all OW definitions, not just hybrid ones.
Existing definitions without structuredOutput behave exactly as before.
