When Structured Output Beats Orchestration — And When It Doesn't

4 min read
ObjectWeaver Team
AI Infrastructure Platform

We ran an honest experiment comparing ObjectWeaver against the Gemini Structured Output API for knowledge graph extraction from interview transcripts. The structured API won. This post explains why — and where orchestration actually earns its keep.

The Experiment

The task: extract a typed entity-relationship graph from legal market intelligence interviews — lawyer movements, client relationships, case work, practice area expertise. 7 entity types, 10 relationship types. Same prompts, same ontology, same ~2,000-word synthetic interview split into 7 chunks, same model (Gemini 2.5 Flash Lite).

ObjectWeaver: entities extracted as an array field, then 10 relationship arrays with selectFields injecting entity data and processingOrder enforcing sequencing. Each field and array item is a separate LLM call — roughly 47 calls per chunk.
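To make the decomposition concrete, here is an illustrative sketch of that configuration — not ObjectWeaver's exact syntax, and the field names and prompts are simplified. The point is that each field (and each array item) becomes its own LLM call, wired together with selectFields and processingOrder:

```python
# Illustrative sketch of a field-decomposed extraction schema.
# Not ObjectWeaver's exact configuration syntax — prompts and field
# names are simplified for the example.
objectweaver_schema = {
    "entities": {
        "type": "array",
        "prompt": "Extract every entity mentioned in this chunk.",
        "processingOrder": 1,
    },
    "works_at": {
        "type": "array",
        "prompt": ("List lawyer-to-firm employment relationships. "
                   "ONLY use exact entity names from the provided list."),
        "selectFields": ["entities"],  # inject the extracted entities as context
        "processingOrder": 2,          # must run after the entities field
    },
    # ...nine more relationship arrays configured the same way
}
```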

Structured API: one JSON Schema covering the full entity + relationship structure. One call per chunk — 7 total.
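The single-call approach looks roughly like this with the google-genai SDK. The prompt text and type enums here are illustrative (the real run used 7 entity and 10 relationship types), and the call requires a valid API key:

```python
import json

# One JSON Schema for the whole graph, passed as response_schema so the
# model's output is constrained at the token level. Enums abbreviated —
# the actual run used 7 entity types and 10 relationship types.
GRAPH_SCHEMA = {
    "type": "object",
    "properties": {
        "entities": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "type": {"type": "string",
                             "enum": ["lawyer", "firm", "client", "case"]},
                },
                "required": ["name", "type"],
            },
        },
        "relationships": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "source": {"type": "string"},
                    "target": {"type": "string"},
                    "type": {"type": "string"},
                },
                "required": ["source", "target", "type"],
            },
        },
    },
    "required": ["entities", "relationships"],
}

def extract_graph(chunk: str, api_key: str) -> dict:
    """One structured call per chunk (requires the google-genai package)."""
    from google import genai  # deferred so the schema above imports anywhere
    client = genai.Client(api_key=api_key)
    resp = client.models.generate_content(
        model="gemini-2.5-flash-lite",
        contents=f"Extract the entity-relationship graph:\n\n{chunk}",
        config={"response_mime_type": "application/json",
                "response_schema": GRAPH_SCHEMA},
    )
    return json.loads(resp.text)
```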

Results

Metric                  ObjectWeaver          Gemini Structured API
Entities                95                    116
Relationships           111                   187
Referential integrity   ~75%                  100%
Duration                28s                   40s
API calls               ~329 (47 × 7 chunks)  7
Estimated cost          ~$0.01–0.02           ~$0.01

The structured API produced 68% more relationships with perfect referential integrity. ObjectWeaver dropped roughly 25% of its relationships during postprocessing — the LLM generated entity names that didn't match the extracted list.
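The integrity numbers come from a check along these lines: a relationship survives only if both endpoints exactly match an extracted entity name. A minimal sketch (helper names are ours, not ObjectWeaver's postprocessor):

```python
def referential_integrity(entities, relationships):
    """Keep relationships whose source and target both name a known entity;
    return the survivors and the fraction that survived."""
    known = {e["name"] for e in entities}
    valid = [r for r in relationships
             if r["source"] in known and r["target"] in known]
    score = len(valid) / len(relationships) if relationships else 1.0
    return valid, score

entities = [{"name": "Jane Doe"}, {"name": "Freshfields"}]
rels = [
    {"source": "Jane Doe", "target": "Freshfields", "type": "works_at"},
    # Name drift: the model invented a longer variant of the firm name.
    {"source": "Jane Doe", "target": "Freshfields Bruckhaus Deringer",
     "type": "works_at"},
]
valid, score = referential_integrity(entities, rels)
# score == 0.5 — the drifted name drops the second relationship
```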

Why the Structured API Won

Cross-field coherence. The structured API generates entities and relationships in one response. The model holds the full entity list in working memory while writing relationships. ObjectWeaver processes each field as a separate call — relationship fields receive entity names via selectFields, but each source/target is extracted independently, losing holistic context.

Schema-level enforcement. The Gemini Structured API constrains output at the token level — the model can't produce malformed JSON or invalid enum values. ObjectWeaver relies on prompt instructions ("ONLY use exact entity names"), which the model frequently ignores, with fuzzy matching in postprocessing to recover what it can.
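The recovery step can be as simple as snapping near-miss names onto the extracted list — a sketch using difflib (ObjectWeaver's actual matcher may differ; the cutoff is an assumption):

```python
from difflib import get_close_matches

def snap_to_entity(name, known, cutoff=0.8):
    """Map a model-generated name onto the closest extracted entity name,
    or None if nothing is similar enough."""
    if name in known:
        return name
    matches = get_close_matches(name, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

Note the limit: this recovers typos, but a genuinely different variant ("Freshfields" vs. "Freshfields Bruckhaus Deringer") scores too low on character similarity and the relationship is dropped anyway — which is how ~25% went missing.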

Natural deduplication. In a single structured call, the model sees all entities it has already written. ObjectWeaver's entity array generates each item separately — the model can't see prior extractions, producing variants like Freshfields and Freshfields Bruckhaus Deringer as distinct entries.
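A postprocessing pass can collapse the most obvious variants — here a crude word-boundary prefix heuristic (real deduplication needs fuzzier matching and entity-type awareness):

```python
def dedupe_entities(names):
    """Collapse shorter names that are word-boundary prefixes of a longer
    variant, e.g. 'Freshfields' into 'Freshfields Bruckhaus Deringer'.
    A crude heuristic for illustration only."""
    canonical = []
    for name in sorted(set(names), key=len, reverse=True):  # longest first
        if not any(kept.startswith(name + " ") for kept in canonical):
            canonical.append(name)
    return canonical
```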

Token efficiency. 7 API calls vs. 329. Each ObjectWeaver call resends the system prompt, chunk text, and injected context. More tokens, worse output.

What Orchestration Is Actually For

This result doesn't mean orchestration is pointless — it means graph extraction is the wrong benchmark for it. Tight cross-field coherence is exactly what single-pass structured output is built for.

ObjectWeaver's field-level decomposition is designed for different problems:

Exceeding output token limits. A single structured API call produces at most a few thousand tokens. When extracting hundreds of entities from a long document, the response truncates. ObjectWeaver's decomposition means each item is a separate call — total output size is unbounded.
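One way to see why decomposition lifts the cap: a short first call enumerates item names, then each item is fetched in its own call, so no single response has to fit everything under the output-token limit. A sketch with plain functions standing in for the two LLM calls:

```python
def extract_unbounded(chunks, list_names, extract_one):
    """Two-phase extraction: `list_names` (a cheap call returning only
    names) enumerates items per chunk, then `extract_one` fetches each
    item in its own call. Total output size is bounded only by the
    number of calls, not by one response's token cap."""
    results = []
    for chunk in chunks:
        for name in list_names(chunk):           # small response: names only
            results.append(extract_one(chunk, name))  # one item per call
    return results
```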

Mixed model requirements. A simple classification field and a nuanced legal analysis field shouldn't use the same model at the same temperature. ObjectWeaver routes each field independently.

Progressive multi-step reasoning. When later fields genuinely depend on earlier analysis — "given the extracted entities, assess their litigation risk" — processingOrder + selectFields creates structured chain-of-thought with inspectable intermediate results.
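The selectFields + processingOrder pattern is essentially dependency-ordered execution with context injection. A sketch using Python's graphlib, with plain callables standing in for per-field LLM calls (the field-spec shape here is ours, not ObjectWeaver's):

```python
from graphlib import TopologicalSorter

def run_fields(fields, inputs):
    """Run per-field callables in dependency order, passing each field the
    results of the fields it declares as dependencies. Intermediate
    results stay inspectable in the returned dict."""
    graph = {name: tuple(spec.get("deps", ())) for name, spec in fields.items()}
    results = dict(inputs)
    for name in TopologicalSorter(graph).static_order():
        if name not in fields:   # raw inputs such as the document itself
            continue
        ctx = {dep: results[dep] for dep in fields[name].get("deps", ())}
        results[name] = fields[name]["fn"](ctx)
    return results
```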

Heterogeneous schemas. Independent subtrees (summary, sentiment, key quotes, action items) can process concurrently with different configurations. A single structured call forces everything through one pass.
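The concurrency side is straightforward once subtrees are independent — each extractor carries its own model and temperature configuration and they fan out in parallel. A sketch with stand-in callables:

```python
from concurrent.futures import ThreadPoolExecutor

def run_subtrees(subtrees, document):
    """Run independent subtree extractors concurrently. Each callable
    encapsulates its own (model, temperature) configuration; results
    are collected under the subtree's name."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn, document)
                   for name, fn in subtrees.items()}
        return {name: f.result() for name, f in futures.items()}
```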

The Honest Position

For extracting a coherent graph from a document that fits in a single context window: use structured output. Simpler, cheaper, better results.

For schemas where different parts need different models, output exceeds token limits, fields need progressive refinement, or subtrees are genuinely independent — that's where orchestration earns its keep.

We're running a follow-up experiment designed to test those conditions directly.

Next Experiment: Heterogeneous Document Analysis

To test where orchestration actually outperforms single-pass structured output, we're designing an experiment with a schema that favours it:

  • Large, heterogeneous output — thousands of tokens across diverse field types, past single-call limits
  • Mixed complexity — some fields need simple classification (cheap model), others need nuanced reasoning (expensive model)
  • Independent subtrees — fields that don't reference each other, enabling parallel processing
  • Progressive refinement — later fields that depend on earlier output, testing selectFields + processingOrder chains

The task: comprehensive analysis of a long-form legal brief (~5,000–10,000 words). The schema:

  1. Metadata extraction (simple, cheap): document type, jurisdiction, date references, parties
  2. Section summaries (medium): key arguments, cited precedents, statutory references
  3. Risk assessment (complex, expensive): argument strength, likelihood of success, strategic vulnerabilities — using metadata and summaries as context via selectFields
  4. Comparable case analysis (complex, depends on risk assessment): similar cases, predicted outcomes
  5. Executive summary (medium, depends on all above): synthesised structured briefing

We'll measure the same metrics — accuracy, referential integrity between dependent fields, cost, latency, and output completeness — and report the results honestly.

Stay tuned.