Where Orchestration Wins: Document Analysis Benchmarks and the Right Tool for the Job
In our previous post, structured output beat ObjectWeaver on knowledge graph extraction. We promised a follow-up testing where orchestration actually shines. Here are the results.
The Hypothesis
ObjectWeaver should outperform single-pass structured output when the task has independent subtrees, progressive refinement, rich per-field output, and heterogeneous complexity. We designed a document analysis benchmark to test this directly.
Experiment Design
The document: a fictional 15,000-character legal brief — a motion for summary judgment in a $3.1 billion failed-merger case. The document contains parties, dates, financial transactions, legal citations, four distinct legal arguments, and supporting evidence.
The schema: Eight top-level fields across four complexity tiers:
| Tier | Fields | Complexity | Dependencies |
|---|---|---|---|
| 1 — Simple extraction | metadata, parties, key_dates | Low — pattern matching | None (independent) |
| 2 — Medium analysis | sections, legal_citations, financial_transactions | Medium — summarisation + classification | Uses Tier 1 via selectFields |
| 3 — Complex reasoning | risk_assessment | High — legal analysis + risk scoring | Uses Tiers 1 + 2 |
| 4 — Synthesis | strategic_brief | High — executive briefing | Uses Tier 3 |
ObjectWeaver: processingOrder enforced the tier dependencies, selectFields injected prior results into later fields, and independent fields within each tier were processed concurrently. Model: Gemini 2.5 Flash Lite.
Structured API: Single call with a JSON Schema covering all eight fields, one pass. Model: Gemini 2.5 Flash (the structured API requires the full model, not lite).
Both used temperature 0 and identical field descriptions.
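To make the setup concrete, here is a sketch of what the ObjectWeaver configuration might look like. The field names come from the schema table above; the exact shape of the `processingOrder` and `selectFields` options is assumed from the description in this post, not copied from ObjectWeaver's documentation.

```typescript
// Hypothetical sketch of the benchmark configuration. The option names
// (processingOrder, selectFields) follow the post's description; the exact
// ObjectWeaver API may differ.
const config = {
  model: "gemini-2.5-flash-lite",
  temperature: 0,
  // Tiers run in order; fields within a tier run concurrently.
  processingOrder: [
    ["metadata", "parties", "key_dates"],                      // Tier 1
    ["sections", "legal_citations", "financial_transactions"], // Tier 2
    ["risk_assessment"],                                       // Tier 3
    ["strategic_brief"],                                       // Tier 4
  ],
  fields: {
    risk_assessment: {
      // Inject earlier results as grounding context.
      selectFields: ["metadata", "parties", "sections",
                     "legal_citations", "financial_transactions"],
      description: "Legal risk analysis with per-argument strength scores",
    },
    strategic_brief: {
      selectFields: ["risk_assessment"],
      description: "Executive briefing synthesising the risk assessment",
    },
    // Tier 1 and 2 field definitions omitted for brevity.
  },
};
```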
Results
Performance
| Metric | Structured API | ObjectWeaver | Difference |
|---|---|---|---|
| Duration | 54s | 15s | OW 3.6× faster |
| Prompt tokens | 3,755 | ~1.01M (estimated) | Structured 270× fewer |
| Output tokens | 6,170 | Not tracked (OW stub) | — |
| API calls | 1 | ~228 | — |
| Errors | 0 | 0 | — |
ObjectWeaver was 3.6× faster despite 228 API calls versus one. Independent fields — metadata, parties, key_dates — processed simultaneously, and within each array all items ran in parallel.
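The concurrency pattern behind that speedup is the standard fan-out over independent work: a tier fires all of its field calls at once and resolves when the slowest one finishes. A minimal sketch, with `generateField` as a stand-in for a single model call:

```typescript
// Stand-in for one per-field model call; in the benchmark this would be
// a Gemini API request.
async function generateField(name: string): Promise<string> {
  return `${name}: extracted`;
}

// Run every field in a tier concurrently. All calls are in flight at once,
// so the tier takes as long as its slowest field, not the sum of all fields.
async function runTier(fields: string[]): Promise<Record<string, string>> {
  const results: Record<string, string> = {};
  await Promise.all(
    fields.map(async (f) => {
      results[f] = await generateField(f);
    })
  );
  return results;
}
```

For Tier 1, `runTier(["metadata", "parties", "key_dates"])` issues all three calls simultaneously — which is why 228 small calls can still finish 3.6× faster than one large one.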
Output Quality
Field by field:
| Field | Structured (items / words) | OW (items / words) | OW advantage |
|---|---|---|---|
| metadata | 6 / 30 | 6 / 34 | Tie (OW formatted dates correctly) |
| parties | 7 / 202 | 9 / 326 | +2 parties, +61% detail |
| key_dates | 17 / 345 | 17 / 2,196 | Same count, 6.4× more detail |
| sections | 5 / 501 | 4 / 711 | −1 section, +42% depth |
| legal_citations | 10 / 384 | 11 / 971 | +1 citation, 2.5× more analysis |
| financial_transactions | 7 / 305 | 7 / 405 | Same count, +33% detail |
| risk_assessment | 5 fields / 397 | 5 fields / 1,052 | 2.6× more reasoning, 9 vs 4 vulnerabilities |
| strategic_brief | 5 / 426 | 5 / 484 | +14% more detail |
| Total | 2,590 words | 6,179 words | OW 2.4× more content |
What the Numbers Mean
The structured API compressed its analysis: 6,170 output tokens across eight fields averages ~770 tokens per field. The risk assessment got just 4 vulnerabilities and 397 words. The model had more to say but ran out of budget.
ObjectWeaver gave each field the full output window. Each field is a separate call — the risk assessment could run for thousands of tokens independently. It found 9 vulnerabilities and produced 2.6× more reasoning, not competing for budget with metadata extraction.
key_dates is the starkest example: both found 17 dates, but OW produced 6.4× more words per date. The structured API gave bare-bones descriptions to stay within budget.
Progressive Refinement Worked
risk_assessment received extracted metadata, party names, section assessments, citation principles, and transaction amounts as context — grounded in specific facts already extracted, not re-reading the full document and hoping for consistency. strategic_brief then received the risk assessment's conclusions directly, letting its settlement and trial risk sections reference specific strength scores and vulnerabilities from the tier before.
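The injection step can be sketched as a prompt builder: the Tier 3 prompt is assembled from Tier 1/2 outputs rather than asking the model to re-derive them from the raw document. The prompt template here is illustrative — the post does not show ObjectWeaver's actual template.

```typescript
type PriorResults = Record<string, unknown>;

// Illustrative version of the selectFields injection: serialise the chosen
// earlier results into a context section ahead of the document text.
function buildRiskPrompt(document: string, prior: PriorResults): string {
  const context = [
    "metadata", "parties", "sections",
    "legal_citations", "financial_transactions",
  ]
    .map((f) => `## ${f}\n${JSON.stringify(prior[f], null, 2)}`)
    .join("\n\n");
  return `Assess litigation risk.\n\nExtracted context:\n${context}\n\nDocument:\n${document}`;
}
```

Because the model sees the already-extracted party names and transaction amounts verbatim, its risk analysis stays consistent with the earlier tiers instead of re-reading and possibly re-interpreting the source.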
Comparing Both Experiments
| Dimension | Graph Extraction | Document Analysis |
|---|---|---|
| Winner | Structured API | ObjectWeaver |
| Key factor | Cross-field referential integrity | Per-field output depth |
| Structured API advantage | Entity IDs consistent across all relationships | Single coherent response |
| OW advantage | N/A (OW lost) | 3.6× faster, 2.4× more content |
| Cross-field refs needed? | Yes — relationships must cite entities | Minimal — tiers are progressive |
| Output per field | Small (an ID, a name, a type) | Large (paragraphs of analysis) |
| Independent fields? | No — relationships depend on entities | Yes — Tier 1 is fully independent |
Structured output wins when fields reference each other tightly. Orchestration wins when fields need independent depth.
When to Use ObjectWeaver
Rich per-field analysis. When each field deserves thorough, multi-paragraph output — document analysis, compliance reviews, due diligence, medical record summarisation. If your fields contain words like "analysis", "assessment", or "recommendation", they benefit from dedicated attention.
Independent subtrees. When most fields don't depend on each other. If removing one field wouldn't break any other, those fields are independent and will benefit from parallel processing.
Progressive reasoning chains. When your workflow is extract → analyse → synthesise. If your schema has a natural tier structure where later fields need earlier results, processingOrder + selectFields creates an inspectable chain-of-thought.
Large aggregate output. If your expected total output exceeds ~6,000 tokens, a single structured call will start compressing detail. ObjectWeaver's per-field calls have no aggregate limit.
Mixed model requirements. If you'd use different models or temperatures for different parts of your schema, OW is the only option that supports this.
When NOT to Use ObjectWeaver
Cross-field referential integrity. If field B must reference exact values from field A (entity IDs, foreign keys across arrays), single-pass coherence is unbeatable.
Simple, flat schemas. A handful of uniform fields doesn't justify orchestration overhead.
Cost-sensitive batch processing. OW uses ~100-270× more input tokens than a single structured call. If you're processing thousands of documents and depth isn't critical, structured output is dramatically cheaper.
The Token Economics
ObjectWeaver's input cost scales roughly as: Total Input Tokens ≈ N(fields) × T(document). For this benchmark: ~228 calls × ~4,400 tokens ≈ 1M input tokens vs. the structured API's 3,755 — a 270× multiplier. You pay in input tokens; you get back output depth, parallelism, and reasoning quality. Whether that trade-off makes sense depends on your use case.
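The back-of-envelope model from the paragraph above, worked through with this benchmark's numbers:

```typescript
// Total input tokens ≈ number of calls × document tokens per call.
function estimateInputTokens(calls: number, tokensPerCall: number): number {
  return calls * tokensPerCall;
}

const owTokens = estimateInputTokens(228, 4400); // 1,003,200 — ~1M, as measured
const structuredTokens = 3755;                   // single-call baseline
const multiplier = owTokens / structuredTokens;  // ~267× — the "270×" in the text
```

Note the model assumes every call re-sends the full document; trimming the per-call context (e.g. sending only relevant sections to each field) would reduce the multiplier proportionally.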
What's Next
These two experiments establish the boundaries. We're now focused on surfacing OW's actual token tracking, hybrid approaches (structured output for coherent extraction, OW for progressive reasoning on top), and model routing benchmarks to test quality-per-dollar gains from routing simple fields to flash-lite.
Neither approach is universally better. The skill is knowing which one your schema needs.
