# Building an AI Review Article Writer: Creating the Skeleton

With our multi-agent architecture established, the next critical challenge is creating the structural foundation that will guide the entire writing process. Before generating any content, the system must understand the research landscape and create a logical organization that serves both the topic and target audience.

Why Structure Must Come First

The fundamental challenge in automated research synthesis isn’t gathering information—it’s organizing it meaningfully. Without proper structure, even the most comprehensive research becomes an incoherent collection of facts and citations.

Consider what distinguishes a mediocre literature review from an excellent one. It’s rarely the amount of research or even the quality of individual sources. Instead, it’s how the author has organized the knowledge landscape to reveal insights, contradictions, and patterns that weren’t immediately obvious.

For an AI system, this structural challenge manifests in three critical dimensions:

Knowledge Architecture: How do you identify the conceptual frameworks that best organize the field?

Audience Adaptation: How do you structure information to match your readers’ existing knowledge and needs?

Resource Constraints: How do you balance comprehensive coverage with practical limits on scope and depth?

Each dimension requires different capabilities and creates different types of decisions that must be made before any writing begins.

Phase 1: TOC Generation

The Table of Contents generation is where our first reactive agent shines. This agent must:

  • Search the web for relevant information about the topic
  • Understand the current state of research
  • Identify logical sections and subsections
  • Structure content appropriately for the target audience
  • Stay within page limits while ensuring comprehensive coverage

The Research Strategy

The fundamental challenge in TOC generation lies in balancing comprehensive coverage with practical constraints. Unlike human researchers who might spend weeks exploring a field, the AI system must quickly develop a sophisticated understanding of both the knowledge landscape and how to organize it effectively for the intended audience.

The research strategy embodies several key insights about how expert researchers actually approach unfamiliar domains:

def toc_generator():
    tools = get_mcp_tools(os.getenv('TOC_SEARCH_TOOLS', 'tavily').split(','))
    graph = create_reactive_graph(
        prompt='generate table of contents for the review article using search tools.',
        system_prompt=TOC_GENERATOR_PROMPT,
        structured_output_schema=TableOfContents,
        tools=tools,
        # ...
    )
    return graph

This approach reflects a crucial design philosophy: rather than trying to exhaustively map every corner of a field, the system focuses on identifying the essential organizational principles that will guide readers through the material effectively.

The system prompt encapsulates years of experience about what makes literature reviews successful:

**SEARCH REQUIREMENTS:**
- Use EXACTLY 2 tool calls with available tools like arxiv, ddg, tavily, brave, etc
- Focus on recent landmark studies and reviews for {audience}
- Maximize coverage, avoid redundancy
**STRUCTURE (STRICT):**
- 4-10 top-level sections ({max_pages} ÷ 2.5 pages per section)
- Maximum 2-4 subsections per section
- Each subsection must be at least 200 words
- Sub-subsections only when essential (minimal use)
- NO "References", "Bibliography", or "Abstract" sections

The Psychology of Constraint-Driven Research

Each constraint serves a specific purpose based on understanding how both writers and readers actually engage with review articles:

Focused Research: The limit of exactly 2 tool calls might seem arbitrary, but it reflects a crucial insight about information gathering under time pressure. Unlimited search often leads to analysis paralysis—the more sources you examine, the harder it becomes to synthesize them into coherent themes. By forcing rapid decision-making about what to search for, the system mimics how expert researchers quickly identify the most promising information sources.

Balanced Coverage: The mathematical relationship between page limits and section counts (max_pages ÷ 2.5) reflects empirical observation about readable article structure. Sections shorter than 2-3 pages feel superficial, while sections longer than 4-5 pages become unwieldy. This constraint ensures the AI doesn’t create structures that would be difficult for human readers to navigate effectively.

Practical Granularity: The 200-word minimum for subsections prevents the over-fragmentation that often plagues AI-generated content. When systems are free to create arbitrary numbers of subsections, they tend to produce shopping-list structures that lack the depth necessary for genuine understanding.

Clean Structure: Excluding administrative sections like “References” and “Abstract” keeps the focus on substantive content organization. These elements are handled separately in the production pipeline, ensuring the TOC reflects only the intellectual structure of the review itself.

Search Tool Integration

Different research domains require different search strategies, and the system’s tool integration reflects the reality that no single search engine provides optimal coverage across all types of knowledge. The diversity of search tools available through the MCP (Model Context Protocol) framework enables the system to adapt its research approach to match the specific characteristics of each topic:

  • Tavily: Provides web search optimized for research contexts, with built-in quality filtering that prioritizes authoritative sources over popular content
  • ArXiv: Essential for scientific and technical topics, offering access to cutting-edge research that might not appear in general web searches for months or years
  • DuckDuckGo: Serves as a reliable fallback with broad web coverage and minimal algorithmic filtering
  • Brave Search: Offers an independent search index that can surface sources not prioritized by other engines
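
The entry point for this flexibility is the get_mcp_tools call shown earlier, driven by the TOC_SEARCH_TOOLS environment variable. The actual MCP client wiring isn't shown in this post, but the name-resolution step might look roughly like the sketch below; the parse_search_tool_names helper and the known-tool set are illustrative, not part of the real codebase.

import os

# Illustrative sketch only: validate and deduplicate the comma-separated
# TOC_SEARCH_TOOLS value before handing the names to the MCP tool loader.
KNOWN_SEARCH_TOOLS = {'tavily', 'arxiv', 'ddg', 'brave'}

def parse_search_tool_names(raw: str) -> list[str]:
    names = []
    for name in raw.split(','):
        name = name.strip().lower()
        if name in KNOWN_SEARCH_TOOLS and name not in names:
            names.append(name)
    return names or ['tavily']  # fall back to a single default tool

print(parse_search_tool_names(os.getenv('TOC_SEARCH_TOOLS', 'tavily,arxiv')))
# -> ['tavily', 'arxiv']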

Strategic Search Query Construction

The system doesn’t simply search for the topic name—it constructs queries that reflect strategic thinking about how to discover organizational frameworks:

# Example search execution
search_results = await tavily_search.ainvoke({
    "query": f"quantum computing review article structure {audience} 2024"
})

# Process and synthesize results
toc_structure = synthesize_search_results(search_results, constraints)

This query construction strategy embodies several insights about effective literature discovery:

Explicit Structural Intent: Including “review article structure” in the search query specifically targets sources that have already solved similar organizational challenges. Rather than searching for raw information about quantum computing, the system looks for examples of how experts have structured comprehensive overviews.

Audience-Aware Discovery: The inclusion of the target audience in search queries recognizes that the same topic requires fundamentally different organizational approaches when written for different readers. A quantum computing review for “industry professionals” will surface different structural models than one for “graduate students” or “general public.”

Temporal Focusing: Adding “2024” to queries reflects the understanding that research organization evolves as fields mature. The conceptual frameworks that made sense for organizing quantum computing knowledge in 2020 may be outdated as the field has progressed.
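
Put together, and under the two-call search budget, the query-construction step could be factored into a small helper like the sketch below; build_toc_queries is hypothetical and simply illustrates how structural intent, audience, and recency combine into concrete queries.

# Hypothetical helper illustrating the query strategy; the real system builds
# its queries inside the reactive agent rather than through a function like this.
def build_toc_queries(topic: str, audience: str, year: int = 2024) -> list[str]:
    return [
        # Query 1: look for existing organizational frameworks
        f"{topic} review article structure {audience} {year}",
        # Query 2: anchor the sections in recent landmark work
        f"{topic} recent landmark studies and reviews {year}",
    ]

queries = build_toc_queries('quantum computing in finance', 'industry professionals')
# ['quantum computing in finance review article structure industry professionals 2024', ...]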

Multi-Source Synthesis Strategy

The real intelligence in search tool integration lies not in any individual search, but in how the system synthesizes information from multiple sources to identify robust organizational patterns. Each search tool provides a different lens on the knowledge landscape:

Academic tools like ArXiv reveal how researchers structure their thinking within disciplinary boundaries. General web search tools like Tavily and DuckDuckGo show how the broader intellectual community has organized and discussed these topics. By comparing organizational patterns across these different perspectives, the system can identify structures that are both intellectually rigorous and broadly accessible.
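
The synthesize_search_results step referenced in the earlier snippet isn't shown in detail. Before any LLM-driven synthesis, a plausible first pass is simply pooling and deduplicating results across tools, roughly like this sketch; the result dictionaries and their url/title keys are assumptions about the tool output format.

# Rough sketch of a pre-synthesis merge: pool results from every search tool
# and drop duplicates so the LLM sees each source only once.
def merge_search_results(results_by_tool: dict[str, list[dict]]) -> list[dict]:
    seen, merged = set(), []
    for tool_name, results in results_by_tool.items():
        for result in results:
            key = result.get('url') or result.get('title')
            if key and key not in seen:
                seen.add(key)
                merged.append({**result, 'source_tool': tool_name})
    return merged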

Structured Output Generation

The reactive agent produces structured output using the TableOfContents schema:

from typing import List, Optional, Union
from pydantic import BaseModel, Field

class Section(BaseModel):
    title: str
    # Subsection (a richer nested model) is defined alongside these classes
    subsections: Optional[List[Union[Subsection, str]]] = Field(default_factory=list)

class TableOfContents(BaseModel):
    toc: List[Section] = Field(description='List of sections for the table of contents')

This structure allows for:

  • Nested Organization: Sections can contain subsections
  • Flexible Granularity: Subsections can be simple strings or complex objects
  • Easy Validation: Pydantic ensures structural correctness

Example TOC Generation

For a topic like “Quantum Computing Applications in Finance”, the agent might produce:

1. Introduction to Quantum Computing in Finance
   - Quantum Advantage in Financial Computing
   - Current State of Quantum Hardware
2. Quantum Algorithms for Financial Optimization
   - Portfolio Optimization
   - Risk Analysis and Monte Carlo Methods
   - Quantum Machine Learning Applications
3. Cryptography and Security Implications
   - Post-Quantum Cryptography
   - Quantum Key Distribution
4. Industry Implementation and Case Studies
   - Goldman Sachs Quantum Initiative
   - IBM Quantum Network Financial Applications
5. Challenges and Limitations
   - Hardware Constraints
   - Error Correction Requirements
6. Future Directions and Timeline
   - Near-term Applications
   - Long-term Quantum Advantage
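
Expressed with the Section and TableOfContents models shown earlier (and assuming the Subsection model they reference is importable from the same module), the first two sections of this outline would be built roughly like this, with subsections passed as plain strings:

# Sketch: the structured output the agent would return for the example above.
example_toc = TableOfContents(toc=[
    Section(
        title='Introduction to Quantum Computing in Finance',
        subsections=['Quantum Advantage in Financial Computing',
                     'Current State of Quantum Hardware'],
    ),
    Section(
        title='Quantum Algorithms for Financial Optimization',
        subsections=['Portfolio Optimization',
                     'Risk Analysis and Monte Carlo Methods',
                     'Quantum Machine Learning Applications'],
    ),
    # ... remaining sections follow the same pattern
])
print([section.title for section in example_toc.toc])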

Phase 2: Human Feedback Loop

Not all generated TOCs are perfect. The system includes a human-in-the-loop validation step:

def human_toc_approval(state: ReviewWriterState) -> Command[Literal['refine_toc', 'plan_writer']]:
    if not state.human_feedback:
        return Command(goto='plan_writer')
    toc_for_display = [section.model_dump() for section in state.toc]
    decision = interrupt({'question': 'Do you approve the following TOC?', 'toc': toc_for_display})
    decision_text = str(decision)
    # Short replies ("ok", "yes") count as approval; anything longer is treated as feedback
    if len(decision_text.lower()) < 10:
        return Command(goto='plan_writer')
    else:
        return Command(goto='refine_toc', update={'feedback': decision_text})

This feedback mechanism:

  • Shows Human-Readable TOC: Converts internal structure to display format
  • Collects Specific Feedback: User can provide detailed revision requests
  • Routes Appropriately: Short responses approve, longer ones trigger refinement
  • Can Be Bypassed: human_feedback=False skips this step for automated runs

TOC Refinement

When feedback is provided, the refine_toc agent rewrites the structure with that feedback in mind:

async def refine_toc(state: ReviewWriterState) -> ReviewWriterState:
    llm = get_llm(model_type='main')
    llm_with_structured_output = llm.with_structured_output(TableOfContents)
    messages = [
        SystemMessage(content=TOC_REFINE_PROMPT.format(
            toc=json.dumps([section.model_dump_json() for section in state.initial_toc]),
            topic=state.topic,
            audience=state.audience,
            max_pages=state.max_pages,
            instructions=state.instructions,
        )),
        HumanMessage(content=f'Based on the feedback, {state.feedback}\nRefine the final table of contents')
    ]
    response = await llm_with_structured_output.ainvoke(messages)
    return {'toc': response.toc}

The refinement process:

  • Considers original research and constraints
  • Incorporates human feedback specifically
  • Maintains structural requirements
  • Produces a revised, structured TOC

Phase 3: Resource Planning

Once the TOC is approved, the system must plan resource allocation:

async def plan_writer(state: ReviewWriterState) -> ReviewWriterState:
    llm = get_llm(model_type='main')
    llm_with_structured_output = llm.with_structured_output(ReviewPlan)

    # Calculate word counts
    total_words = state.max_pages * 350
    content_words = int(total_words * 0.9)  # 90% for content, 10% for formatting

    toc_titles = [section.title for section in state.toc]

    messages = [
        SystemMessage(content=REVIEW_PLANNER_PROMPT.format(
            topic=state.topic,
            audience=state.audience,
            max_pages=state.max_pages,
            toc=json.dumps(toc_titles),
            total_words=total_words,
            content_words=content_words,
            special_instructions=state.instructions,
        ))
    ]
    response = await llm_with_structured_output.ainvoke(messages)
    return {'plan': response.plan}

Word Count Mathematics

The transition from conceptual structure to practical writing constraints requires careful resource planning that reflects the realities of academic publishing. The mathematics involved aren’t arbitrary—they’re based on empirical observations about readable article structure and the practical constraints of LaTeX document production.

The Science of Readable Density

The planning phase uses calculations rooted in research about cognitive load and reading comprehension:

Page Estimation: The 350 words per page standard reflects the optimal density for academic reading. This isn’t simply a typographical constraint—it’s based on studies of how readers process complex technical information. Dense academic formatting (smaller fonts, narrow margins, technical terminology) requires lower word density than popular writing to maintain comprehensibility.

Content Ratio: The 90% content, 10% formatting overhead allocation acknowledges that LaTeX documents include significant structural markup that doesn’t contribute to word count but does affect final page length. Mathematical equations, figures, tables, and citation formatting all consume space without adding to the raw word count. This buffer ensures that the final document meets page requirements regardless of the complexity of its formatting.

Section Allocation: The distribution of words across sections involves strategic thinking about information hierarchy and reader attention patterns. Earlier sections typically receive larger allocations because they must establish context and background. Middle sections often have variable allocations based on topic complexity. Final sections may receive smaller allocations focused on synthesis and conclusions.

Resource Planning in Practice

For a 20-page quantum computing review, the mathematical approach translates abstract planning into concrete writing targets:

  • Total budget: 20 × 350 = 7,000 words (establishes the absolute constraint)
  • Content budget: 7,000 × 0.9 = 6,300 words (accounts for LaTeX overhead)
  • Example allocation: [1,200, 1,000, 800, 1,100, 900, 1,300] across 6 sections

This allocation reflects strategic priorities: the introduction (1,200 words) must establish sufficient context for the target audience. Core technical sections (1,000-1,100 words) receive substantial space for detailed explanation. Transitional sections (800 words) provide necessary coverage without overwhelming readers. The conclusion (1,300 words) gets significant space for synthesis and future directions—often the most valuable part of review articles for readers planning their own work.

These constraints aren’t limitations—they’re enabling constraints that force writers (human or AI) to make deliberate choices about what matters most. Without resource planning, content generation tends to expand indefinitely or distribute attention randomly across topics.

The ReviewPlan Schema

class ReviewPlan(BaseModel):
    plan: List[int] = Field(
        description='List of word counts for each section in the TOC, in the same order'
    )

This simple schema, combined with the planner prompt, is designed to ensure:

  • Exact Correspondence: One word count per TOC section
  • Budget Enforcement: Total doesn’t exceed content budget
  • Flexible Allocation: Different sections can have different priorities
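
Because a bare List[int] can't actually enforce the length or the budget on its own, a small post-check after the planner runs is cheap insurance. A minimal sketch, using the figures from the 20-page example above:

# Hypothetical validation of the planner output before section writing starts.
def validate_plan(plan: list[int], toc: list, content_words: int) -> None:
    assert len(plan) == len(toc), 'need exactly one word count per TOC section'
    assert all(count > 0 for count in plan), 'every section needs a positive budget'
    assert sum(plan) <= content_words, 'plan exceeds the content word budget'

# The 20-page example: six sections, 6,300-word content budget.
validate_plan([1200, 1000, 800, 1100, 900, 1300], toc=[None] * 6, content_words=6300)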

Integration with Section Writing

The skeleton directly drives the section writing process:

def start_write_sections_sequential(state: ReviewWriterState) -> SectionWriterState:
    return {
        'topic_extracted': state.topic,
        'target_audience': state.audience,
        'toc': state.toc,      # Structure guide
        'plan': state.plan,    # Word count budgets
        'section_title': state.toc[0].title,
        'current_section_index': 0,
        'special_instructions': state.instructions,
    }

The section writer receives:

  • Structural Guidance: Exact outline to follow
  • Resource Constraints: Word counts per section
  • Context: Topic, audience, special instructions

Quality Considerations

The skeleton creation process incorporates several quality measures:

Research Quality

  • Source Diversity: Multiple search tools for broader perspective
  • Recency Bias: Focus on recent developments and reviews
  • Audience Appropriateness: Structure matches target audience needs

Structural Quality

  • Logical Flow: Sections build on each other
  • Balanced Coverage: No single section dominates
  • Practical Granularity: Subsections are substantial enough to be meaningful

Planning Quality

  • Resource Realism: Word counts match section complexity
  • Buffer Management: 10% overhead for LaTeX formatting
  • Constraint Adherence: Total stays within page limits

Caching Strategy

Given the computational cost of web search and LLM processing, the skeleton generation is heavily cached:

workflow.add_node(
    'toc_generator',
    toc_generator(),
    cache_policy=CachePolicy(ttl=cache_ttl, key_func=cache_key_function),
)

The cache key function focuses on semantic content:

  • Topic, audience, and page limits
  • Special instructions
  • Search tool configuration

This means identical requests (common in development) return immediately, while semantically different requests trigger new research.
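
The cache_key_function itself isn't shown in this post; the idea is to hash only the semantic fields listed above, along the lines of this sketch (the exact state attributes and hashing scheme are assumptions):

import hashlib
import json
import os

# Hypothetical cache key: identical topic/audience/page/instruction/tool
# combinations map to the same key, so repeated runs hit the cache.
def cache_key_function(state) -> str:
    payload = {
        'topic': state.topic,
        'audience': state.audience,
        'max_pages': state.max_pages,
        'instructions': state.instructions,
        'search_tools': sorted(os.getenv('TOC_SEARCH_TOOLS', 'tavily').split(',')),
    }
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()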

Configuration and Flexibility

The skeleton generation is highly configurable:

# Search tools selection
export TOC_SEARCH_TOOLS="tavily,arxiv,ddg"
# Feedback control
export HUMAN_FEEDBACK="true"
# Cache settings
export CACHE_TTL="86400"

This allows:

  • Environment Adaptation: Different tools for different deployments
  • Automation Control: Disable human feedback for batch processing
  • Performance Tuning: Adjust cache duration for development vs. production
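
On the Python side, reading these variables is straightforward; here is a sketch of the startup parsing, with defaults assumed rather than taken from the real project:

import os

search_tools = os.getenv('TOC_SEARCH_TOOLS', 'tavily').split(',')        # e.g. ['tavily', 'arxiv', 'ddg']
human_feedback = os.getenv('HUMAN_FEEDBACK', 'true').lower() == 'true'   # disable for batch runs
cache_ttl = int(os.getenv('CACHE_TTL', '86400'))                         # seconds; 86400 = 24 hours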

Common Failure Modes and Mitigation

The skeleton generation process can encounter several failure modes:

Search Failures: Multiple search tools provide redundancy, and the model's general knowledge can produce a basic TOC structure as a fallback.

TOC Too Granular: Word-count calculations reveal over-fragmentation, which the refinement step can address by consolidating sections.

Human Rejection Loops: Repeated refinement cycles can be detected and capped with a maximum attempt limit that forces progression.
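
That last mitigation can be as simple as a counter carried in the state. A sketch extending the approval node from earlier; the refinement_attempts field and the cap are hypothetical additions, not part of the code shown above:

MAX_REFINEMENT_ATTEMPTS = 3  # hypothetical cap on human rejection loops

def human_toc_approval_with_cap(state) -> Command:
    # Force progression once the TOC has been sent back too many times.
    if state.refinement_attempts >= MAX_REFINEMENT_ATTEMPTS:
        return Command(goto='plan_writer')
    decision_text = str(interrupt({'question': 'Do you approve the following TOC?'}))
    if len(decision_text) < 10:
        return Command(goto='plan_writer')
    return Command(
        goto='refine_toc',
        update={'feedback': decision_text,
                'refinement_attempts': state.refinement_attempts + 1},
    )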

The Execution Challenge

Having a well-structured foundation is essential, but it raises an even deeper question: what does it take for an AI system to act autonomously on that structure while maintaining the quality and rigor expected in serious research synthesis?

The gap between having a good plan and executing it well is where most automated content generation systems fail. They can follow templates and fill in sections, but they struggle with the dynamic decision-making that separates mechanical summarization from genuine synthesis.

This is where the problem becomes truly interesting—not just organizing information, but developing the capacity for autonomous research judgment that can turn structured plans into substantive, original contributions to knowledge.

Next Up

With the skeleton complete, the system has:

  • Researched structure based on current knowledge
  • Human-validated organization
  • Resource allocation plan

In our next post, we’ll dive deep into the reactive agent pattern that powers much of this functionality - exploring how it manages tool interactions, context windows, and structured output extraction. The reactive agent is the workhorse that makes complex web research and synthesis possible.

The skeleton phase demonstrates how proper research and planning create the foundation for everything that follows - much like how a good outline guides human writers to create coherent, comprehensive articles.
