From Academic Research to Personal News: Building an Intelligent News Agent

After building an AI system that could autonomously research and write academic reviews, I found myself facing a different but related challenge: the overwhelming flood of daily information that makes staying informed feel like drinking from a fire hose.

The reactive agent pattern that powered academic research synthesis seemed perfectly suited to tackle this problem—but with a crucial twist. Instead of researching unknown topics to write comprehensive articles, I needed an agent that could continuously monitor dozens of personally curated topics and intelligently filter the signal from the noise.

This is the story of building a news agent that transforms information overload into personalized intelligence. You can see the live system in action at news.reckoning.dev.

The Information Paradox We All Face

Modern information consumption creates a fundamental paradox: the more sources you follow to stay well-informed, the less time you have to actually think about what you’ve learned.

Consider my daily information diet across technology, AI research, global markets, space exploration, health research, and climate developments. Each topic requires monitoring multiple sources—academic papers, industry news, market reports, company announcements. The challenge isn’t finding information; it’s finding the right information while avoiding the cognitive overhead of processing hundreds of duplicate, outdated, or irrelevant stories.

The Curation vs. Coverage Dilemma: Manual curation produces high-quality, personalized results but doesn’t scale to comprehensive coverage. Algorithmic feeds provide broad coverage but often miss nuanced relevance and suffer from engagement-driven bias rather than information quality optimization.

The Context Collapse Problem: Most news aggregation treats all information as equally urgent and contextless. A breakthrough in quantum computing research deserves different treatment than the latest tech company acquisition, yet both appear as identical headlines in your feed.

The Recency Bias Trap: Traditional news systems prioritize recent information over important information, creating a bias toward breaking news that may be ultimately less significant than slower-developing stories with deeper implications.

Why the Reactive Agent Pattern Fits News Curation

The reactive agent architecture solves information curation challenges through its core capabilities:

Adaptive Search Strategy: Instead of running identical queries across all topics, the agent can adjust search depth, recency filters, and source selection based on the topic’s characteristics. Fast-moving political developments need different treatment than slower-evolving scientific research.

Tool Orchestration: Multiple search engines provide different strengths—academic databases for research papers, news APIs for breaking developments, social media for emerging trends. The reactive pattern enables intelligent tool selection based on information needs.

Context-Aware Processing: The agent maintains awareness of what information it has already processed, enabling intelligent deduplication and preventing the endless recycling of similar stories across different sources.

Synthesis Capabilities: Rather than just aggregating headlines, the agent can synthesize related stories, identify patterns across topics, and highlight connections that might not be obvious from individual articles.

The News Agent Architecture: Information Intelligence at Scale

The news agent adapts the reactive pattern to handle the unique challenges of continuous information processing:

graph TD;
    A[Topics Configuration] --> B[Query Generation];
    B --> C[Multi-Source Search];
    C --> D[Content Collection];
    D --> E[Embedding Analysis];
    E --> F[Deduplication Engine];
    F --> G[Intelligent Synthesis];
    G --> H[Personalized Output];
    I[Search Engines] --> C;
    J[Academic Sources] --> C;
    K[News APIs] --> C;
    E --> L[Ollama BGE-Large];
    F --> M[Similarity Detection];
    M --> N[Content Merging];

This architecture reflects a fundamental insight: news curation is not just information retrieval—it’s knowledge synthesis that requires understanding semantic relationships between different pieces of information.

Personalization Through Structured Intent: The Topics.yaml Philosophy

The heart of personalization lies not in algorithmic preference learning but in explicit intent declaration. Rather than trying to infer what I care about from my behavior, the system uses a structured configuration that declares my interests explicitly.

Here’s my complete topics.yaml configuration that drives the personalization:

US:
  groups: ['US']
  news: []
UK, EU, Russia, China, India, Brazil, South America, Africa, Australia, New Zealand, Canada, Mexico, World:
  groups: ['World']
  news: []
India:
  groups: ['India', 'World']
  news: []
AI, Gemini, Anthropic, OpenAI, GenAI, Alibaba, Agents, Nvidia, Llama, Qwen, Deepseek:
  groups: ['Technology', 'AI']
  news: []
Reinforcement Learning, Deep Learning, Machine Learning, Neural Networks, Transformers:
  groups: ['AI', 'Technology']
  news: []
Generative Pre-trained Transformers (GPT), Large Language Models (LLMs):
  groups: ['Technology', 'AI']
  news: []
Object Detection, Object Segmentation, Object Tracking, Object Recognition, Object Classification:
  groups: ['Technology', 'AI']
  news: []
Image Generation, Multi-Modal, Inference, Fine-tuning, Prompt Engineering:
  groups: ['Technology', 'AI']
  news: []
Robotics:
  groups: ['Technology']
  news: []
Blockchain and Quantum Computing:
  groups: ['Technology']
  news: []
Hardware and Software:
  groups: ['Technology']
  news: []
Medical Imaging, Devices and Robotics:
  groups: ['Health', 'Technology']
  news: []
Apple, Google, Microsoft, Meta, Amazon and other Tech Companies:
  groups: ['Technology']
  news: []
Space Programs, SpaceX, NASA, ISRO, ESA, China Space Program, Japan Space Program:
  groups: ['Technology', 'Space']
  news: []
DNA, RNA, Protein, Genomics, Genetic Engineering:
  groups: ['Science', 'Technology']
  news: []
Mobile OS, iOS and Android, Mobile Apps, Mobile Games, Mobile Development:
  groups: ['Technology']
  news: []
Windows OS, Linux, MacOS, ChromeOS, Search Engines, Browsers, Operating Systems:
  groups: ['Technology']
  news: []
Fitness, Wellness, Nutrition, Diet, Exercise, Sleep, Stress, Health:
  groups: ['Health']
  news: []
Vaccines, Biotechnology, Genomics and Genetic Engineering:
  groups: ['Science']
  news: []
Semiconductor Manufacturing, Packaging, Testing:
  groups: ['Technology']
  news: []
Intel, AMD, Nvidia, TSMC, Samsung, Micron, SK Hynix, Qualcomm, MediaTek:
  groups: ['Technology']
  news: []
ARM, Qualcomm, MediaTek, Samsung, Apple:
  groups: ['Technology']
  news: []
Cryptocurrency:
  groups: ['Business', 'Finance']
  news: []
Economy and Markets:
  groups: ['Business', 'Finance']
  news: []
Tech Companies Business:
  groups: ['Business', 'Technology']
  news: []
FDA and Health care companies:
  groups: ['Business', 'Health']
  news: []
US Stock Market:
  groups: ['Business', 'Finance']
  news: []
Indian Stock Market - NSE, BSE, Nifty, Sensex:
  groups: ['Business', 'Finance']
  news: []
UK and Europe Stock Market:
  groups: ['Business', 'Finance']
  news: []
China, Japan, South Korea and Taiwan Stock Market:
  groups: ['Business', 'Finance']
  news: []
Python, C++, Rust, Go, Programming Languages, JavaScript, HTML, CSS, React, Angular, Vue, Web Development:
  groups: ['Technology']
  news: []
Climate change, Sustainability, Solar, Wind, Hydrogen, Renewable Energy, Fossil Fuels:
  groups: ['Science', 'Technology', 'Climate']
  news: []
CO2 emissions, Carbon capture, sequestration, offset, Carbon-neutral, Carbon-footprint:
  groups: ['Science', 'Technology', 'Climate']
  news: []

Multi-Topic Granularity: Each entry can specify multiple related topics, enabling nuanced coverage that reflects how real interests overlap. “AI” and “GenAI” are related but distinct, and listing both lets the system search for broad developments as well as specific breakthroughs.
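To make the multi-topic keys concrete, here is a minimal sketch of how a comma-joined entry could be expanded into individual topics. It assumes the YAML has already been loaded into a plain dict and that splitting on commas is acceptable; `raw_config` and `expand_topics` are hypothetical names, not the system's actual code.

```python
from typing import Dict, List

# Hypothetical in-memory form of two topics.yaml entries, as a YAML loader would return them.
raw_config: Dict[str, dict] = {
    "AI, Gemini, Anthropic": {"groups": ["Technology", "AI"], "news": []},
    "Robotics": {"groups": ["Technology"], "news": []},
}


def expand_topics(config: Dict[str, dict]) -> Dict[str, List[str]]:
    """Map each individual topic to the groups of the entry it came from."""
    expanded: Dict[str, List[str]] = {}
    for key, entry in config.items():
        for topic in key.split(","):
            topic = topic.strip()
            if not topic:
                continue
            # A topic listed in several entries accumulates all of their groups.
            groups = expanded.setdefault(topic, [])
            for group in entry["groups"]:
                if group not in groups:
                    groups.append(group)
    return expanded


topics = expand_topics(raw_config)
print(topics["Gemini"])  # ['Technology', 'AI']
```

A plain comma split is crude: a key such as "Carbon capture, sequestration, offset" yields fragments that lose their shared context, so a real loader would probably need a smarter rule for those entries.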

Group-Based Intelligence: Topics are organized into groups that determine processing behavior:

Politics: 2-day retention (fast-moving, high temporal relevance)
Technology: 4-day retention (moderate pace, building trends)
Science/Health: 7-day retention (slower evolution, deeper significance)
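The retention windows above can be sketched as a small lookup plus a date check. The fallback window and the rule for topics that belong to several groups (take the longest window) are my assumptions for illustration; the names below are hypothetical.

```python
from datetime import date, timedelta
from typing import List

# Retention windows from the list above; the fallback for other groups is an assumption.
RETENTION_DAYS = {"Politics": 2, "Technology": 4, "Science": 7, "Health": 7}
DEFAULT_RETENTION_DAYS = 4  # hypothetical fallback


def retention_for(groups: List[str]) -> int:
    """Use the longest window among a topic's groups (an assumed tie-breaking rule)."""
    return max(
        (RETENTION_DAYS.get(group, DEFAULT_RETENTION_DAYS) for group in groups),
        default=DEFAULT_RETENTION_DAYS,
    )


def is_retained(published: date, groups: List[str], today: date) -> bool:
    """True if an item published on `published` is still inside its retention window."""
    return today - published <= timedelta(days=retention_for(groups))


print(retention_for(["Science", "Technology"]))  # 7
```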

Explicit Rather Than Inferred: This approach avoids the “filter bubble” problem of algorithmic recommendation systems. Instead of inferring preferences from behavior, it uses explicit declarations that can be consciously updated as interests evolve.

Query Multiplication Strategy: The system generates multiple search queries by combining individual topics with group contexts and temporal modifiers (“breaking news,” “recent developments,” “latest research”), creating comprehensive coverage without redundant processing.
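As a rough illustration of query multiplication, the sketch below combines one topic with each of its group contexts and the temporal modifiers quoted above. The query template and the `build_queries` helper are hypothetical; the real system may phrase queries differently.

```python
from itertools import product
from typing import List

TEMPORAL_MODIFIERS = ["breaking news", "recent developments", "latest research"]


def build_queries(topic: str, groups: List[str],
                  modifiers: List[str] = TEMPORAL_MODIFIERS) -> List[str]:
    """Combine one topic with each group context and temporal modifier."""
    queries: List[str] = []
    for group, modifier in product(groups, modifiers):
        # Skip the group word when the topic already contains it (avoids "AI AI ...").
        context = "" if group.lower() in topic.lower() else f" {group}"
        query = f"{topic}{context} {modifier}"
        if query not in queries:  # drop exact duplicates
            queries.append(query)
    return queries


queries = build_queries("Nvidia", ["Technology", "AI"])
print(len(queries))  # 6
```

Two groups and three modifiers give six queries for a single topic, which is why deduplication downstream matters so much.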

Search Engine Orchestration: The Right Tool for the Right Information

Different types of information require different search strategies. The news agent orchestrates multiple search engines based on information characteristics:

Brave Search: Optimized for recent web content with configurable freshness filters. Essential for breaking news and current events where recency determines relevance.

Tavily: AI-optimized research search with news-specific modes. Excellent for finding comprehensive coverage of developing stories across multiple sources.

ArXiv: Academic paper search for scientific developments. Critical for understanding the research foundation behind technology announcements.

Wikipedia: Contextual information for understanding background and connections between topics.

graph LR;
    A[Search Query] --> B{Topic Classification};
    B -->|Breaking News| C[Brave Search];
    B -->|Research Topic| D[ArXiv Search];
    B -->|General Coverage| E[Tavily Search];
    B -->|Background Context| F[Wikipedia Search];
    C --> G[Results Aggregation];
    D --> G;
    E --> G;
    F --> G;
    G --> H[Relevance Scoring];
    H --> I[Unified Results];

Search Strategy Intelligence: The system doesn’t just run the same query across different engines—it adapts the query style and parameters based on each engine’s strengths and the information type being sought.
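The routing in the diagram can be sketched as a classifier followed by a lookup. The engine names come from the diagram; the keyword rules are purely illustrative assumptions standing in for whatever classification the agent actually performs.

```python
from typing import Dict

# Engine names follow the diagram; the routing rules are illustrative assumptions.
ENGINE_BY_KIND: Dict[str, str] = {
    "breaking": "brave",
    "research": "arxiv",
    "general": "tavily",
    "background": "wikipedia",
}


def classify_query(query: str) -> str:
    """Very rough keyword classifier (a stand-in for the agent's own decision)."""
    q = query.lower()
    if any(word in q for word in ("breaking", "today", "announces")):
        return "breaking"
    if any(word in q for word in ("research", "paper", "study")):
        return "research"
    if any(word in q for word in ("what is", "history of", "overview")):
        return "background"
    return "general"


def route(query: str) -> str:
    """Pick the search engine for a query."""
    return ENGINE_BY_KIND[classify_query(query)]


print(route("large language models latest research"))  # arxiv
```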

Rate Limit Management: The system handles API limits with exponential backoff, request spacing, and graceful degradation, so it keeps operating even when providers throttle requests.
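Exponential backoff is simple to sketch. The version below retries a callable when it raises a rate-limit error, doubling the wait each time and adding a little jitter. `RateLimitError` and `with_backoff` are hypothetical names, and the default delays are assumptions rather than the system's real settings.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical exception a search client raises when it is throttled."""


def with_backoff(call: Callable[[], T], max_attempts: int = 5,
                 base_delay: float = 1.0, max_delay: float = 30.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry `call` on rate limiting, doubling the wait (plus jitter) each time."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of retries: let the caller degrade gracefully
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise RuntimeError("max_attempts must be at least 1")
```

Passing `sleep` as a parameter keeps the function testable without real waiting; in production the default `time.sleep` applies.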

Multi-Provider Resilience: If one search provider fails or reaches limits, the system automatically falls back to alternative sources without losing coverage.
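A minimal sketch of the fallback idea, assuming each provider is wrapped as a function from a query string to a list of results. The provider names and the `search_with_fallback` helper are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

SearchFn = Callable[[str], List[str]]


def search_with_fallback(query: str, providers: Dict[str, SearchFn],
                         order: Sequence[str]) -> Tuple[Optional[str], List[str]]:
    """Try providers in order; return the first one that yields results."""
    for name in order:
        try:
            results = providers[name](query)
        except Exception:  # a real client would catch narrower error types
            continue
        if results:
            return name, results
    # Every provider failed or came back empty: no coverage for this query.
    return None, []


def failing_provider(query: str) -> List[str]:
    raise ConnectionError("provider unavailable")


providers = {"brave": failing_provider, "tavily": lambda q: [f"result for {q}"]}
print(search_with_fallback("AI chips", providers, ["brave", "tavily"]))
```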

Fighting Information Redundancy: Semantic Deduplication at Scale

The most sophisticated challenge in news curation is intelligent deduplication. Simple string matching fails because the same story can be reported with completely different headlines, angles, and details across sources.

The Embedding Intelligence Solution: The system uses Ollama’s BGE-Large model to create semantic embeddings that capture meaning rather than just text:

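from itertools import combinations
from typing import List

# NewsItem, embeddings_model, calculate_similarity and merge_news_items are defined elsewhere in the project.
def deduplicate_news_items(news_items: List[NewsItem], similarity_threshold: float = 0.95) -> List[NewsItem]:
    # Combine title and summary of each item for embedding
    texts_to_embed = [f"{item.title}\n{item.summary}" for item in news_items]

    # Calculate semantic embeddings (one vector per item)
    embeddings = embeddings_model.embed_documents(texts_to_embed)

    # Find similar items using cosine similarity
    kept = dict(enumerate(news_items))
    for i, j in combinations(range(len(news_items)), 2):
        if i not in kept or j not in kept:
            continue  # one of the pair was already merged away
        similarity = calculate_similarity(embeddings[i], embeddings[j])
        if similarity >= similarity_threshold:
            # Merge similar items intelligently, keeping the result in slot i
            kept[i] = merge_news_items(kept[i], kept.pop(j))
    return list(kept.values())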

Semantic vs. Lexical Similarity: Traditional deduplication fails because “Apple announces new AI chip” and “Cupertino giant unveils machine learning processor” describe the same story with completely different words. Semantic embeddings capture the underlying meaning, enabling intelligent similarity detection.
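For reference, cosine similarity between two embedding vectors is their dot product divided by the product of their lengths. The function below is one plausible way to implement a helper like `calculate_similarity`, written in plain Python; the system's own implementation may differ.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```

Because the score depends only on direction, two articles of very different length can still score close to 1.0 when their embeddings point the same way.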

Intelligent Merging Process: When similar articles are detected, a small language model synthesizes them into comprehensive summaries that preserve information from all sources while eliminating redundancy:

graph TD;
    A[Multiple Similar Articles] --> B[Semantic Analysis];
    B --> C[Similarity Detection ≥ 0.95];
    C --> D[LLM-Based Merging];
    D --> E[Comprehensive Summary];
    E --> F[Preserved Source Attribution];
    G[Article 1: Apple AI Chip] --> A;
    H[Article 2: Cupertino ML Processor] --> A;
    I[Article 3: Apple Silicon AI] --> A;

Quality Preservation: The merging process doesn’t just eliminate duplicates—it creates higher-quality summaries that synthesize insights from multiple perspectives while maintaining source attribution for verification.

Threshold Calibration: The 0.95 similarity threshold is calibrated to catch true duplicates while avoiding false positives that would merge genuinely different stories that happen to be related.
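When three or more articles cover the same story, pairwise matches have to be collected into groups before merging. One way to do that is a small union-find over the similarity matrix, sketched below. This is an illustration of the grouping step under my own assumptions, not the system's confirmed approach.

```python
from itertools import combinations
from typing import Dict, List, Sequence


def cluster_duplicates(similarity: Sequence[Sequence[float]],
                       threshold: float = 0.95) -> List[List[int]]:
    """Group item indices whose similarity reaches the threshold, transitively."""
    parent = list(range(len(similarity)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps lookups cheap
            i = parent[i]
        return i

    for i, j in combinations(range(len(similarity)), 2):
        if similarity[i][j] >= threshold:
            root_i, root_j = find(i), find(j)
            if root_i != root_j:
                parent[max(root_i, root_j)] = min(root_i, root_j)

    clusters: Dict[int, List[int]] = {}
    for i in range(len(similarity)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())


sim = [
    [1.00, 0.97, 0.20, 0.96],
    [0.97, 1.00, 0.10, 0.50],
    [0.20, 0.10, 1.00, 0.94],
    [0.96, 0.50, 0.94, 1.00],
]
print(cluster_duplicates(sim))  # [[0, 1, 3], [2]]
```

Note the trade-off in the example: items 1 and 3 score only 0.50 against each other but land in the same cluster because both match item 0. A high threshold such as 0.95 keeps this kind of chaining rare, which is one reason to avoid lowering it casually.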

The Intelligence Emergence: From Information to Insight

What makes this system more than just an automated news aggregator is its capacity for synthesis—the emergence of insights that go beyond simple information collection.

Cross-Topic Pattern Recognition: By processing information across multiple domains simultaneously, the system can identify connections that single-topic monitoring would miss. AI developments influence semiconductor stocks; climate policies affect renewable energy research; geopolitical events impact space program funding.

Temporal Context Awareness: The system maintains awareness of information age and decay rates specific to each topic domain. Breaking political news has hours of relevance; fundamental research papers remain relevant for months or years.
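One common way to model domain-specific decay is an exponential half-life: an item's weight is 0.5 raised to (age / half-life). The sketch below uses that formula with made-up half-lives per group; the actual decay model and numbers used by the system are not stated here, so treat every constant as an assumption.

```python
from datetime import datetime, timedelta

# Hypothetical half-lives in hours; only the idea of per-domain decay comes from the text.
HALF_LIFE_HOURS = {"Politics": 12.0, "Technology": 48.0, "Science": 24.0 * 30}
DEFAULT_HALF_LIFE_HOURS = 48.0


def freshness(published: datetime, now: datetime, group: str) -> float:
    """Exponential decay weight in (0, 1]: 1.0 when new, 0.5 after one half-life."""
    age_hours = max(0.0, (now - published).total_seconds() / 3600.0)
    half_life = HALF_LIFE_HOURS.get(group, DEFAULT_HALF_LIFE_HOURS)
    return 0.5 ** (age_hours / half_life)


now = datetime(2024, 1, 9, 12, 0)
twelve_hours_ago = now - timedelta(hours=12)
print(freshness(twelve_hours_ago, now, "Politics"))  # 0.5
```

A weight like this can complement the hard retention cutoffs: the cutoff decides whether an item is kept at all, while the weight can rank what remains.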

Source Authority Recognition: Different types of claims require different levels of source authority. Market predictions from financial analysts deserve different treatment than the same predictions from general news sources.

Coverage Gap Detection: By monitoring multiple sources across topics, the system can identify when important stories are being under-reported or when coverage is becoming echo-chamber repetitive.

Configuration-Driven Adaptability: Scaling Personal Intelligence

The power of this approach lies in its configurability—the same reactive pattern adapts to completely different information needs through parameter adjustment rather than code changes:

# Fast-moving political coverage
politics_config:
  max_tool_calls: 5
  retention_days: 2
  sources: ['news', 'social']

# Deep research monitoring
science_config:
  max_tool_calls: 3
  retention_days: 7
  sources: ['arxiv', 'academic', 'news']

Behavioral Specialization: Different information domains require different agent behaviors—more aggressive search for time-sensitive topics, deeper analysis for research areas, broader coverage for general interest topics.

Cost-Performance Optimization: Use smaller models for routine processing and larger models for complex synthesis, optimizing cost without sacrificing quality where it matters.

Evolution Support: As information needs change, configuration updates enable behavior modification without system redesign.
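To show how configuration like the two profiles above might reach the agent, here is a minimal sketch that validates the raw values and turns them into typed objects. It assumes the YAML has already been parsed into a dict; `GroupConfig` and `load_configs` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class GroupConfig:
    max_tool_calls: int
    retention_days: int
    sources: List[str] = field(default_factory=list)


# The two profiles shown above, as a YAML loader would return them.
RAW = {
    "politics_config": {"max_tool_calls": 5, "retention_days": 2,
                        "sources": ["news", "social"]},
    "science_config": {"max_tool_calls": 3, "retention_days": 7,
                       "sources": ["arxiv", "academic", "news"]},
}


def load_configs(raw: Dict[str, dict]) -> Dict[str, GroupConfig]:
    """Validate raw profiles and key them by name without the '_config' suffix."""
    suffix = "_config"
    configs: Dict[str, GroupConfig] = {}
    for name, values in raw.items():
        cfg = GroupConfig(**values)  # unknown keys raise TypeError early
        if cfg.max_tool_calls < 1 or cfg.retention_days < 1:
            raise ValueError(f"{name}: limits must be positive")
        key = name[:-len(suffix)] if name.endswith(suffix) else name
        configs[key] = cfg
    return configs


configs = load_configs(RAW)
print(configs["politics"].retention_days)  # 2
```

Validating at load time means a typo in the YAML fails fast instead of silently changing agent behavior mid-run.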

Real-World Intelligence: How Information Becomes Understanding

Let me trace how the system processes a typical day of AI research developments:

Morning Search Orchestration

Agent: Processing 47 configured topics across 8 groups...
Search Strategy: AI developments (Brave + ArXiv), Market news (Tavily),
                Research papers (ArXiv), Climate tech (Brave + Academic)

Round 1: Brave Search "OpenAI GPT-5 breakthrough 2024"
        → 12 articles found
Round 2: ArXiv Search "large language models efficiency 2024"
        → 8 papers found
Round 3: Tavily Search "AI industry funding latest news"
        → 15 comprehensive sources found

Semantic Deduplication in Action

Embedding Analysis: 35 articles processed
Similarity Detection:
  - Articles 3,7,12 → GPT-5 announcement (similarity: 0.97)
  - Articles 18,22 → AI funding round (similarity: 0.94)
  - Articles 28,31,33 → Climate AI research (similarity: 0.96)

Intelligent Merge Results:
  → "OpenAI GPT-5 Announcement: Technical Capabilities and Industry Impact"
    (synthesized from 3 sources, 280 words, comprehensive coverage)
  → "AI Climate Research: New Models for Carbon Emission Prediction"
    (synthesized from 3 sources, 195 words, technical focus)

Synthesis Output

The final output provides not just information, but contextual intelligence:

AI, GPT, OpenAI:
  news:
    - title: 'OpenAI GPT-5 Announcement: Technical Capabilities and Industry Impact'
      summary: 'OpenAI announced GPT-5 with significant improvements in reasoning capabilities and multimodal processing. The model demonstrates enhanced performance on complex reasoning tasks while reducing computational requirements by 40% compared to GPT-4. Industry analysts predict this advancement will accelerate AI adoption across enterprise applications, particularly in scientific research and software development. Three major tech companies have already announced integration plans...'
      sources:
        [
          'techcrunch.com/openai-gpt5',
          'arxiv.org/paper/12345',
          'bloomberg.com/ai-funding',
        ]
      published_date: '2024-01-09'

From Information Overload to Personalized Intelligence

The news agent represents a fundamental shift from passive information consumption to active intelligence curation. Instead of being overwhelmed by information volume, I now receive personalized intelligence that:

Respects Cognitive Bandwidth: Each topic gets exactly the depth of coverage it deserves based on my declared priorities and the topic’s characteristics.

Eliminates Information Waste: Sophisticated deduplication means I never read the same story twice, even when it appears across dozens of sources.

Provides Synthesis Over Aggregation: Rather than just collecting headlines, the system provides understanding—synthesized insights that connect information across topics and sources.

Scales Personal Expertise: The system extends my ability to stay informed across dozens of specialized domains without sacrificing depth in any individual area.

Maintains Source Authority: Unlike social media feeds or algorithmic aggregators, every piece of information includes source attribution and authority assessment.

The Broader Implications: Reactive Intelligence for Personal Automation

This adaptation of the reactive agent pattern demonstrates a broader principle: sophisticated AI behaviors can be configured rather than programmed for diverse personal automation needs.

The same pattern could orchestrate:

Research monitoring for academic or professional domains
Market intelligence for investment decision-making
Technology landscape monitoring for strategic planning
Competitive intelligence for business development

Configuration Over Coding: The key insight is that many complex behaviors can be achieved through parameter tuning and configuration rather than custom development, making sophisticated AI automation accessible without deep technical implementation.

Personal AI Infrastructure: This represents a step toward personal AI infrastructure—systems that amplify individual cognitive capabilities rather than replacing human judgment with algorithmic decisions.

Looking Forward: The Evolution of Personal Intelligence

The news agent demonstrates how the reactive pattern can evolve from academic research synthesis to practical daily intelligence. But this is just one application of a broader principle: AI systems that enhance human intelligence rather than replacing human decision-making.

The next evolution might involve cross-domain synthesis—automatically identifying connections between developments across different areas of interest, providing the kind of interdisciplinary insight that requires human-level reasoning but benefits from machine-scale information processing.

The reactive agent pattern proves its versatility by adapting from academic research to personal intelligence—demonstrating how well-designed AI architectures can solve diverse information challenges through configuration rather than custom development.

Experience the live news curation system yourself at news.reckoning.dev.
