Building Semantic Related Posts with Ollama and Astro: A Complete Guide

Traditional related posts systems rely on simple tag matching or publication dates, often missing the deeper semantic connections between content. In this guide, we’ll build a related posts system that compares the meaning of your content using embeddings generated locally by the bge-large model running under Ollama.

Why Semantic Similarity?

Traditional related posts systems have significant limitations:

Tag-based matching only works if posts share exact tags
Title/description matching misses conceptual relationships
Time-based suggestions ignore content relevance entirely

Semantic similarity using embeddings solves these problems by:

Understanding conceptual relationships between topics
Finding connections across different writing styles
Discovering subtle thematic links
Working even when posts use different terminology

For example, a post about “Neural Networks” could be semantically related to “Deep Learning Fundamentals” even without shared tags.

System Architecture Overview

Our system combines multiple similarity signals with intelligent caching:

graph TD;
A[Blog Post] --> B[Content Extraction];
B --> C[Text Preprocessing];
C --> D{Cache Check};
D -->|Cache Hit| E[Load Cached Embeddings];
D -->|Cache Miss| F[Generate Embeddings];
F --> G[BGE-Large Model];
G --> H[Store in Cache];
H --> E;
E --> I[Similarity Calculation];
I --> J[Weighted Score];
J --> K[Related Posts];

Setting Up Ollama

First, install and configure Ollama with the BGE-Large model:

# Install Ollama (macOS)
brew install ollama

# Start Ollama server
ollama serve

# Pull the BGE-Large embedding model
ollama pull bge-large

The BGE-Large model is specifically designed for high-quality text embeddings and performs excellently for semantic similarity tasks.

Core Components

Our semantic similarity system consists of several interconnected components that work together to understand and match content. The architecture follows a modular design where each component has a specific responsibility, making the system maintainable and extensible.

1. Dependencies Setup

We’ll use LangChain’s Ollama integration for seamless embedding generation and text processing utilities. The @langchain/textsplitters package provides sophisticated text chunking algorithms that preserve semantic meaning across chunk boundaries.

Install the required packages:

npm install @langchain/ollama @langchain/textsplitters fs-extra
npm install -D @types/fs-extra

2. Type Definitions

Strong typing is crucial for building reliable AI systems. Our type definitions establish clear contracts for how embeddings are stored, cached, and used throughout the system. The PostEmbedding interface includes content hashing for cache invalidation, while RelatedPost captures the multi-dimensional similarity scores we’ll calculate.

Let’s define our core types:

import type { CollectionEntry } from 'astro:content'

export interface PostEmbedding {
  postId: string
  embeddings: number[][]
  timestamp: number
  contentHash: string
}

export interface EmbeddingCache {
  version: string
  embeddings: Record<string, PostEmbedding>
}

export interface RelatedPost {
  post: CollectionEntry<'posts'>
  tagScore: number
  timeScore: number
  embeddingScore: number
  totalScore: number
}

Embedding Generation

Embeddings are the foundation of our semantic similarity system. They transform human-readable text into high-dimensional vectors that capture semantic meaning. The BGE-Large model produces 1024-dimensional embeddings that excel at understanding relationships between concepts, even when expressed differently.

Content Extraction and Preprocessing

Content preprocessing is critical for generating high-quality embeddings. We need to extract the most semantically meaningful parts of each post while removing markdown syntax that could confuse the embedding model. Our approach combines frontmatter metadata (title, description, tags) with cleaned body content to create a comprehensive representation of each post.

The preprocessing pipeline removes markdown formatting, converts links to readable text, and normalizes whitespace while preserving the semantic structure of the content. This ensures the embedding model focuses on meaning rather than formatting artifacts.

import { OllamaEmbeddings } from '@langchain/ollama'
import { RecursiveCharacterTextSplitter } from '@langchain/textsplitters'
import { type CollectionEntry } from 'astro:content'

/**
 * Initialize Ollama embeddings with BGE-large model
 */
const initializeEmbeddings = () => {
  return new OllamaEmbeddings({
    model: 'bge-large', // BGE-large model for text embeddings
    baseUrl: 'http://localhost:11434', // Default Ollama URL
  })
}

/**
 * Initialize text splitter for chunking content
 */
const initializeTextSplitter = () => {
  return new RecursiveCharacterTextSplitter({
    chunkSize: 400, // Measured in characters; keeps each chunk well under BGE-Large's 512-token input limit
    chunkOverlap: 50, // Small overlap to maintain context
    separators: ['\n\n', '\n', '. ', '! ', '? ', ' ', ''],
  })
}

/**
 * Extract and prepare content from a post for embedding
 */
export const extractPostContentForEmbedding = async (
  post: CollectionEntry<'posts'>,
): Promise<string> => {
  try {
    // Combine frontmatter and content with better structure
    const frontmatter = `${post.data.title}. ${post.data.description || ''}. Tags: ${post.data.tags?.join(', ') || 'none'}.`

    // Get the body content
    const body = post.body || ''

    // Remove markdown syntax for cleaner text.
    // Order matters: fenced code blocks before inline code, images before links,
    // and list markers before emphasis so a leading "* " is not read as italic.
    const cleanBody = body
      .replace(/```[\s\S]*?```/g, '') // Remove code blocks
      .replace(/`([^`]+)`/g, '$1') // Remove inline code
      .replace(/!\[([^\]]*)\]\([^)]+\)/g, '$1') // Replace images with alt text
      .replace(/\[([^\]]+)\]\([^)]+\)/g, '$1') // Replace links with text
      .replace(/^#{1,6}\s+/gm, '') // Remove headers
      .replace(/^\s*[-*+]\s+/gm, '') // Remove list markers
      .replace(/^\s*\d+\.\s+/gm, '') // Remove numbered list markers
      .replace(/\*\*(.*?)\*\*/g, '$1') // Remove bold
      .replace(/\*(.*?)\*/g, '$1') // Remove italic
      .replace(/\n\s*\n/g, '\n') // Remove extra newlines
      .trim()

    return `${frontmatter}\n\n${cleanBody}`
  } catch (error) {
    console.warn(
      `Failed to extract content for post ${post.id}:`,
      error instanceof Error ? error.message : error,
    )
    // Fallback to just frontmatter
    return `${post.data.title}. ${post.data.description || ''}. Tags: ${post.data.tags?.join(', ') || 'none'}.`
  }
}
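The order in which the replacements run affects the result. As a small illustration (the function names and the sample string are made up for this sketch, and only the image and link patterns from the code above are used), compare handling images before links with the reverse order:

```typescript
const IMAGE = /!\[([^\]]*)\]\([^)]+\)/g
const LINK = /\[([^\]]+)\]\([^)]+\)/g

// Images first: the alt text replaces the whole image expression.
export const imagesThenLinks = (text: string): string =>
  text.replace(IMAGE, '$1').replace(LINK, '$1')

// Links first: the link pattern also matches the bracketed part of an image,
// so a stray "!" is left behind and the image pattern no longer matches.
export const linksThenImages = (text: string): string =>
  text.replace(LINK, '$1').replace(IMAGE, '$1')

const sample = 'See ![a diagram](/img/d.png) and [the docs](https://example.com).'

console.log(imagesThenLinks(sample)) // See a diagram and the docs.
console.log(linksThenImages(sample)) // See !a diagram and the docs.
```

The same reasoning applies to fenced code blocks and inline code: removing the fenced blocks first keeps the inline-code pattern from matching across fence markers.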

Chunking Strategy

Long blog posts often exceed the token limits of embedding models, necessitating a chunking approach. Our strategy balances context preservation with computational efficiency. By using 400-character chunks with 50-character overlaps, we ensure that important concepts spanning chunk boundaries aren’t lost.

The recursive character splitter prioritizes natural language boundaries (paragraphs, sentences, phrases) over arbitrary character limits. This linguistic awareness helps maintain semantic coherence within each chunk, leading to more meaningful embeddings.

Key chunking parameters:

Chunk Size: 400 characters (comfortably below BGE-Large’s 512-token input limit)
Overlap: 50 characters (maintains context between chunks)
Separators: Prioritize natural language boundaries
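To make the size and overlap parameters concrete, here is a deliberately simplified sketch that cuts fixed-size windows with overlap. It is not how RecursiveCharacterTextSplitter works internally (that splitter also prefers to break at the configured separators); `slidingChunks` is a hypothetical name used only for illustration.

```typescript
// Simplified illustration only: fixed-size windows with overlap.
// The real splitter additionally tries to break at paragraph and sentence boundaries.
export const slidingChunks = (text: string, chunkSize: number, overlap: number): string[] => {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new Error('Require chunkSize > 0 and 0 <= overlap < chunkSize')
  }
  const step = chunkSize - overlap // each window starts this far after the previous one
  const chunks: string[] = []
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize))
    if (start + chunkSize >= text.length) break // this window reached the end of the text
  }
  return chunks
}

// A 1000-character text with the parameters from this guide
const lengths = slidingChunks('x'.repeat(1000), 400, 50).map((c) => c.length)
console.log(lengths) // [ 400, 400, 300 ]
```

With a 50-character overlap, the last 50 characters of one chunk are repeated at the start of the next, so a sentence that straddles a boundary appears whole in at least one of them when it is short enough.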

Embedding Generation Process

The embedding generation process converts our preprocessed and chunked content into numerical vectors. Each chunk produces a separate embedding, allowing us to capture different aspects of a post’s content. This multi-chunk approach is particularly effective for longer posts that cover multiple topics or have distinct sections.

Error handling is crucial here since network issues or model unavailability can disrupt the embedding process. Our implementation gracefully degrades by returning empty arrays when embedding generation fails, allowing the system to continue operating with cached embeddings.

/**
 * Generate embeddings for a post by chunking and embedding each chunk
 */
export const generatePostEmbeddings = async (
  post: CollectionEntry<'posts'>,
): Promise<number[][]> => {
  try {
    console.log(`📄 Generating embeddings for: "${post.data.title}"`)

    const embeddings = initializeEmbeddings()
    const textSplitter = initializeTextSplitter()

    // Extract and clean content
    const content = await extractPostContentForEmbedding(post)

    // Split content into chunks
    const chunks = await textSplitter.splitText(content)
    console.log(`   → Split into ${chunks.length} chunks`)

    if (chunks.length === 0) {
      console.warn(`   → No chunks generated for post ${post.id}`)
      return []
    }

    // Generate embeddings for each chunk
    const chunkEmbeddings = await embeddings.embedDocuments(chunks)
    console.log(
      `   → Generated embeddings: ${chunkEmbeddings.length} x ${chunkEmbeddings[0]?.length || 0} dimensions`,
    )

    return chunkEmbeddings
  } catch (error) {
    console.error(
      `Error generating embeddings for post ${post.id}:`,
      error instanceof Error ? error.message : error,
    )
    return []
  }
}

Similarity Algorithms

Effective related post recommendation requires combining multiple similarity signals. Our hybrid approach uses three complementary algorithms that capture different aspects of content relationships: semantic similarity through embeddings, explicit categorization through tags, and temporal relevance. This multi-dimensional approach provides more robust and nuanced recommendations than any single metric alone.

1. Cosine Similarity for Embeddings

Cosine similarity measures the angle between two vectors in high-dimensional space, making it ideal for comparing embeddings. Unlike Euclidean distance, cosine similarity is normalized and focuses on the direction of vectors rather than their magnitude, which is perfect for semantic comparisons where the relative positioning of concepts matters more than absolute values.

For posts with multiple chunks, we compare every chunk pair and combine the maximum similarity (weighted 0.7, for the strongest connection) with the average similarity (weighted 0.3, for overall content alignment). This approach ensures we capture both specific topical overlaps and general thematic relationships.

/**
 * Calculate cosine similarity between two vectors
 */
export const cosineSimilarity = (vecA: number[], vecB: number[]): number => {
  if (vecA.length !== vecB.length) {
    throw new Error('Vectors must have the same length')
  }

  let dotProduct = 0
  let normA = 0
  let normB = 0

  for (let i = 0; i < vecA.length; i++) {
    dotProduct += vecA[i] * vecB[i]
    normA += vecA[i] * vecA[i]
    normB += vecB[i] * vecB[i]
  }

  normA = Math.sqrt(normA)
  normB = Math.sqrt(normB)

  if (normA === 0 || normB === 0) {
    return 0
  }

  return dotProduct / (normA * normB)
}

/**
 * Calculate similarity between two posts using their embeddings.
 * Blends the maximum and the average cosine similarity over all chunk pairs.
 */
export const calculateEmbeddingSimilarity = (
  embeddings1: number[][],
  embeddings2: number[][],
): number => {
  if (embeddings1.length === 0 || embeddings2.length === 0) {
    return 0
  }

  let maxSimilarity = -1
  let totalSimilarity = 0
  let comparisons = 0

  // Compare all chunk pairs, tracking the maximum and the running total
  for (const embedding1 of embeddings1) {
    for (const embedding2 of embeddings2) {
      const similarity = cosineSimilarity(embedding1, embedding2)
      maxSimilarity = Math.max(maxSimilarity, similarity)
      totalSimilarity += similarity
      comparisons++
    }
  }

  // Return a weighted combination of max and average similarity
  const avgSimilarity = totalSimilarity / comparisons
  const embeddingSimilarity = 0.7 * maxSimilarity + 0.3 * avgSimilarity

  return Math.max(0, Math.min(1, embeddingSimilarity))
}
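A tiny worked example may help show how the blend behaves. The sketch below re-implements the same idea in compact form so that it runs on its own; `blendChunkSimilarity` and the two-dimensional vectors are invented for illustration (real BGE-Large vectors have 1024 dimensions).

```typescript
const cosine = (a: number[], b: number[]): number => {
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB)
}

// Blend of the best chunk-pair match and the average over all chunk pairs
export const blendChunkSimilarity = (a: number[][], b: number[][], maxWeight = 0.7): number => {
  if (a.length === 0 || b.length === 0) return 0
  const sims = a.flatMap((va) => b.map((vb) => cosine(va, vb)))
  const max = sims.reduce((m, s) => Math.max(m, s), -1)
  const avg = sims.reduce((sum, s) => sum + s, 0) / sims.length
  return Math.max(0, Math.min(1, maxWeight * max + (1 - maxWeight) * avg))
}

// Post A has two chunks, post B has one. One pair matches exactly, the other is orthogonal:
// max = 1, avg = 0.5, so the blend is 0.7 * 1 + 0.3 * 0.5, about 0.85.
const postA = [[1, 0], [0, 1]]
const postB = [[1, 0]]
console.log(blendChunkSimilarity(postA, postB).toFixed(2)) // 0.85
```

A pure maximum would report 1.0 here even though half of post A is unrelated to post B; the average term pulls the score down to reflect that.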

2. Tag Similarity (Jaccard Index)

Tags represent explicit categorization chosen by the author, providing valuable semantic signals that complement embedding-based similarity. The Jaccard index measures set similarity by calculating the ratio of intersection to union, providing a normalized score between 0 and 1.

This algorithm captures deliberate topical relationships that authors have explicitly marked. While embeddings might miss some explicit categorizations (especially for new terminology or very specific technical topics), tag similarity ensures these intentional relationships are preserved in our recommendations.

/**
 * Calculate tag similarity using Jaccard index
 */
const calculateTagSimilarity = (
  post1: CollectionEntry<'posts'>,
  post2: CollectionEntry<'posts'>,
): number => {
  const tags1 = new Set(post1.data.tags?.map((tag) => tag.toLowerCase()) || [])
  const tags2 = new Set(post2.data.tags?.map((tag) => tag.toLowerCase()) || [])

  if (tags1.size === 0 && tags2.size === 0) {
    return 0 // No tags to compare
  }

  const intersection = new Set([...tags1].filter((tag) => tags2.has(tag)))
  const union = new Set([...tags1, ...tags2])

  return intersection.size / union.size
}
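As a quick worked example of the Jaccard index, the sketch below applies the same formula to plain string arrays so it can run outside Astro. The `jaccard` helper and the tag values are illustrative only.

```typescript
// Jaccard index on two tag lists: |A ∩ B| / |A ∪ B|, compared case-insensitively
export const jaccard = (a: string[], b: string[]): number => {
  const setA = new Set(a.map((t) => t.toLowerCase()))
  const setB = new Set(b.map((t) => t.toLowerCase()))
  if (setA.size === 0 && setB.size === 0) return 0 // nothing to compare
  const shared = Array.from(setA).filter((t) => setB.has(t)).length
  const unionSize = setA.size + setB.size - shared
  return shared / unionSize
}

// Two shared tags (ai, ollama) out of four distinct tags overall
console.log(jaccard(['AI', 'Astro', 'Ollama'], ['ai', 'ollama', 'typescript'])) // 0.5
```

Note that a post with many tags is penalized by the growing union: sharing two tags out of ten distinct ones scores 0.2, even though the overlap is the same.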

3. Temporal Similarity

Temporal proximity often indicates related content, especially in technical blogs where authors explore related topics in sequence or return to subjects with updated perspectives. Our exponential decay function scores two posts higher the closer together they were published, and the score falls off smoothly as the gap between their publication dates grows.

The 30-day decay constant balances recency with relevance: the score drops to 1/e (about 0.37) at a 30-day gap, which corresponds to a half-life of roughly 21 days. The effect diminishes smoothly rather than creating hard cutoffs. This approach prevents the system from being dominated by publication dates while still capturing meaningful temporal relationships in content evolution.

/**
 * Calculate time-based similarity (closer in time = more similar)
 */
const calculateTimeScore = (
  post1: CollectionEntry<'posts'>,
  post2: CollectionEntry<'posts'>,
): number => {
  const date1 = new Date(post1.data.date).getTime()
  const date2 = new Date(post2.data.date).getTime()
  const timeDiff = Math.abs(date1 - date2)

  // Convert to days
  const daysDiff = timeDiff / (1000 * 60 * 60 * 24)

  // Exponential decay with a 30-day time constant: a one-week gap scores about 0.79,
  // a 30-day gap scores 1/e (about 0.37); the half-life is 30 * ln 2, roughly 21 days
  return Math.exp(-daysDiff / 30)
}
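The difference between a time constant and a half-life is easy to check numerically. In the sketch below, `decayByTimeConstant` mirrors the `Math.exp(-days / 30)` form, while `decayByHalfLife` is an alternative parameterization for readers who want the score to be exactly 0.5 at a chosen gap. Both function names are invented for this example.

```typescript
// exp(-days / tau): tau is the gap at which the score falls to 1/e
export const decayByTimeConstant = (daysApart: number, tau = 30): number =>
  Math.exp(-Math.abs(daysApart) / tau)

// 0.5 ** (days / halfLife): halfLife is the gap at which the score falls to 0.5
export const decayByHalfLife = (daysApart: number, halfLife = 30): number =>
  Math.pow(0.5, Math.abs(daysApart) / halfLife)

// Converts a time constant into the equivalent half-life
export const halfLifeOf = (tau: number): number => tau * Math.LN2

console.log(halfLifeOf(30).toFixed(1)) // 20.8
console.log(decayByTimeConstant(30).toFixed(3)) // 0.368
console.log(decayByHalfLife(30).toFixed(3)) // 0.500
```

Both curves belong to the same family; `decayByTimeConstant(d, tau)` equals `decayByHalfLife(d, tau * Math.LN2)`, so switching between them is only a matter of rescaling the parameter.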

4. Weighted Score Combination

The final similarity score combines all three algorithms using carefully tuned weights that reflect their relative importance. Our weighting scheme prioritizes semantic similarity (70%) as the primary signal, uses tag similarity (20%) for explicit categorization, and includes temporal similarity (10%) for recency bias.

These weights are defined inside the function; adjust them to match your content strategy. Technical blogs might weight tag similarity higher, while narrative blogs might emphasize temporal relationships. Because the function is async, the caller can start many comparisons at once and await them together.

/**
 * Calculate overall similarity score between two posts
 */
const calculateSimilarityScores = async (
  post1: CollectionEntry<'posts'>,
  post2: CollectionEntry<'posts'>,
  cache: EmbeddingCache,
  ollamaWorking: boolean = true,
): Promise<RelatedPost> => {
  const tagScore = calculateTagSimilarity(post1, post2)
  const timeScore = calculateTimeScore(post1, post2)

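  // Get embeddings for both posts. getCachedEmbeddingsForPost (not listed in this guide) is
  // expected to return cached embeddings, generating them first when Ollama is available.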
  const embeddings1 = await getCachedEmbeddingsForPost(post1, cache, ollamaWorking)
  const embeddings2 = await getCachedEmbeddingsForPost(post2, cache, ollamaWorking)

  // Calculate embedding similarity
  const embeddingScore = calculateEmbeddingSimilarity(embeddings1, embeddings2)

  // Weight the scores (adjust these weights to change the importance of each factor)
  const weights = {
    tag: 0.2, // Tag similarity for explicit categorization
    time: 0.1, // Minimal time importance
    embedding: 0.7, // High embedding importance for semantic similarity
  }

  const totalScore =
    tagScore * weights.tag + timeScore * weights.time + embeddingScore * weights.embedding

  return {
    post: post2,
    tagScore,
    timeScore,
    embeddingScore,
    totalScore,
  }
}
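If you want to experiment with the weighting without editing the function body, one option is to pull the combination step into a small pure function. The following is a sketch under that assumption; `ScoreWeights`, `SignalScores`, `DEFAULT_WEIGHTS`, and `combineScores` are names introduced here, and dividing by the weight sum is an added safeguard so that custom weights need not add up to exactly 1.

```typescript
export interface ScoreWeights {
  tag: number
  time: number
  embedding: number
}

export interface SignalScores {
  tagScore: number
  timeScore: number
  embeddingScore: number
}

// The weights used in this guide
export const DEFAULT_WEIGHTS: ScoreWeights = { tag: 0.2, time: 0.1, embedding: 0.7 }

// Weighted average of the three signals; normalizes by the weight sum
export const combineScores = (
  scores: SignalScores,
  weights: ScoreWeights = DEFAULT_WEIGHTS,
): number => {
  const sum = weights.tag + weights.time + weights.embedding
  if (sum <= 0) throw new Error('Weights must sum to a positive value')
  const weighted =
    scores.tagScore * weights.tag +
    scores.timeScore * weights.time +
    scores.embeddingScore * weights.embedding
  return weighted / sum
}

// 0.167 * 0.2 + 0.045 * 0.1 + 0.823 * 0.7 = 0.0334 + 0.0045 + 0.5761
const total = combineScores({ tagScore: 0.167, timeScore: 0.045, embeddingScore: 0.823 })
console.log(total.toFixed(3)) // 0.614
```

Keeping this step pure also makes it straightforward to unit test different weightings against a handful of known post pairs.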

Caching System

Embeddings are computationally expensive to generate, making caching essential for practical deployment. Our caching system addresses three critical challenges: storage efficiency, cache invalidation, and deployment portability. The compressed approach reduces storage requirements by 77% while maintaining fast access times and ensuring embeddings remain available even when Ollama isn’t running.

Compressed Cache Implementation

Large embedding files can quickly become unwieldy in git repositories and deployment pipelines. Our compression strategy uses gzip to dramatically reduce file sizes while maintaining quick decompression during runtime. The system automatically handles migration from uncompressed formats and provides transparent operation regardless of the underlying storage format.

The implementation includes robust error handling for compression/decompression failures and maintains compatibility with both development and production environments. Version checking ensures cache compatibility across different system iterations.

import fs from 'fs-extra'
import path from 'path'
import crypto from 'crypto'
import { promisify } from 'util'
import { gzip, gunzip } from 'zlib'

const CACHE_FILE_PATH = path.join(process.cwd(), '.embedding-cache.json.gz')
const CACHE_VERSION = '1.0.0'

const gzipAsync = promisify(gzip)
const gunzipAsync = promisify(gunzip)

/**
 * Generate a hash for post content to detect changes
 */
export const generateContentHash = (content: string): string => {
  return crypto.createHash('sha256').update(content).digest('hex').substring(0, 16)
}

/**
 * Load embedding cache from disk (compressed)
 */
export const loadEmbeddingCache = async (): Promise<EmbeddingCache> => {
  try {
    // First check for compressed file
    if (await fs.pathExists(CACHE_FILE_PATH)) {
      const compressedData = await fs.readFile(CACHE_FILE_PATH)
      const decompressedData = await gunzipAsync(compressedData)
      const cache = JSON.parse(decompressedData.toString())

      // Validate cache version
      if (cache.version !== CACHE_VERSION) {
        console.log('Embedding cache version mismatch, starting fresh cache')
        return { version: CACHE_VERSION, embeddings: {} }
      }
      return cache
    }

    // Fallback: check for old uncompressed file and migrate it
    const oldCacheFile = path.join(process.cwd(), '.embedding-cache.json')
    if (await fs.pathExists(oldCacheFile)) {
      console.log('Migrating uncompressed cache to compressed format...')
      const cache = await fs.readJson(oldCacheFile)

      // Save as compressed and remove old file
      await saveEmbeddingCache(cache)
      await fs.remove(oldCacheFile)

      return cache
    }
  } catch (error) {
    console.warn('Failed to load embedding cache:', error)
  }

  return { version: CACHE_VERSION, embeddings: {} }
}

/**
 * Save embedding cache to disk (compressed)
 */
export const saveEmbeddingCache = async (cache: EmbeddingCache): Promise<void> => {
  try {
    const jsonString = JSON.stringify(cache, null, 2)
    const compressedData = await gzipAsync(Buffer.from(jsonString))
    await fs.writeFile(CACHE_FILE_PATH, compressedData)
    console.log(
      `💾 Saved compressed embedding cache (${(compressedData.length / 1024 / 1024).toFixed(2)} MB)`,
    )
  } catch (error) {
    console.error('Failed to save embedding cache:', error)
  }
}
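To check the compression round trip in isolation, the sketch below builds a small stand-in cache filled with made-up numbers, compresses it with Node's synchronous zlib functions, and restores it. The data is synthetic, so the ratio it prints says nothing about real embeddings; measure your own cache file to see what you actually save.

```typescript
import { gzipSync, gunzipSync } from 'node:zlib'

interface FakeEntry {
  postId: string
  embeddings: number[][]
  timestamp: number
  contentHash: string
}

// Stand-in cache: 3 posts x 2 chunks x 1024 dimensions of synthetic values
const fakeEmbeddings: Record<string, FakeEntry> = {}
for (const id of ['a', 'b', 'c']) {
  fakeEmbeddings[id] = {
    postId: id,
    embeddings: Array.from({ length: 2 }, (_, chunk) =>
      Array.from({ length: 1024 }, (_, i) => Math.sin(i + chunk + id.charCodeAt(0))),
    ),
    timestamp: 0,
    contentHash: 'placeholder',
  }
}
const fakeCache = { version: '1.0.0', embeddings: fakeEmbeddings }

// Serialize, compress, then reverse both steps
const json = JSON.stringify(fakeCache)
const compressed = gzipSync(Buffer.from(json, 'utf8'))
const restored = JSON.parse(gunzipSync(compressed).toString('utf8'))

const ratio = compressed.length / Buffer.byteLength(json, 'utf8')
console.log(`Compressed to ${(ratio * 100).toFixed(0)}% of the JSON size`)
console.log(`Round trip intact: ${JSON.stringify(restored) === json}`)
```

Serializing without indentation, as done here, keeps the uncompressed JSON smaller; gzip removes most of the whitespace cost either way.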

Cache Validation and Management

Effective cache management requires balancing performance with data freshness. Our system uses content hashing to detect when posts have been modified, automatically invalidating stale cache entries. The SHA-256 hash provides reliable change detection while the truncated 16-character version maintains efficiency.

Automatic cleanup limits cache growth by removing entries whose embeddings were generated more than 30 days ago. Because the timestamp records generation time rather than last use, entries for unchanged posts expire as well and must be regenerated. The cleanup runs as part of normal cache operations, requiring no manual intervention.

/**
 * Get cached embeddings if available and valid
 */
export const getCachedEmbeddings = (
  cache: EmbeddingCache,
  postId: string,
  contentHash: string,
): number[][] | null => {
  const cached = cache.embeddings[postId]

  if (!cached) {
    return null
  }

  // Check if content has changed by comparing hashes
  if (cached.contentHash !== contentHash) {
    // Content has changed, cache is invalid
    delete cache.embeddings[postId]
    return null
  }

  return cached.embeddings
}

/**
 * Store embeddings in cache
 */
export const setCachedEmbeddings = (
  cache: EmbeddingCache,
  postId: string,
  embeddings: number[][],
  contentHash: string,
): void => {
  cache.embeddings[postId] = {
    postId,
    embeddings,
    timestamp: Date.now(),
    contentHash,
  }
}

/**
 * Clean up old cache entries (older than 30 days)
 */
export const cleanupOldEmbeddingEntries = (cache: EmbeddingCache): void => {
  const thirtyDaysAgo = Date.now() - 30 * 24 * 60 * 60 * 1000

  for (const [postId, entry] of Object.entries(cache.embeddings)) {
    if (entry.timestamp < thirtyDaysAgo) {
      delete cache.embeddings[postId]
    }
  }
}
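The scoring code calls a helper named getCachedEmbeddingsForPost that ties these cache functions to embedding generation. Its source is not shown, so the following is only a guess at its shape: look up the cache by content hash, and generate and store new embeddings on a miss when a model is available. The name `getOrCreateEmbeddings`, its parameters, and the injected `generate` callback are all assumptions made so the sketch can run without Ollama.

```typescript
import { createHash } from 'node:crypto'

interface PostEmbedding {
  postId: string
  embeddings: number[][]
  timestamp: number
  contentHash: string
}

interface EmbeddingCache {
  version: string
  embeddings: Record<string, PostEmbedding>
}

const hashContent = (content: string): string =>
  createHash('sha256').update(content).digest('hex').substring(0, 16)

// Hypothetical stand-in for getCachedEmbeddingsForPost.
// `generate` would wrap the real embedding call; it is injected here for testability.
export const getOrCreateEmbeddings = async (
  postId: string,
  content: string,
  cache: EmbeddingCache,
  canGenerate: boolean,
  generate: (content: string) => Promise<number[][]>,
): Promise<number[][]> => {
  const contentHash = hashContent(content)
  const cached = cache.embeddings[postId]

  // Cache hit: the stored hash matches the current content
  if (cached && cached.contentHash === contentHash) return cached.embeddings

  // Missing or stale entry and no model available: report "no embeddings"
  if (!canGenerate) return []

  const embeddings = await generate(content)
  if (embeddings.length > 0) {
    // Only store successful results so a failed run is retried next time
    cache.embeddings[postId] = { postId, embeddings, timestamp: Date.now(), contentHash }
  }
  return embeddings
}
```

Returning an empty array on a miss fits the rest of the pipeline, since calculateEmbeddingSimilarity scores 0 when either side has no embeddings.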

Integration with Astro

Astro’s static site generation model requires careful consideration of when and how embeddings are generated. Our integration strategy ensures optimal build performance while maintaining flexibility for both development and production environments. The system intelligently handles Ollama availability and gracefully degrades when necessary.

The core function orchestrates the entire similarity calculation process, managing cache operations, Ollama connectivity, and scoring algorithms. The intelligent fallback system ensures robust operation across different deployment scenarios—using cached embeddings when available and falling back to traditional similarity when necessary.

The function implements several optimization strategies: parallel processing of similarity calculations, intelligent cache management, and comprehensive logging for debugging and monitoring. The async/await pattern ensures efficient handling of I/O operations while maintaining readable code structure.

// relatedPosts.mts
export const getRelatedPosts = async (
  currentPost: CollectionEntry<'posts'>,
  allPosts: CollectionEntry<'posts'>[],
  limit = 2,
): Promise<RelatedPost[]> => {
  // Load embedding cache first
  const cache = await loadEmbeddingCache()

  // Check if we have a reasonable amount of cached embeddings
  const stats = getCacheStats(cache)
  const hasSufficientCache = stats.totalPosts > 0

  // Test Ollama connection only if we need to generate new embeddings
  let ollamaWorking = false
  if (!hasSufficientCache) {
    console.log('📊 No cached embeddings found, testing Ollama connection...')
    ollamaWorking = await testOllamaConnection()
    if (!ollamaWorking) {
      console.log(
        '⚠️  Ollama not available and no cache, falling back to traditional similarity only',
      )
      return getFallbackRelatedPosts(currentPost, allPosts, limit)
    }
  } else {
    console.log(`📊 Using cached embeddings (${stats.totalPosts} posts cached)`)
    // Try Ollama for new embeddings but don't fail if it's not available
    ollamaWorking = await testOllamaConnection()
    if (!ollamaWorking) {
      console.log('💾 Ollama not available but using cached embeddings')
    }
  }

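  // Clean up old cache entries
  cleanupOldEmbeddingEntries(cache)

  // Log cache stats (recomputed, since cleanup may have removed entries)
  const currentStats = getCacheStats(cache)
  console.log(
    `💾 Embedding cache: ${currentStats.totalPosts} posts, ${currentStats.totalEmbeddings} embeddings (${currentStats.cacheSize})`,
  )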

  // Don't include the current post
  const otherPosts = allPosts.filter((post) => post.id !== currentPost.id)

  console.log(
    `📝 Processing ${otherPosts.length} posts for similarity with "${currentPost.data.title}"`,
  )

  // Calculate similarity scores for all posts
  const scoredPosts = await Promise.all(
    otherPosts.map(async (post) => {
      const scores = await calculateSimilarityScores(
        currentPost,
        post,
        cache,
        ollamaWorking,
      )
      return scores
    }),
  )

  // Save updated cache
  await saveEmbeddingCache(cache)

  // Debug: Log top scores
  console.log(`\n🔍 Top similarity scores for "${currentPost.data.title}":`)
  scoredPosts
    .sort((a, b) => b.totalScore - a.totalScore)
    .slice(0, 5) // Show top 5
    .forEach((result, index) => {
      console.log(`${index + 1}. "${result.post.data.title}":`)
      console.log(
        `   Tag: ${result.tagScore.toFixed(3)}, Time: ${result.timeScore.toFixed(3)}, Embedding: ${result.embeddingScore.toFixed(3)} → Total: ${result.totalScore.toFixed(3)}`,
      )
    })

  // Filter and return results
  const MIN_SCORE = 0.1 // Reasonable threshold for embedding similarity
  const results = scoredPosts
    .filter(({ totalScore }) => totalScore >= MIN_SCORE)
    .sort((a, b) => b.totalScore - a.totalScore)
    .slice(0, limit)

  console.log(`\n🎯 Found ${results.length} related posts (min score: ${MIN_SCORE})\n`)

  return results
}
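The function above relies on testOllamaConnection, which is not listed in this guide. One plausible way to implement such a check is to ask Ollama's `GET /api/tags` endpoint for the locally installed models and look for the embedding model. The sketch below takes that approach; `probeOllama`, its parameters, and the injectable `fetchImpl` are assumptions for illustration, and the default relies on the global `fetch` available in Node 18 and later.

```typescript
type FetchLike = (url: string) => Promise<{ ok: boolean; json: () => Promise<unknown> }>

// Hypothetical connectivity check: true only if Ollama answers and lists the model
export const probeOllama = async (
  baseUrl = 'http://localhost:11434',
  model = 'bge-large',
  fetchImpl: FetchLike = (url) => fetch(url),
): Promise<boolean> => {
  try {
    const response = await fetchImpl(`${baseUrl}/api/tags`)
    if (!response.ok) return false

    const body = (await response.json()) as { models?: { name?: string }[] }
    // Installed models are reported with a tag suffix, e.g. "bge-large:latest"
    return (body.models ?? []).some((m) => (m.name ?? '').split(':')[0] === model)
  } catch {
    // Connection refused, DNS failure, malformed JSON, and similar errors
    return false
  }
}
```

Checking for the model as well as the server avoids a confusing failure later, when the server is running but `ollama pull bge-large` was never executed.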

Astro Page Integration

Astro’s component-based architecture makes it straightforward to integrate related posts into your blog template. The key consideration is calling the related posts function during the static generation phase, ensuring all similarity calculations happen at build time rather than runtime.

The integration pattern shown below demonstrates how to seamlessly incorporate related posts into your post template while maintaining Astro’s performance characteristics. The conditional rendering ensures the related posts section only appears when relevant matches are found.

---
import { getCollection } from 'astro:content'
import { getRelatedPosts } from '../../relatedPosts.mts'

export async function getStaticPaths() {
  const posts = await getCollection('posts')
  return posts.map((post) => ({
    params: { slug: post.slug },
    props: post,
  }))
}

const post = Astro.props
const { Content } = await post.render() // Provides the <Content /> component used below
const posts = await getCollection('posts')
const relatedPosts = await getRelatedPosts(post, posts, 3)
---

<html>
  <head>
    <title>{post.data.title}</title>
  </head>
  <body>
    <article>
      <!-- Post content -->
      <h1>{post.data.title}</h1>
      <Content />
    </article>

    {
      relatedPosts.length > 0 && (
        <section class="related-posts">
          <h2>Related Posts</h2>
          <div class="posts-grid">
            {relatedPosts.map(({ post: relatedPost, totalScore }) => (
              <article class="related-post">
                <h3>
                  <a href={`/posts/${relatedPost.slug}`}>{relatedPost.data.title}</a>
                </h3>
                <p>{relatedPost.data.description}</p>
                <div class="similarity-score">
                  Similarity: {(totalScore * 100).toFixed(1)}%
                </div>
              </article>
            ))}
          </div>
        </section>
      )
    }
  </body>
</html>

Deployment Strategy

Modern deployment environments often have constraints that differ significantly from development setups. Our system addresses this by checking for Ollama and degrading gracefully when it is absent. The key insight is separating embedding generation (on a development machine where Ollama runs) from embedding usage (during the production build, which can rely on the committed cache), allowing the system to work reliably even when AI infrastructure isn’t available in production.

Development vs. Production

The two-environment strategy maximizes both development flexibility and production reliability. In development, the full AI pipeline runs locally, generating and caching embeddings as content changes. In production, the system relies on pre-computed embeddings, ensuring consistent performance without external dependencies.

This approach provides several advantages: predictable production performance, reduced deployment complexity, and lower operational costs. The system checks whether Ollama is reachable and adapts accordingly:

Development (with Ollama):

Generates embeddings for new/modified posts
Updates cache automatically
Full semantic similarity features

Production (without Ollama):

Uses pre-computed compressed cache
Falls back gracefully for missing embeddings
Maintains high performance
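When neither Ollama nor a cache is available, getRelatedPosts delegates to getFallbackRelatedPosts, whose source is not shown. A reasonable reading of "traditional similarity" is to rank by tags and publication date alone, and the sketch below does that. Everything in it is an assumption: the `PostLike` shape stands in for an Astro collection entry, `rankWithoutEmbeddings` is an invented name, and the weighting simply rescales the 0.2 and 0.1 weights so they sum to 1.

```typescript
interface PostLike {
  id: string
  data: { title: string; tags?: string[]; date: string | Date }
}

interface FallbackScore {
  post: PostLike
  tagScore: number
  timeScore: number
  embeddingScore: number
  totalScore: number
}

// Jaccard index over lower-cased tags
const tagOverlap = (a: string[] = [], b: string[] = []): number => {
  const setA = new Set(a.map((t) => t.toLowerCase()))
  const setB = new Set(b.map((t) => t.toLowerCase()))
  const shared = Array.from(setA).filter((t) => setB.has(t)).length
  const union = setA.size + setB.size - shared
  return union === 0 ? 0 : shared / union
}

// Exponential decay on the gap between publication dates (30-day time constant)
const timeCloseness = (a: string | Date, b: string | Date): number => {
  const msPerDay = 24 * 60 * 60 * 1000
  const days = Math.abs(new Date(a).getTime() - new Date(b).getTime()) / msPerDay
  return Math.exp(-days / 30)
}

// Hypothetical fallback ranking that ignores embeddings entirely
export const rankWithoutEmbeddings = (
  current: PostLike,
  all: PostLike[],
  limit = 2,
): FallbackScore[] =>
  all
    .filter((post) => post.id !== current.id)
    .map((post) => {
      const tagScore = tagOverlap(current.data.tags, post.data.tags)
      const timeScore = timeCloseness(current.data.date, post.data.date)
      // Assumed weighting: 0.2 and 0.1 rescaled so the two signals sum to 1
      const totalScore = (0.2 * tagScore + 0.1 * timeScore) / 0.3
      return { post, tagScore, timeScore, embeddingScore: 0, totalScore }
    })
    .sort((a, b) => b.totalScore - a.totalScore)
    .slice(0, limit)
```

Returning the same result shape as the embedding-based path, with `embeddingScore` fixed at 0, means the page template does not need to know which path produced the list.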

Compression Benefits

The gzip compression strategy addresses one of the main concerns with embedding-based systems: storage overhead. Large embedding files can quickly bloat repositories and slow down deployment pipelines. Compression provides substantial benefits without compromising functionality:

77% size reduction (36 MB → 8.1 MB)
Faster git operations and deployment
Automatic migration from uncompressed format
Transparent decompression during runtime

Performance Optimization

Performance optimization in AI-powered systems requires balancing accuracy with speed. Our approach focuses on minimizing computational overhead while maintaining the quality of similarity calculations. The key strategies involve intelligent caching, efficient algorithms, and smart resource management.

Optimization Strategies

The optimization approach targets the three most expensive operations: embedding generation, similarity calculations, and cache management. Each strategy addresses specific performance bottlenecks while maintaining system reliability.

Intelligent Caching
- Content hash validation prevents unnecessary regeneration
- Compressed storage reduces disk and repository footprint
- Automatic cleanup of old entries

Chunking Strategy
- Chunk size kept well within the BGE-Large input limit
- Overlap preservation maintains context
- All chunks of a post are embedded in a single embedDocuments call

Similarity Calculation
- Weighted scoring system balances different signals
- MIN_SCORE threshold filters out low-scoring candidates
- Promise.all runs the comparisons for a post concurrently

Monitoring and Debugging

Comprehensive logging is essential for understanding system behavior and diagnosing issues in production. Our logging strategy provides detailed insights into cache performance, similarity calculations, and embedding generation without overwhelming developers with unnecessary detail.

The logging output follows a structured format that makes it easy to track the system’s decision-making process and identify potential performance bottlenecks. The emoji-based prefixes provide quick visual scanning of different operation types.

// Example output during development
📊 Using cached embeddings (47 posts cached)
💾 Embedding cache: 47 posts, 156 embeddings (8.2 MB)
📝 Processing 46 posts for similarity with "Building Semantic Related Posts"

🔍 Top similarity scores for "Building Semantic Related Posts":
1. "Machine Learning in JavaScript":
   Tag: 0.167, Time: 0.045, Embedding: 0.823 → Total: 0.614
2. "AI-Powered Content Analysis":
   Tag: 0.100, Time: 0.023, Embedding: 0.756 → Total: 0.552
3. "Neural Networks Fundamentals":
   Tag: 0.000, Time: 0.012, Embedding: 0.687 → Total: 0.481

🎯 Found 3 related posts (min score: 0.1)

Cache Statistics

Cache statistics provide insight into how much the system is storing and help when tuning configuration parameters. The statistics function reports the number of cached posts, the total number of chunk embeddings, and an estimated cache size.

These metrics are particularly valuable for tuning cache cleanup intervals and understanding the relationship between content volume and storage requirements. The size is the byte length of the uncompressed JSON serialization, so the gzip file on disk is typically considerably smaller.

export const getCacheStats = (
  cache: EmbeddingCache,
): {
  totalPosts: number
  totalEmbeddings: number
  cacheSize: string
} => {
  const totalPosts = Object.keys(cache.embeddings).length
  let totalEmbeddings = 0

  for (const entry of Object.values(cache.embeddings)) {
    totalEmbeddings += entry.embeddings.length
  }

  // Estimate cache size from the uncompressed JSON serialization
  const cacheString = JSON.stringify(cache)
  const cacheSizeBytes = Buffer.byteLength(cacheString, 'utf8')
  const cacheSizeMB = (cacheSizeBytes / (1024 * 1024)).toFixed(2)

  return {
    totalPosts,
    totalEmbeddings,
    cacheSize: `${cacheSizeMB} MB`,
  }
}

Conclusion

This comprehensive implementation provides several key advantages:

🚀 Intelligent Semantic Understanding

Goes beyond simple tag matching to understand content meaning
Discovers subtle relationships between posts
Adapts to your unique writing style and topics

⚡ Production-Ready Performance

Compressed caching reduces storage by 77%
Graceful fallbacks ensure reliability
Works with or without Ollama in production

🛠 Developer-Friendly

Comprehensive logging and debugging
Modular architecture for easy customization
TypeScript support throughout

🎯 Customizable Scoring

Adjustable weights for different similarity signals
Configurable thresholds and limits
Easy to extend with additional algorithms

The result is a sophisticated related posts system that truly understands your content and provides meaningful recommendations to your readers. The combination of semantic embeddings, traditional signals, and intelligent caching creates a robust solution that works reliably across different deployment scenarios.

Next Steps

Consider these enhancements for further development:

A/B Testing: Compare semantic vs. traditional related posts performance
User Feedback: Implement click tracking to improve scoring weights
Multi-language Support: Extend for international content
Real-time Updates: WebSocket integration for live similarity updates
Visual Embeddings: Extend to include image similarity for posts with graphics

The foundation we’ve built provides a solid base for these advanced features while maintaining excellent performance and reliability.