
Docling RAG Agent (Ollama Edition)

An intelligent agent that provides conversational access to a knowledge base stored in PostgreSQL with PGVector. Uses RAG (Retrieval Augmented Generation) to search through embedded documents and provide contextual, accurate responses with source citations. Supports multiple document formats including audio files with Whisper transcription.

Inspired by the Docling RAG Agent from coleam00/ottomator-agents, adapted to run with local LLMs via Ollama instead of requiring OpenAI API keys.

🌟 New: Web Interface Available!

A modern web interface is now available:

# Start the web interface
uv run python web_app.py

Then open http://localhost:8000 in your browser.

See WEB_INTERFACE.md for full documentation.

🎓 New to Docling?

Start with the tutorials! Check out the docling_basics/ folder for progressive examples that teach Docling fundamentals:

  1. Simple PDF Conversion - Basic document processing
  2. Multiple Format Support - PDF, Word, PowerPoint handling
  3. Audio Transcription - Speech-to-text with Whisper
  4. Hybrid Chunking - Intelligent chunking for RAG systems

These tutorials provide the foundation for understanding how this full RAG agent works. → Go to Docling Basics

Features

Interface Options

Choose how you want to interact:

| Interface | Best For | Command |
|-----------|----------|---------|
| 🌐 Web Interface | Visual UI, file uploads, web crawling | `uv run python web_app.py` |
| 💻 CLI | Terminal workflows, SSH access | `uv run python cli.py` |

Core Features

Supported Document Formats

| Format | Extensions | Processing |
|--------|------------|------------|
| 📄 PDF | `.pdf` | Docling conversion |
| 📝 Word | `.docx`, `.doc` | Docling conversion |
| 📊 PowerPoint | `.pptx`, `.ppt` | Docling conversion |
| 📈 Excel | `.xlsx`, `.xls` | Docling conversion |
| 🌐 HTML | `.html`, `.htm` | Docling conversion |
| 📋 Markdown | `.md`, `.markdown` | Direct processing |
| 📃 Text | `.txt` | Direct processing |
| 🎵 Audio | `.mp3`, `.wav`, `.m4a` | Whisper transcription |
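Conceptually, ingestion routes each file by extension to one of three pipelines. A minimal sketch of that dispatch (the function and constant names here are illustrative, not the project's actual API):

```python
from pathlib import Path

# Extensions handled by Docling conversion vs. direct text processing.
DOCLING_EXTS = {".pdf", ".docx", ".doc", ".pptx", ".ppt",
                ".xlsx", ".xls", ".html", ".htm"}
DIRECT_EXTS = {".md", ".markdown", ".txt"}
AUDIO_EXTS = {".mp3", ".wav", ".m4a"}

def processing_route(path: str) -> str:
    """Return which pipeline a file would take, based on its extension."""
    ext = Path(path).suffix.lower()
    if ext in DOCLING_EXTS:
        return "docling"
    if ext in AUDIO_EXTS:
        return "whisper"
    if ext in DIRECT_EXTS:
        return "direct"
    raise ValueError(f"Unsupported format: {ext}")
```

Unknown extensions fail fast rather than being silently skipped, which makes ingestion errors easier to spot.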

Prerequisites

System Dependencies

macOS:

# Install required libraries for audio/video processing
brew install opus opusfile

Linux (Ubuntu/Debian):

sudo apt-get install libopus0 libopusfile0

Quick Start

1. Install Dependencies

# Install dependencies using UV
uv sync

2. Set Up Environment Variables

Copy .env.example to .env and configure your provider:

cp .env.example .env

Required variables (choose one LLM provider):

Option 1: Ollama (Local - Recommended)

OPENAI_API_KEY=ollama
OPENAI_BASE_URL=http://localhost:11434/v1
LLM_CHOICE=mistral              # or llama3.2, qwen2.5, etc.
EMBEDDING_MODEL=nomic-embed-text

Available Ollama models:

Option 2: OpenAI (Cloud)

OPENAI_API_KEY=sk-your-key-here
LLM_CHOICE=gpt-4o-mini
EMBEDDING_MODEL=text-embedding-3-small

3. Configure Database

You must set up your PostgreSQL database with the PGVector extension and create the required schema:

  1. Enable PGVector extension in your database (most cloud providers have this pre-installed)
    CREATE EXTENSION IF NOT EXISTS vector;
    
  2. Run the schema file to create tables and functions:
    # In the SQL editor in Supabase/Neon, run:
    sql/schema.sql
    
    # Or using psql
    psql $DATABASE_URL < sql/schema.sql
    

The schema file (sql/schema.sql) creates:

4. Choose Your Interface

Option A: Web Interface

# Start the web server
uv run python web_app.py

Then open http://localhost:8000 in your browser.

Web Interface Features:

Option B: CLI Interface

# Run the CLI agent
uv run python cli.py

CLI Commands:

5. Ingest Documents

Add your documents to the documents/ folder, then ingest:

# Ingest all documents in the documents/ folder
# NOTE: By default, this CLEARS existing data before ingestion
uv run python -m ingestion.ingest --documents documents/

# Adjust chunk size (default: 1000)
uv run python -m ingestion.ingest --documents documents/ --chunk-size 800

# Append without cleaning (keep existing data)
uv run python -m ingestion.ingest --documents documents/ --no-clean

⚠️ Important: The ingestion process automatically deletes all existing documents and chunks from the database before adding new documents (unless --no-clean is used). This ensures a clean state and prevents duplicate data.
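The flags above could be wired up roughly like this (a sketch; the real `ingestion/ingest.py` may differ):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """CLI flags mirroring the ingestion commands shown above."""
    parser = argparse.ArgumentParser(
        description="Ingest documents into the knowledge base")
    parser.add_argument("--documents", default="documents/",
                        help="Folder of documents to ingest")
    parser.add_argument("--chunk-size", type=int, default=1000,
                        help="Target chunk size in characters")
    parser.add_argument("--no-clean", action="store_true",
                        help="Append to existing data instead of clearing it first")
    return parser
```

Note that `--no-clean` defaults to off, matching the destructive-by-default behavior described in the warning above.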

The ingestion pipeline will:

  1. Auto-detect file type and use Docling for PDFs, Office docs, HTML, and audio
  2. Transcribe audio files using Whisper Turbo ASR with timestamps
  3. Convert to Markdown for consistent processing
  4. Split into semantic chunks with configurable size
  5. Generate embeddings using Ollama or OpenAI
  6. Store in PostgreSQL with PGVector for similarity search
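Step 4 can be pictured as a sliding window over the converted Markdown. This is a deliberately simplified stand-in (fixed-size with overlap) for the real hybrid chunker, which splits on semantic boundaries:

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Naive fixed-size chunking with overlap between consecutive chunks."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Advance by less than chunk_size so adjacent chunks share context
        start += chunk_size - overlap
    return chunks
```

The overlap keeps sentences that straddle a chunk boundary retrievable from either side.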

Architecture

System Overview

┌─────────────────────────────────────────────────────────────────┐
│                      USER INTERFACES                            │
│  ┌─────────────────────┐           ┌─────────────────────┐      │
│  │   Web Interface     │           │   CLI Interface     │      │
│  │   (FastAPI + HTML)  │           │   (Python async)    │      │
│  └──────────┬──────────┘           └──────────┬──────────┘      │
└─────────────┼─────────────────────────────────┼─────────────────┘
              │                                 │
              └────────────────┬────────────────┘
                               │
              ┌────────────────▼────────────────┐
              │       RAG Agent Core            │
              │  ┌───────────────────────────┐  │
              │  │ PydanticAI Agent          │  │
              │  │ + search_knowledge_base() │  │
              │  └───────────────────────────┘  │
              └────────────────┬────────────────┘
                               │
         ┌─────────────────────┼─────────────────────┐
         │                     │                     │
 ┌───────▼───────┐     ┌───────▼───────┐     ┌───────▼───────┐
 │  Embeddings   │     │      LLM      │     │  PostgreSQL   │
 │  (Ollama/     │     │   (Ollama/    │     │  + PGVector   │
 │   OpenAI)     │     │    OpenAI)    │     │               │
 └───────────────┘     └───────────────┘     └───────────────┘

Data Flow

┌──────────────────┐     ┌─────────────────┐     ┌────────────────┐
│  Data Sources    │────▶│   Ingestion     │────▶│  Knowledge     │
│  • Local files   │     │   Pipeline      │     │  Base (PGVec)  │
│  • Web crawl     │     │  (Docling)      │     │                │
└──────────────────┘     └─────────────────┘     └───────┬────────┘
                                                         │
┌──────────────────┐     ┌─────────────────┐     ┌───────▼────────┐
│   User Query     │◀────│   RAG Agent     │◀────│  Semantic      │
│  (Web or CLI)    │     │  + Streaming    │     │  Search        │
└──────────────────┘     └─────────────────┘     └────────────────┘

Audio Transcription Feature

Audio files are automatically transcribed using OpenAI's Whisper Turbo model:

How it works:

  1. When ingesting audio files (MP3 supported currently), Docling uses Whisper ASR
  2. Whisper generates accurate transcriptions with timestamps
  3. Transcripts are formatted as markdown with time markers
  4. Audio content becomes fully searchable through the RAG system

Benefits:

Model details:

Example transcript format:

[time: 0.0-4.0] Welcome to our podcast on AI and machine learning.
[time: 5.28-9.96] Today we'll discuss retrieval augmented generation systems.

Key Components

RAG Agent

The main agent (rag_agent.py) that:

search_knowledge_base Tool

Function tool registered with the agent that:

Example tool definition (a sketch: `generate_embedding`, `db_pool`, and `format_chunk` stand in for the project's embedding, database, and formatting helpers):

async def search_knowledge_base(
    ctx: RunContext[None],
    query: str,
    limit: int = 5
) -> str:
    """Search the knowledge base using semantic similarity."""
    # Generate an embedding for the query
    embedding = await generate_embedding(query)
    # Search PostgreSQL with PGVector
    rows = await db_pool.fetch(
        "SELECT * FROM match_chunks($1::vector, $2)", embedding, limit
    )
    # Format and return results with source citations
    return "\n\n".join(format_chunk(row) for row in rows)

Database Schema

Performance Optimization

Database Connection Pooling

import asyncpg

db_pool = await asyncpg.create_pool(
    DATABASE_URL,
    min_size=2,
    max_size=10,
    command_timeout=60
)

Embedding Cache

The embedder includes built-in caching for frequently searched queries, reducing API calls and latency.
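A minimal version of such a cache keyed on the query text might look like this (illustrative only; the project's embedder has its own implementation):

```python
class EmbeddingCache:
    """In-memory cache that only computes an embedding on a miss."""

    def __init__(self, embed_fn):
        self.embed_fn = embed_fn
        self._cache: dict[str, list[float]] = {}
        self.hits = 0

    def get(self, text: str) -> list[float]:
        if text in self._cache:
            self.hits += 1
        else:
            # Miss: call the (potentially slow/remote) embedding function once
            self._cache[text] = self.embed_fn(text)
        return self._cache[text]
```

Since identical queries produce identical embeddings, repeated searches skip the embedding call entirely.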

Streaming Responses

Token-by-token streaming provides immediate feedback to users while the LLM generates responses:

async with agent.run_stream(user_input, message_history=history) as result:
    async for text in result.stream_text(delta=False):
        print(f"\rAssistant: {text}", end="", flush=True)

Docker Deployment

Using Docker Compose

# Start all services
docker-compose up -d

# Ingest documents
docker-compose --profile ingestion up ingestion

# View logs
docker-compose logs -f rag-agent

API Reference

search_knowledge_base Tool

async def search_knowledge_base(
    ctx: RunContext[None],
    query: str,
    limit: int = 5
) -> str:
    """
    Search the knowledge base using semantic similarity.

    Args:
        query: The search query to find relevant information
        limit: Maximum number of results to return (default: 5)

    Returns:
        Formatted search results with source citations
    """

Database Functions

-- Vector similarity search
-- Signature: match_chunks(query_embedding vector(1536),
--                         match_count INT,
--                         similarity_threshold FLOAT DEFAULT 0.7)
SELECT * FROM match_chunks($1::vector(1536), 5);

Returns chunks with:
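The similarity score behind this search is typically cosine similarity between the query embedding and each stored chunk embedding. Computed in pure Python for illustration:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm
```

A score of 1.0 means identical direction, 0.0 means orthogonal (unrelated), which is why the threshold above defaults to 0.7.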

Project Structure

docling-rag-agent/
├── cli.py                   # Enhanced CLI with colors and features
├── rag_agent.py             # Basic CLI agent with PydanticAI
├── web_app.py               # FastAPI web interface server ⭐ NEW
│
├── web/                     # Web interface frontend ⭐ NEW
│   └── index.html           # Single-page application (HTML/CSS/JS)
│
├── ingestion/
│   ├── ingest.py            # Document ingestion pipeline
│   ├── embedder.py          # Embedding generation with caching
│   └── chunker.py           # Document chunking logic
│
├── web_crawler/             # Web scraping utilities ⭐ NEW
│   ├── 1-crawl_single_page.py
│   ├── 2-crawl_docs_sequential.py
│   ├── 3-crawl_sitemap_in_parallel.py
│   ├── 4-crawl_llms_txt.py
│   ├── 5-crawl_site_recursively.py
│   └── _crawl_utils.py      # Shared utilities for web app
│
├── docling_basics/          # Docling tutorials
│   ├── 01_simple_pdf.py
│   ├── 02_multiple_formats.py
│   ├── 03_audio_transcription.py
│   └── 04_hybrid_chunking.py
│
├── utils/
│   ├── providers.py         # OpenAI/Ollama model/client configuration
│   ├── db_utils.py          # Database connection pooling
│   └── models.py            # Pydantic models for config
│
├── sql/
│   ├── schema.sql           # PostgreSQL schema with PGVector
│   ├── backup.sh            # Database backup script
│   └── restore.sh           # Database restore script
│
├── documents/               # Sample documents for ingestion
├── pyproject.toml           # Project dependencies
├── .env.example             # Environment variables template
│
├── README.md                # This file
├── WEB_INTERFACE.md         # Web interface documentation ⭐ NEW
└── DATA_PIPELINE.md         # Data collection guide ⭐ NEW

Documentation

| Document | Description |
|----------|-------------|
| README.md | Main project documentation |
| WEB_INTERFACE.md | Web interface usage guide |
| DATA_PIPELINE.md | Data collection pipeline guide |
| docling_basics/README.md | Docling tutorials |

Troubleshooting

Python Version Error: unsupported operand type(s) for |

Error:

TypeError: unsupported operand type(s) for |: 'type' and 'NoneType'

Cause: You’re using Python 3.9, but crawl4ai requires Python 3.10+.

Solution: Upgrade to Python 3.10 or later (Python 3.11+ recommended):

# Check your Python version
python --version

# If using Python 3.9, recreate the virtual environment with Python 3.10+
uv venv --python 3.11 --clear
uv sync

Port Already in Use

Error:

ERROR: [Errno 48] Address already in use

Solution:

# Kill the process using port 8000
lsof -ti:8000 | xargs kill -9

# Or use a different port
uv run python web_app.py --port 8001

Missing System Libraries

Error:

fatal error: 'opus/opus.h' file not found

Solution: Install required system dependencies:

# macOS
brew install opus opusfile

# Linux (Ubuntu/Debian)
sudo apt-get install libopus0 libopusfile0

Database Connection Failed

Error:

Database not initialized. Please check your DATABASE_URL configuration.

Solution:

  1. Verify PostgreSQL is running: pg_isready
  2. Check DATABASE_URL in your .env file
  3. Ensure the database exists and PGVector extension is installed:
    CREATE EXTENSION IF NOT EXISTS vector;
    

Acknowledgments

This project is inspired by the Docling RAG Agent from the excellent ottomator-agents collection by coleam00.

Modifications made: