Learnixo
All Projects
AI & MLintermediate View on GitHub

DocQuery

RAG Document Q&A in Python — FastAPI, LangChain, pgvector, GPT-4o

1–2 hours to set up10 technologies5 guided steps

About This Project

A production RAG (Retrieval-Augmented Generation) system in Python: upload PDFs and Word documents, chunk and embed them with OpenAI embeddings, store vectors in PostgreSQL with pgvector, and chat with your documents via a streaming FastAPI endpoint. Every answer cites source chunks with page numbers.

What You'll Learn

Build a full RAG pipeline from document upload to streaming answer
Store and query vector embeddings with pgvector in PostgreSQL
Implement streaming SSE responses with FastAPI
Add source citations to every LLM answer
Implement hybrid search with vector + keyword re-ranking

Key Features

Upload PDF and Word documents via REST — chunked and embedded on upload
pgvector cosine similarity search over document chunks
GPT-4o streaming responses via Server-Sent Events — no wait for full answer
Source citations: every answer includes chunk text + page number
Hybrid search: vector similarity + keyword BM25 re-ranking
Conversation history: multi-turn Q&A with memory per session
Rate limiting per API key with Redis token bucket
Async SQLAlchemy + asyncpg — non-blocking from upload to query
Docker Compose one-command setup with PostgreSQL + pgvector

Project Structure

directory tree
DocQuery/
├── app/
│   ├── main.py               # FastAPI app, lifespan, middleware
│   ├── api/
│   │   ├── documents.py      # Upload, list, delete endpoints
│   │   └── chat.py           # Streaming Q&A endpoint
│   ├── rag/
│   │   ├── chunker.py        # PDF/Word chunking strategy
│   │   ├── embedder.py       # OpenAI embedding calls
│   │   ├── retriever.py      # pgvector + BM25 hybrid search
│   │   └── generator.py      # GPT-4o streaming with citations
│   ├── models/               # SQLAlchemy async models
│   └── db.py                 # Async session factory
├── alembic/                  # pgvector migration
├── docker-compose.yml
└── requirements.txt

Setup Guide

1

Clone and create virtual environment

Set up the Python environment and install dependencies.

bash
git clone https://github.com/asmanasir/DocQuery.git
cd DocQuery
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install -r requirements.txt
2

Configure environment variables

Set your OpenAI key and database URL.

bash
cp .env.example .env
# Edit .env:
# OPENAI_API_KEY=sk-...
# DATABASE_URL=postgresql+asyncpg://app:password@localhost:5432/docquery
3

Start PostgreSQL with pgvector and run migrations

Start the database and apply the schema.

bash
docker-compose up -d db
alembic upgrade head

Running the Project

1

Start the API

Run FastAPI with uvicorn — docs at /docs.

bash
uvicorn app.main:app --reload
2

Upload a document and ask a question

Upload a PDF and start a Q&A session.

bash
# Upload a document
curl -X POST http://localhost:8000/api/documents \
  -F "file=@report.pdf"

# Ask a question (streaming response)
curl -N http://localhost:8000/api/chat \
  -H "Content-Type: application/json" \
  -d '{"question": "What were the key findings?", "session_id": "sess-1"}'

Project Info

CategoryAI & ML
Difficultyintermediate
Setup time1–2 hours to set up
Technologies10 tools

Tech Stack

Python 3.12FastAPILangChainOpenAI (GPT-4o)pgvector (PostgreSQL)SQLAlchemy 2.0 (async)Pydantic v2Redis (rate limiting)DockerGitHub Actions

Prerequisites

  • Python 3.12 installed
  • Docker Desktop installed
  • OpenAI API key (GPT-4o access)
View Source on GitHub
L

Learnixo

Project Author

The gap between 'RAG tutorial' and 'RAG in production' is citations, hybrid search, streaming, and multi-turn memory. This project covers all four — in Python with FastAPI, the stack most AI teams actually use.