
MCP vs RAG vs AI Agents: What They Actually Are and When to Use Each

Three terms everyone is using, often interchangeably. They solve completely different problems. Here's the mental model that makes them click — with real architecture diagrams and a production stack example.

SystemForge · April 20, 2026 · 12 min read
Tags: MCP · RAG · AI Agents · LLM · AI Engineering · System Design · OpenAI

The Confusion

If you've been building with LLMs, you've heard all three:

  • MCP — the new thing everyone's excited about
  • RAG — the thing everyone built first
  • AI Agents — the thing everyone wants to build next

They get conflated in the same sentence constantly. But they're not alternatives to each other — they solve completely different layers of the problem.

Here's the mental model that makes it click:

MCP    → How AI connects to tools and systems
RAG    → How AI gets better, up-to-date knowledge  
Agents → How AI takes multi-step action autonomously

They're not competing. In a production AI system, you almost certainly use all three together.


1. RAG — Retrieval-Augmented Generation

The Problem RAG Solves

LLMs are trained on a fixed dataset with a knowledge cutoff. They don't know:

  • Your company's internal documentation
  • The support ticket filed this morning
  • The contract you uploaded yesterday
  • Anything that changed after their training date

The naive fix is to dump everything into the prompt. That fails fast:

GPT-4o context window: ~128K tokens ≈ ~100,000 words

Your company wiki: 10 million words
Your product docs:  5 million words
Your support logs:  50 million words

→ You cannot fit this in the prompt.

What RAG Does

Instead of stuffing all your data into every prompt, RAG fetches only the relevant pieces at query time and injects them.

Without RAG:
  User: "What's our refund policy for enterprise customers?"
  LLM: [Has no idea — not in training data]
  Output: Hallucinated or generic answer

With RAG:
  User: "What's our refund policy for enterprise customers?"
  
  Step 1 — Retrieve:
    Convert question to embedding vector
    Search vector database for similar document chunks
    Find: "enterprise_contracts.pdf" section 4.2

  Step 2 — Augment:
    Inject retrieved chunks into the prompt:
    "Context: [4.2 Enterprise Refund Policy: Enterprise customers
    may request a full refund within 60 days of contract start...]
    Question: What's our refund policy for enterprise customers?"

  Step 3 — Generate:
    LLM now has the context it needs → accurate answer

How RAG Works Under the Hood

Ingestion pipeline (runs once, then incrementally):

  Documents (PDFs, Notion, Confluence, S3...)
       │
       ▼
  Text Extraction + Chunking
  (split into 500-token overlapping chunks)
       │
       ▼
  Embedding Model (OpenAI text-embedding-3-small, etc.)
  (convert each chunk to a 1536-dimension vector)
       │
       ▼
  Vector Database (Pinecone, Qdrant, pgvector, Chroma)
  (store chunk text + vector + metadata)
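
To make the chunking step concrete, here's a minimal sketch. Real pipelines split on token counts (e.g. with a tokenizer such as tiktoken) and respect document structure; this character-based version just illustrates the overlap idea, using the rough rule of thumb of ~4 characters per token.

TYPESCRIPT
// Naive chunking sketch: character-based as a stand-in for token-based
// splitting (~500 tokens ≈ ~2,000 characters). The overlap ensures a
// sentence straddling a boundary appears intact in at least one chunk.
function chunkText(text: string, chunkSize = 2000, overlap = 200): string[] {
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += chunkSize - overlap) {
    chunks.push(text.slice(start, start + chunkSize));
  }
  return chunks;
}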

Query pipeline (runs on every user question):

  User question
       │
       ▼
  Embed question → query vector
       │
       ▼
  Vector DB similarity search → top 5 relevant chunks
       │
       ▼
  Inject chunks into prompt → LLM → Answer
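
In code, the query pipeline is only a few steps. Here's a minimal sketch using the OpenAI Node SDK, with a plain in-memory array standing in for the vector database (a real system would call out to Pinecone, Qdrant, pgvector, or similar rather than scanning chunks with cosineSimilarity):

TYPESCRIPT
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

type Chunk = { text: string; vector: number[] };

// Cosine similarity between two equal-length vectors
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

async function embed(text: string): Promise<number[]> {
  const res = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: text,
  });
  return res.data[0].embedding;
}

async function answerWithRag(question: string, index: Chunk[]) {
  // Step 1 (Retrieve): embed the question, rank chunks by similarity
  const queryVector = await embed(question);
  const topChunks = index
    .map((c) => ({ ...c, score: cosineSimilarity(queryVector, c.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, 5);

  // Step 2 (Augment): inject the retrieved chunks into the prompt
  const context = topChunks.map((c) => c.text).join("\n---\n");

  // Step 3 (Generate): the LLM answers with the context it needs
  const completion = await openai.chat.completions.create({
    model: "gpt-4o",
    messages: [
      { role: "system", content: `Answer using only this context:\n${context}` },
      { role: "user", content: question },
    ],
  });
  return completion.choices[0].message.content;
}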

When RAG Is the Right Tool

  • Your LLM needs to answer questions about your specific data
  • The data is too large to fit in the context window
  • The data changes over time (re-index on update, no retraining)
  • You need citations — show which document the answer came from

When RAG Is NOT the Right Tool

  • The question requires live data (current stock price, today's weather) — use a tool/API instead
  • The data fits in the context window — just send it directly
  • You need reasoning across many documents simultaneously — RAG retrieves chunks, it doesn't reason across the whole corpus

2. AI Agents — LLMs That Take Action

The Problem Agents Solve

A standard LLM call is one-shot: you ask a question, it answers. That covers maybe 20% of real workflows. The other 80% look like this:

"Check our latest sales data, compare it to last quarter,
identify the top 3 underperforming products, draft a Slack
message to the sales team with recommendations, and
create a Jira ticket to track the action items."

A single LLM call cannot do this. It requires:

  1. Calling a database API to fetch sales data
  2. Running a comparison calculation
  3. Reasoning about the results
  4. Calling the Slack API
  5. Calling the Jira API

This is a multi-step workflow where the LLM decides what to do next at each step. That's an agent.

The Agentic Loop

User goal: "Analyse sales and notify the team"
       │
       ▼
  LLM: What should I do first?
  Decision: Call get_sales_data(period="Q1-2026")
       │
       ▼
  Tool executes → returns data
       │
       ▼
  LLM: What next?
  Decision: Call compare_to_previous(current=..., previous="Q4-2025")
       │
       ▼
  Tool executes → returns comparison
       │
       ▼
  LLM: What next?
  Decision: Call send_slack_message(channel="#sales", text="...")
       │
       ▼
  Tool executes → success
       │
       ▼
  LLM: Goal complete → return summary to user

The LLM is the reasoning engine that decides what tool to call, with what arguments, in what order. The loop continues until the goal is complete or the LLM decides it's done.

Anatomy of a Tool Call (OpenAI Function Calling)

TYPESCRIPT
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Define tools the agent can call
const tools = [
  {
    type: "function",
    function: {
      name: "get_sales_data",
      description: "Fetch sales data for a given period",
      parameters: {
        type: "object",
        properties: {
          period: { type: "string", description: "e.g. Q1-2026" },
          product_id: { type: "string", description: "Optional filter" }
        },
        required: ["period"]
      }
    }
  },
  {
    type: "function",
    function: {
      name: "send_slack_message",
      description: "Send a message to a Slack channel",
      parameters: {
        type: "object",
        properties: {
          channel: { type: "string" },
          text: { type: "string" }
        },
        required: ["channel", "text"]
      }
    }
  }
];

// Agentic loop
async function runAgent(goal: string) {
  const messages: OpenAI.ChatCompletionMessageParam[] = [{ role: "user", content: goal }];

  // Unbounded for clarity; see "Common Mistakes" below for guardrails
  while (true) {
    const response = await openai.chat.completions.create({
      model: "gpt-4o",
      messages,
      tools,
      tool_choice: "auto"
    });

    const message = response.choices[0].message;

    // No more tool calls → agent is done
    if (!message.tool_calls) {
      return message.content;
    }

    // Execute each tool the LLM requested
    messages.push(message);
    for (const call of message.tool_calls) {
      // The model returns tool arguments as a JSON string, so parse first
      const args = JSON.parse(call.function.arguments);
      const result = await executeTool(call.function.name, args);
      messages.push({
        role: "tool",
        tool_call_id: call.id,
        content: JSON.stringify(result)
      });
    }
    // Loop — LLM sees the tool results and decides what to do next
  }
}
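
One thing the loop above glosses over: executeTool is not part of the OpenAI SDK. It's your own dispatcher that maps the tool name the LLM chose to a real implementation. A minimal sketch, where getSalesData and sendSlackMessage are hypothetical application functions:

TYPESCRIPT
// Hypothetical dispatcher: routes the LLM's tool choice to your own code.
// Assumes the caller has already JSON.parsed the arguments string.
async function executeTool(name: string, args: Record<string, unknown>) {
  switch (name) {
    case "get_sales_data":
      return getSalesData(args.period as string, args.product_id as string | undefined);
    case "send_slack_message":
      return sendSlackMessage(args.channel as string, args.text as string);
    default:
      // Return the error to the LLM instead of crashing the loop
      return { error: `Unknown tool: ${name}` };
  }
}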

When Agents Are the Right Tool

  • Tasks require multiple sequential steps where each step depends on the previous
  • You need the LLM to make decisions about which path to take based on intermediate results
  • Tasks involve writing back to systems (create ticket, send email, update database)
  • The workflow is too complex to hardcode as a fixed sequence

When Agents Are NOT the Right Tool

  • Single-step tasks — a standard LLM call with tools is simpler
  • Latency-critical paths — every loop iteration adds 1–3 seconds
  • High-stakes irreversible actions — agents can take unintended actions; add a human-in-the-loop checkpoint before anything destructive

3. MCP — Model Context Protocol

The Problem MCP Solves

Agents need tools. Building those tool integrations is the messy part. Every team that builds an agent that connects to Slack has to:

  • Write authentication code
  • Map Slack's API to function signatures
  • Handle rate limits and errors
  • Keep it updated when Slack's API changes

Multiply this by every tool (Gmail, GitHub, Notion, Jira, Salesforce, internal APIs) and every team building agents, and you have an explosion of redundant one-off integrations.

MCP is a standard protocol that defines how AI models connect to external tools and data sources — so you write the integration once, as an MCP server, and any MCP-compatible AI client can use it.

Think of it as USB-C for AI integrations.

Before MCP (integration chaos):

  Agent A ──── custom Slack integration ──── Slack
  Agent A ──── custom Gmail integration ──── Gmail
  Agent B ──── different Slack integration ── Slack   (duplicated!)
  Agent B ──── custom Notion integration ─── Notion
  Agent C ──── yet another Slack integration ─ Slack  (duplicated again)

With MCP (standardised):

  Agent A ──┐
  Agent B ──┼─── MCP Client ──── Slack MCP Server ──── Slack
  Agent C ──┘        │
                     ├───────── Gmail MCP Server ──── Gmail
                     ├───────── Notion MCP Server ─── Notion
                     └───────── Your API MCP Server ── Internal APIs

  Write the integration once. Use it everywhere.

What an MCP Server Exposes

An MCP server exposes three types of capabilities:

Tools      → Functions the LLM can call
             ("send_email", "create_jira_ticket", "query_database")

Resources  → Data the LLM can read
             (files, database records, API responses)

Prompts    → Pre-built prompt templates
             (reusable instructions for common tasks)

MCP in Practice

TYPESCRIPT
// A minimal MCP server for a database (TypeScript SDK)
import { Server } from "@modelcontextprotocol/sdk/server/index.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import {
  ListToolsRequestSchema,
  CallToolRequestSchema,
} from "@modelcontextprotocol/sdk/types.js";

const server = new Server(
  { name: "postgres-mcp-server", version: "1.0.0" },
  { capabilities: { tools: {} } }
);

// Expose a tool the LLM can call
server.setRequestHandler(ListToolsRequestSchema, async () => ({
  tools: [
    {
      name: "query_database",
      description: "Run a read-only SQL query against the database",
      inputSchema: {
        type: "object",
        properties: {
          sql: { type: "string", description: "SQL SELECT query" }
        },
        required: ["sql"]
      }
    }
  ]
}));

// Handle the tool call when the LLM invokes it
// (`db` is assumed to be an already-initialised client, e.g. a pg Pool)
server.setRequestHandler(CallToolRequestSchema, async (request) => {
  if (request.params.name === "query_database") {
    const result = await db.query(request.params.arguments.sql);
    return { content: [{ type: "text", text: JSON.stringify(result.rows) }] };
  }
  throw new Error(`Unknown tool: ${request.params.name}`);
});

// Start the server
const transport = new StdioServerTransport();
await server.connect(transport);

Any MCP-compatible client (Claude, a custom agent, VS Code with Copilot) can now use this database tool without any custom integration code.
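
On the other side of the wire, a client connects to that server, discovers its tools, and calls them. Here's a sketch using the SDK's client API; the server launch command is an assumption:

TYPESCRIPT
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

const client = new Client(
  { name: "example-client", version: "1.0.0" },
  { capabilities: {} }
);

// Launch the server as a subprocess and connect over stdio
await client.connect(
  new StdioClientTransport({ command: "node", args: ["postgres-mcp-server.js"] })
);

// Standard discovery: ask the server what it can do
const { tools } = await client.listTools();
console.log(tools.map((t) => t.name)); // ["query_database"]

// Invoke the tool — no Postgres-specific code in the client
const result = await client.callTool({
  name: "query_database",
  arguments: { sql: "SELECT id, status FROM tickets LIMIT 5" },
});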

MCP vs Function Calling

A common question: isn't MCP just OpenAI function calling with extra steps?

Function calling:
  → Tools defined inline per API request
  → Tightly coupled to one LLM provider
  → No standard discovery mechanism
  → You write the plumbing for every integration

MCP:
  → Tools defined in a separate server process
  → Provider-agnostic (clients built on Claude, GPT-4o, or Gemini can all speak it)
  → Standard discovery: client asks server "what can you do?"
  → Write once, reuse across any AI system

MCP is most valuable in organisations building many agents that share common integrations, or when exposing tools to third-party AI clients you don't control.


The Full Picture: All Three Together

Here's a production AI system that uses all three:

User: "Find all support tickets from this week tagged 'billing',
       summarise the common issues, and create a Jira epic
       to address the top 3 problems."

┌─────────────────────────────────────────────────────────┐
│                     AI Agent (loop)                     │
│                                                         │
│  Step 1: Fetch tickets                                  │
│  → calls Zendesk MCP Server (tool: search_tickets)      │  ← MCP
│  → returns 47 tickets                                   │
│                                                         │
│  Step 2: Summarise each ticket                          │
│  → For each ticket, needs to understand jargon          │
│  → RAG: retrieves product glossary from vector DB       │  ← RAG
│  → LLM summarises with correct context                  │
│                                                         │
│  Step 3: Cluster common themes                          │
│  → Pure LLM reasoning over summaries                    │
│                                                         │
│  Step 4: Create Jira epic                               │
│  → calls Jira MCP Server (tool: create_epic)            │  ← MCP
│                                                         │
│  Done → Returns: "Created epic BILL-423 with 3 stories" │
└─────────────────────────────────────────────────────────┘

MCP provided: Zendesk integration, Jira integration
RAG provided: Product knowledge to interpret ticket content
Agent provided: The multi-step orchestration logic

Decision Framework: Which One Do You Need?

"My LLM doesn't know about my data"
  → Add RAG

"My LLM needs to read from / write to external systems"
  → Add tools (function calling or MCP)

"My task requires multiple steps and decisions"
  → Use an agent (orchestrate tools in a loop)

"Multiple agents/teams need the same integrations"
  → Wrap those integrations in MCP servers

All of the above?
  → Build an agent that uses RAG for knowledge
    and MCP servers for tool integrations

Common Mistakes

Mistake 1: Using RAG when you should use a tool

"Our LLM needs to know the current inventory level."

Don't embed and index live inventory data — it changes every second and will always be stale. Call an inventory API directly as a tool.

RAG is for:    documents, policies, knowledge bases (changes infrequently)
Tools are for: live data, write operations, real-time systems
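
In function-calling terms, that means exposing inventory as a tool definition like this hypothetical one, rather than embedding inventory snapshots into a vector index:

TYPESCRIPT
// Hypothetical tool definition: the agent fetches live stock on demand
const inventoryTool = {
  type: "function",
  function: {
    name: "get_inventory_level",
    description: "Fetch the current stock level for a product (live data)",
    parameters: {
      type: "object",
      properties: {
        product_id: { type: "string", description: "Product to look up" }
      },
      required: ["product_id"]
    }
  }
};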

Mistake 2: Building an agent for a one-step task

"We're building an agent to answer customer emails."

If the task is read email → generate reply → done, that's one LLM call with a prompt template. Adding an agentic loop adds latency and complexity with no benefit.

Mistake 3: Skipping MCP because "we only have one agent"

"MCP is overkill, we'll just write our Slack integration inline."

That's fine today. But the moment you have two agents that both need Slack, you'll write it twice. MCP pays off faster than most teams expect.

Mistake 4: Agentic loops without guardrails

Agent runs in a loop, sends 200 Slack messages, creates 47 Jira tickets, bills the customer 12 times.

Add maximum iteration limits, confirmation steps before irreversible actions, and dry-run modes for production agents.
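
Concretely, the unbounded while (true) loop from the agent example becomes something like this sketch, where confirmWithHuman is a hypothetical review step you'd implement yourself:

TYPESCRIPT
// Two cheap guardrails: an iteration cap, and a human-in-the-loop
// checkpoint before any irreversible tool runs.
const MAX_ITERATIONS = 10;
const IRREVERSIBLE = new Set(["send_slack_message", "create_jira_ticket"]);

for (let i = 0; i < MAX_ITERATIONS; i++) {
  const response = await openai.chat.completions.create({
    model: "gpt-4o", messages, tools, tool_choice: "auto"
  });
  const message = response.choices[0].message;
  if (!message.tool_calls) return message.content; // goal complete

  messages.push(message);
  for (const call of message.tool_calls) {
    // Require human sign-off before anything destructive
    if (IRREVERSIBLE.has(call.function.name) && !(await confirmWithHuman(call))) {
      messages.push({ role: "tool", tool_call_id: call.id,
        content: "Action rejected by human reviewer" });
      continue;
    }
    const result = await executeTool(call.function.name, JSON.parse(call.function.arguments));
    messages.push({ role: "tool", tool_call_id: call.id, content: JSON.stringify(result) });
  }
}
throw new Error("Agent exceeded iteration limit");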


The One-Line Summary

|        | Problem it solves                           | When you need it                                    |
|--------|---------------------------------------------|-----------------------------------------------------|
| RAG    | LLM doesn't know your data                  | Answering questions from private/large datasets     |
| MCP    | AI integrations are fragmented              | Connecting to tools reliably across multiple agents |
| Agents | Tasks require multi-step reasoning + action | Complex workflows that need to adapt dynamically    |

They're not alternatives. They're layers. Build all three, layer them correctly, and you have a production AI system.
