MCP vs RAG vs AI Agents: What They Actually Are and When to Use Each
Three terms everyone is using, often interchangeably. They solve completely different problems. Here's the mental model that makes them click, with real architecture diagrams and a production stack example.
The Confusion
If you've been building with LLMs, you've heard all three:
- MCP: the new thing everyone's excited about
- RAG: the thing everyone built first
- AI Agents: the thing everyone wants to build next
They get conflated in the same sentence constantly. But they're not alternatives to each other; they solve completely different layers of the problem.
Here's the mental model that makes it click:
MCP: how AI connects to tools and systems
RAG: how AI gets better, up-to-date knowledge
Agents: how AI takes multi-step action autonomously
They're not competing. In a production AI system, you almost certainly use all three together.
1. RAG: Retrieval-Augmented Generation
The Problem RAG Solves
LLMs are trained on a fixed dataset with a knowledge cutoff. They don't know:
- Your company's internal documentation
- The support ticket filed this morning
- The contract you uploaded yesterday
- Anything that changed after their training date
The naive fix is to dump everything into the prompt. That fails fast:
GPT-4o context window: ~128K tokens ≈ 100,000 words
Your company wiki: 10 million words
Your product docs: 5 million words
Your support logs: 50 million words
→ You cannot fit this in the prompt.
What RAG Does
Instead of stuffing all your data into every prompt, RAG fetches only the relevant pieces at query time and injects them.
Without RAG:
User: "What's our refund policy for enterprise customers?"
LLM: [Has no idea; not in training data]
Output: Hallucinated or generic answer
With RAG:
User: "What's our refund policy for enterprise customers?"
Step 1: Retrieve
Convert question to embedding vector
Search vector database for similar document chunks
Find: "enterprise_contracts.pdf" section 4.2
Step 2: Augment
Inject retrieved chunks into the prompt:
"Context: [4.2 Enterprise Refund Policy: Enterprise customers
may request a full refund within 60 days of contract start...]
Question: What's our refund policy for enterprise customers?"
Step 3: Generate
LLM now has the context it needs → accurate answer
How RAG Works Under the Hood
Ingestion pipeline (runs once, then incrementally):
Documents (PDFs, Notion, Confluence, S3...)
    │
    ▼
Text Extraction + Chunking
(split into 500-token overlapping chunks)
    │
    ▼
Embedding Model (OpenAI text-embedding-3-small, etc.)
(convert each chunk to a 1536-dimension vector)
    │
    ▼
Vector Database (Pinecone, Qdrant, pgvector, Chroma)
(store chunk text + vector + metadata)
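The chunking step is the easiest part of this pipeline to get a feel for in code. Here's a minimal sketch using word count as a stand-in for token count (a real pipeline would use the model's tokenizer); the overlap keeps a sentence that straddles a chunk boundary retrievable from either side:

```typescript
// Split text into overlapping chunks. chunkSize and overlap are in words
// here as an approximation; production code would count tokens instead.
function chunkText(text: string, chunkSize = 500, overlap = 50): string[] {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks: string[] = [];
  const step = chunkSize - overlap; // each chunk starts `step` words after the last
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + chunkSize).join(" "));
    if (start + chunkSize >= words.length) break; // last chunk reached the end
  }
  return chunks;
}
```

Each chunk is then embedded and stored in the vector database alongside its source metadata, so answers can cite the originating document.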
Query pipeline (runs on every user question):
User question
    │
    ▼
Embed question → query vector
    │
    ▼
Vector DB similarity search → top 5 relevant chunks
    │
    ▼
Inject chunks into prompt → LLM → Answer
When RAG Is the Right Tool
- Your LLM needs to answer questions about your specific data
- The data is too large to fit in the context window
- The data changes over time (re-index on update, no retraining)
- You need citations: show which document the answer came from
When RAG Is NOT the Right Tool
- The question requires live data (current stock price, today's weather) → use a tool/API instead
- The data fits in the context window → just send it directly
- You need reasoning across many documents simultaneously → RAG retrieves chunks; it doesn't reason across the whole corpus
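Wired together, the query pipeline above is only a few lines of orchestration. Everything in this sketch is hypothetical glue: `embed`, `search`, and `generate` stand in for your embedding model, vector DB client, and LLM call.

```typescript
// Dependency signatures: swap in your embedding model, vector DB, and LLM.
type Embed = (text: string) => Promise<number[]>;
type Search = (vector: number[], topK: number) => Promise<string[]>;
type Generate = (prompt: string) => Promise<string>;

// Step 2 (augment): inject the retrieved chunks into the prompt.
function buildAugmentedPrompt(question: string, chunks: string[]): string {
  const context = chunks.map((c, i) => `[${i + 1}] ${c}`).join("\n");
  return `Context:\n${context}\n\nQuestion: ${question}`;
}

// Steps 1-3: retrieve, augment, generate.
async function answerWithRag(
  question: string,
  embed: Embed,
  search: Search,
  generate: Generate
): Promise<string> {
  const queryVector = await embed(question);   // question -> vector
  const chunks = await search(queryVector, 5); // top-5 similar chunks
  return generate(buildAugmentedPrompt(question, chunks));
}
```

Numbering the chunks in the prompt also gives the LLM something to cite, which is how RAG systems produce source references.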
2. AI Agents: LLMs That Take Action
The Problem Agents Solve
A standard LLM call is one-shot: you ask a question, it answers. That covers maybe 20% of real workflows. The other 80% look like this:
"Check our latest sales data, compare it to last quarter,
identify the top 3 underperforming products, draft a Slack
message to the sales team with recommendations, and
create a Jira ticket to track the action items."
A single LLM call cannot do this. It requires:
- Calling a database API to fetch sales data
- Running a comparison calculation
- Reasoning about the results
- Calling the Slack API
- Calling the Jira API
This is a multi-step workflow where the LLM decides what to do next at each step. That's an agent.
The Agentic Loop
User goal: "Analyse sales and notify the team"
    │
    ▼
LLM: What should I do first?
Decision: Call get_sales_data(period="Q1-2026")
    │
    ▼
Tool executes → returns data
    │
    ▼
LLM: What next?
Decision: Call compare_to_previous(current=..., previous="Q4-2025")
    │
    ▼
Tool executes → returns comparison
    │
    ▼
LLM: What next?
Decision: Call send_slack_message(channel="#sales", text="...")
    │
    ▼
Tool executes → success
    │
    ▼
LLM: Goal complete → return summary to user
The LLM is the reasoning engine that decides which tool to call, with what arguments, in what order. The loop continues until the LLM decides the goal is complete.
Anatomy of a Tool Call (OpenAI Function Calling)
// Assumes the official openai npm package, with OPENAI_API_KEY in the environment
import OpenAI from "openai";
const openai = new OpenAI();

// Define tools the agent can call
const tools = [
{
type: "function",
function: {
name: "get_sales_data",
description: "Fetch sales data for a given period",
parameters: {
type: "object",
properties: {
period: { type: "string", description: "e.g. Q1-2026" },
product_id: { type: "string", description: "Optional filter" }
},
required: ["period"]
}
}
},
{
type: "function",
function: {
name: "send_slack_message",
description: "Send a message to a Slack channel",
parameters: {
type: "object",
properties: {
channel: { type: "string" },
text: { type: "string" }
},
required: ["channel", "text"]
}
}
}
];
// Agentic loop. executeTool(name, args) is assumed: your dispatcher that
// runs the named tool and returns its result.
async function runAgent(goal: string) {
const messages = [{ role: "user", content: goal }];
while (true) {
const response = await openai.chat.completions.create({
model: "gpt-4o",
messages,
tools,
tool_choice: "auto"
});
const message = response.choices[0].message;
// No more tool calls, so the agent is done
if (!message.tool_calls) {
return message.content;
}
// Execute each tool the LLM requested
messages.push(message);
for (const call of message.tool_calls) {
const args = JSON.parse(call.function.arguments); // arguments arrive as a JSON string
const result = await executeTool(call.function.name, args);
messages.push({
role: "tool",
tool_call_id: call.id,
content: JSON.stringify(result)
});
}
// Loop: the LLM sees the tool results and decides what to do next
}
}
When Agents Are the Right Tool
- Tasks require multiple sequential steps where each step depends on the previous
- You need the LLM to make decisions about which path to take based on intermediate results
- Tasks involve writing back to systems (create ticket, send email, update database)
- The workflow is too complex to hardcode as a fixed sequence
When Agents Are NOT the Right Tool
- Single-step tasks: a standard LLM call with tools is simpler
- Latency-critical paths: every loop iteration adds 1-3 seconds
- High-stakes irreversible actions: agents can take unintended actions; add a human-in-the-loop checkpoint before anything destructive
3. MCP: Model Context Protocol
The Problem MCP Solves
Agents need tools. Building those tool integrations is the messy part. Every team that builds an agent that connects to Slack has to:
- Write authentication code
- Map Slack's API to function signatures
- Handle rate limits and errors
- Keep it updated when Slack's API changes
Multiply this by every tool (Gmail, GitHub, Notion, Jira, Salesforce, internal APIs) and every team building agents, and you have an explosion of redundant one-off integrations.
MCP is a standard protocol that defines how AI models connect to external tools and data sources, so you write the integration once, as an MCP server, and any MCP-compatible AI client can use it.
Think of it as USB-C for AI integrations.
Before MCP (integration chaos):
Agent A ──── custom Slack integration ───── Slack
Agent A ──── custom Gmail integration ───── Gmail
Agent B ──── different Slack integration ── Slack (duplicated!)
Agent B ──── custom Notion integration ──── Notion
Agent C ──── yet another Slack integration ─ Slack (duplicated again)
With MCP (standardised):
Agent A ──┐
Agent B ──┼── MCP Client ──── Slack MCP Server ──── Slack
Agent C ──┘       │
                  ├────────── Gmail MCP Server ──── Gmail
                  ├────────── Notion MCP Server ─── Notion
                  └────────── Your API MCP Server ── Internal APIs
Write the integration once. Use it everywhere.
What an MCP Server Exposes
An MCP server exposes three types of capabilities:
Tools: functions the LLM can call
("send_email", "create_jira_ticket", "query_database")
Resources: data the LLM can read
(files, database records, API responses)
Prompts: pre-built prompt templates
(reusable instructions for common tasks)
MCP in Practice
// A minimal MCP server for a database (TypeScript SDK)
import { Server } from "@modelcontextprotocol/sdk/server/index.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { ListToolsRequestSchema, CallToolRequestSchema } from "@modelcontextprotocol/sdk/types.js";
const server = new Server(
{ name: "postgres-mcp-server", version: "1.0.0" },
{ capabilities: { tools: {} } }
);
// Expose a tool the LLM can call
server.setRequestHandler(ListToolsRequestSchema, async () => ({
tools: [
{
name: "query_database",
description: "Run a read-only SQL query against the database",
inputSchema: {
type: "object",
properties: {
sql: { type: "string", description: "SQL SELECT query" }
},
required: ["sql"]
}
}
]
}));
// Handle the tool call when the LLM invokes it
server.setRequestHandler(CallToolRequestSchema, async (request) => {
if (request.params.name === "query_database") {
const result = await db.query(request.params.arguments.sql);
return { content: [{ type: "text", text: JSON.stringify(result.rows) }] };
}
});
// Start the server
const transport = new StdioServerTransport();
await server.connect(transport);
Any MCP-compatible client (Claude, a custom agent, VS Code with Copilot) can now use this database tool without any custom integration code.
MCP vs Function Calling
A common question: isn't MCP just OpenAI function calling with extra steps?
Function calling:
- Tools defined inline per API request
- Tightly coupled to one LLM provider
- No standard discovery mechanism
- You write the plumbing for every integration
MCP:
- Tools defined in a separate server process
- Provider-agnostic (Claude, GPT-4o, Gemini can all speak MCP)
- Standard discovery: the client asks the server "what can you do?"
- Write once, reuse across any AI system
MCP is most valuable in organisations building many agents that share common integrations, or when exposing tools to third-party AI clients you don't control.
The Full Picture: All Three Together
Here's a production AI system that uses all three:
User: "Find all support tickets from this week tagged 'billing',
summarise the common issues, and create a Jira epic
to address the top 3 problems."
AI Agent (loop):

Step 1: Fetch tickets
  → calls Zendesk MCP Server (tool: search_tickets)   ← MCP
  → returns 47 tickets

Step 2: Summarise each ticket
  → for each ticket, needs to understand jargon
  → RAG: retrieves product glossary from vector DB    ← RAG
  → LLM summarises with correct context

Step 3: Cluster common themes
  → pure LLM reasoning over summaries

Step 4: Create Jira epic
  → calls Jira MCP Server (tool: create_epic)         ← MCP

Done → Returns: "Created epic BILL-423 with 3 stories"

MCP provided: Zendesk integration, Jira integration
RAG provided: product knowledge to interpret ticket content
Agent provided: the multi-step orchestration logic
Decision Framework: Which One Do You Need?
"My LLM doesn't know about my data"
→ Add RAG
"My LLM needs to read from / write to external systems"
→ Add tools (function calling or MCP)
"My task requires multiple steps and decisions"
→ Use an agent (orchestrate tools in a loop)
"Multiple agents/teams need the same integrations"
→ Wrap those integrations in MCP servers
All of the above?
→ Build an agent that uses RAG for knowledge and MCP servers for tool integrations
Common Mistakes
Mistake 1: Using RAG when you should use a tool
"Our LLM needs to know the current inventory level."
Don't embed and index live inventory data; it changes every second and the index will always be stale. Call an inventory API directly as a tool.
RAG is for: documents, policies, knowledge bases (data that changes infrequently)
Tools are for: live data, write operations, real-time systems
Mistake 2: Building an agent for a one-step task
"We're building an agent to answer customer emails."
If the task is read email → generate reply → done, that's one LLM call with a prompt template. Adding an agentic loop adds latency and complexity with no benefit.
Mistake 3: Skipping MCP because "we only have one agent"
"MCP is overkill, we'll just write our Slack integration inline."
That's fine today. But the moment you have two agents that both need Slack, you'll write it twice. MCP pays off faster than most teams expect.
Mistake 4: Agentic loops without guardrails
Agent runs in a loop, sends 200 Slack messages, creates 47 Jira tickets, bills the customer 12 times.
Add maximum iteration limits, confirmation steps before irreversible actions, and dry-run modes for production agents.
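Here's a sketch of what those guardrails can look like around the agentic loop from earlier. The tool names and the `confirm` callback are hypothetical; adapt them to your own stack.

```typescript
// Tools whose effects can't be undone: require confirmation before running.
const IRREVERSIBLE = new Set(["send_slack_message", "create_jira_ticket", "charge_customer"]);

interface Guardrails {
  dryRun: boolean;                             // log intended actions, don't act
  confirm: (tool: string) => Promise<boolean>; // human-in-the-loop checkpoint
  maxIterations: number;                       // hard cap on loop turns
}

// Wraps a single tool execution with dry-run and confirmation checks.
async function guardedExecute(
  tool: string,
  run: () => Promise<unknown>,
  g: Guardrails
): Promise<unknown> {
  if (g.dryRun) return { skipped: true, tool };
  if (IRREVERSIBLE.has(tool) && !(await g.confirm(tool))) {
    return { blocked: true, tool };
  }
  return run();
}

// Caps the loop: `step` returns true when the agent decides it is done.
async function runWithCap(step: () => Promise<boolean>, g: Guardrails): Promise<void> {
  for (let i = 0; i < g.maxIterations; i++) {
    if (await step()) return;
  }
  throw new Error(`Agent exceeded ${g.maxIterations} iterations; aborting`);
}
```

The iteration cap is the cheapest guardrail and catches the runaway-loop failure mode outright; dry-run mode is what lets you test a new agent against production systems without it touching anything.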
The One-Line Summary
| | Problem it solves | When you need it |
|---|---|---|
| RAG | LLM doesn't know your data | Answering questions from private/large datasets |
| MCP | AI integrations are fragmented | Connecting to tools reliably across multiple agents |
| Agents | Tasks require multi-step reasoning + action | Complex workflows that need to adapt dynamically |
They're not alternatives. They're layers. Build all three, layer them correctly, and you have a production AI system.