Learnixo
AI Systems · Intermediate

Why FastAPI for AI Services

Understand what FastAPI is, why it suits AI and LLM workloads, and how it compares to Flask and Django REST Framework. Build your first endpoint and run it with uvicorn.

Asma Hafeez Khan · May 15, 2026 · 8 min read
Tags: fastapi, python, async, api, ai-services

What Is FastAPI?

FastAPI is a modern Python web framework for building HTTP APIs. It sits on top of two battle-tested libraries:

  • Starlette: an ASGI toolkit that handles routing, middleware, WebSockets, and streaming responses
  • Pydantic: a data validation library that uses Python type hints to define and enforce schemas

Because Starlette is ASGI-native, FastAPI is async from the ground up. Every request can be handled by a non-blocking coroutine, which makes it a natural fit for the I/O-heavy workloads typical of AI services: waiting for OpenAI, waiting for a vector store, streaming tokens back to a browser.

FastAPI was created by Sebastián Ramírez (tiangolo) and first released in 2018. By 2026 it is consistently in the top three most-used Python web frameworks, with especially strong adoption in data science and AI teams.

The ASGI Foundation

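Traditional Python web frameworks such as Flask and Django were built on WSGI, the Web Server Gateway Interface. (Django has also supported ASGI since version 3.0.) WSGI is synchronous: each request occupies a worker thread or process from start to finish, including all the time it spends waiting on I/O.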

ASGI, the Asynchronous Server Gateway Interface, replaces that model with an event loop. A single OS thread can interleave thousands of in-flight requests. While one request is waiting for a database query, the event loop switches to another request that has data ready.

WSGI (Flask)                ASGI (FastAPI)
─────────────               ───────────────
Thread 1 → Request A        Event loop tick
Thread 2 → Request B           → Request A: send DB query
Thread 3 → Request C           → Request B: send OpenAI call
...blocking...                 → Request C: return cached result
Thread 1 done → free        Event loop tick
                               → Request A: DB result back, process
                               → Request B: first token back, stream

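AI workloads spend most of their time waiting on network calls (LLM APIs, embedding services, vector databases), and this is where the model pays off: a waiting coroutine costs far less memory and scheduling overhead than a blocked OS thread, so a single ASGI worker can keep many more requests in flight than a thread-per-request WSGI worker on the same hardware.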
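A minimal, standard-library-only sketch of that interleaving. `fake_llm_call` is a hypothetical stand-in that uses `asyncio.sleep` to simulate network latency. Because each coroutine yields to the event loop while it waits, 100 simulated 0.2-second calls complete in roughly 0.2 seconds on a single thread instead of the ~20 seconds a sequential loop would need.

```python
import asyncio
import threading
import time


async def fake_llm_call(request_id: int, latency: float = 0.2) -> tuple[int, str]:
    """Hypothetical stand-in for an LLM API call; the sleep simulates network I/O."""
    await asyncio.sleep(latency)  # yields control to the event loop while "waiting"
    return request_id, threading.current_thread().name


async def handle_many(n: int) -> list[tuple[int, str]]:
    # Schedule all n coroutines at once; the event loop interleaves their waits.
    return await asyncio.gather(*(fake_llm_call(i) for i in range(n)))


if __name__ == "__main__":
    start = time.perf_counter()
    results = asyncio.run(handle_many(100))
    elapsed = time.perf_counter() - start
    threads = {name for _, name in results}
    print(f"{len(results)} calls in {elapsed:.2f}s on {len(threads)} thread(s)")
```

Real I/O adds variance, but the shape is the same: total time tracks the slowest call, not the sum of all calls.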

Why FastAPI Fits AI Services

1. Async-First

Every route in FastAPI can be declared with async def. Awaiting an OpenAI call inside a route is idiomatic, not a workaround.

Python
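# Assumes app = FastAPI(), openai_client = openai.AsyncOpenAI(), and the Pydantic
# models ChatRequest (with a messages list) and ChatResponse are defined elsewhere.
@app.post("/chat")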
async def chat(req: ChatRequest) -> ChatResponse:
    response = await openai_client.chat.completions.create(
        model="gpt-4o",
        messages=req.messages,
    )
    return ChatResponse(content=response.choices[0].message.content)

2. Streaming Support

FastAPI's StreamingResponse, fed by a Python async generator, lets you stream tokens to the client as they arrive from the LLM, with no buffering and no waiting for the full completion.

3. Type Safety End-to-End

Pydantic models serve triple duty: they validate incoming JSON, they serialize outgoing JSON, and they generate OpenAPI schemas. One type definition, three benefits.
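A small sketch of the three duties using a hypothetical `ChatTurn` model. It assumes Pydantic v2 (`model_validate`, `model_dump_json`, `model_json_schema`); the v1 method names differ.

```python
from pydantic import BaseModel, Field, ValidationError


class ChatTurn(BaseModel):
    """Hypothetical model used only for this illustration."""

    role: str = Field(pattern="^(system|user|assistant)$")
    content: str = Field(min_length=1)


def invalid_fields(payload: dict) -> list[str]:
    """Duty 1, validation: return the names of fields that fail (empty if valid)."""
    try:
        ChatTurn.model_validate(payload)
    except ValidationError as exc:
        return [".".join(str(part) for part in err["loc"]) for err in exc.errors()]
    return []


turn = ChatTurn(role="user", content="Summarize this document.")

print(invalid_fields({"role": "robot", "content": ""}))  # ['role', 'content']
# Duty 2, serialization: the instance becomes JSON-ready data.
print(turn.model_dump_json())
# Duty 3, schema generation: FastAPI embeds this JSON Schema in its OpenAPI document.
print(ChatTurn.model_json_schema()["required"])  # ['role', 'content']
```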

4. Automatic Documentation

FastAPI generates interactive Swagger UI at /docs and ReDoc at /redoc with zero configuration. In AI teams this is invaluable: product managers and frontend developers can explore the API without reading source code.

5. Dependency Injection

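The Depends() system lets each route declare the shared resources it needs (OpenAI clients, database sessions, the authenticated user) as parameters. FastAPI builds and injects them, so you do not construct them by hand in every handler or thread them through every helper function.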

Comparing FastAPI, Flask, and Django REST Framework

| Feature | Flask | Django REST Framework | FastAPI |
|---------|-------|----------------------|---------|
| Async routes | async views since Flask 2.0, still served over WSGI (Quart is the ASGI counterpart) | Not native in DRF (Django itself supports async views since 3.1) | Native |
| Streaming responses | Manual (generator-based Response) | Manual (Django StreamingHttpResponse) | Built-in StreamingResponse |
| Request validation | Manual or Marshmallow | Serializers | Pydantic (automatic) |
| Auto OpenAPI docs | Flask-RESTX add-on | drf-spectacular add-on | Built-in |
| Type hints used at runtime | No | No | Yes (Pydantic) |
| Throughput on I/O-bound workloads | Moderate | Moderate | High |
| Learning curve | Low | High | Low–Medium |
| Ecosystem maturity | Very high | Very high | High |

Flask is excellent for small services where async is not needed. Django REST Framework shines when you need the full Django ORM, authentication, and admin. FastAPI is the right choice when you need high concurrency, streaming, and schema validation out of the box.

Hello World: FastAPI vs Flask

Flask version

Python
# app_flask.py
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/hello")
def hello():
    return jsonify({"message": "Hello from Flask"})

if __name__ == "__main__":
    app.run(port=8000)

Run it:

Bash
python app_flask.py

FastAPI version

Python
# app_fastapi.py
from fastapi import FastAPI

app = FastAPI(title="My AI Service", version="1.0.0")

@app.get("/hello")
async def hello() -> dict:
    return {"message": "Hello from FastAPI"}

Run it:

Bash
uvicorn app_fastapi:app --reload --port 8000

Differences you notice immediately:

  1. The route handler is async def, so you can await inside it
  2. The return type annotation (-> dict) is used by FastAPI to generate the response schema
  3. You start the server with uvicorn, not with the app itself
  4. No jsonify() needed β€” FastAPI serializes the return value automatically

A More Realistic First App

Let's build a small FastAPI application that is closer to a real AI service:

Python
# main.py
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel, Field
import httpx
import os

app = FastAPI(
    title="AI Starter Service",
    description="A minimal FastAPI service for AI workloads",
    version="0.1.0",
)


# --- Models ---

class PromptRequest(BaseModel):
    prompt: str = Field(..., min_length=1, max_length=4000)
    max_tokens: int = Field(default=512, ge=1, le=4096)


class PromptResponse(BaseModel):
    response: str
    tokens_used: int


# --- Routes ---

@app.get("/")
async def root() -> dict:
    return {"status": "ok", "service": "AI Starter"}


@app.get("/health")
async def health() -> dict:
    return {"status": "healthy"}


@app.post("/complete", response_model=PromptResponse)
async def complete(req: PromptRequest) -> PromptResponse:
    """
    Send a prompt and receive a completion.
    """
    api_key = os.environ.get("OPENAI_API_KEY")
    if not api_key:
        raise HTTPException(status_code=500, detail="API key not configured")

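    try:
        async with httpx.AsyncClient(timeout=30.0) as client:
            r = await client.post(
                "https://api.openai.com/v1/chat/completions",
                headers={"Authorization": f"Bearer {api_key}"},
                json={
                    "model": "gpt-4o-mini",
                    "messages": [{"role": "user", "content": req.prompt}],
                    "max_tokens": req.max_tokens,
                },
            )
            r.raise_for_status()  # non-2xx from OpenAI -> httpx.HTTPStatusError
            data = r.json()
    except httpx.HTTPError as exc:
        # Covers timeouts, connection errors and non-2xx upstream responses;
        # report them as 502 Bad Gateway instead of an unhandled 500.
        raise HTTPException(status_code=502, detail="Upstream LLM request failed") from exc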

    text = data["choices"][0]["message"]["content"]
    tokens = data["usage"]["total_tokens"]

    return PromptResponse(response=text, tokens_used=tokens)

Project Layout

A clean FastAPI project for an AI service looks like this:

my_ai_service/
├── main.py              # App factory, mounts routers
├── routers/
│   ├── chat.py          # /chat endpoints
│   ├── embeddings.py    # /embed endpoints
│   └── health.py        # /health endpoints
├── models/
│   ├── requests.py      # Pydantic request models
│   └── responses.py     # Pydantic response models
├── services/
│   ├── openai_client.py # Async OpenAI wrapper
│   └── vector_store.py  # Vector DB wrapper
├── dependencies.py      # Shared Depends() functions
├── config.py            # Settings via pydantic-settings
├── Dockerfile
├── requirements.txt
└── tests/
    └── test_chat.py

Installing Dependencies

Bash
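pip install fastapi "uvicorn[standard]" pydantic httpx

uvicorn[standard] installs uvicorn together with optional extras, including uvloop (a faster event loop, not used on Windows), httptools, and WebSocket support. The quotes stop shells such as zsh from treating the square brackets as a glob pattern.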

For a production AI service you will also want:

Bash
pip install openai httpx pydantic-settings python-dotenv

Running with Uvicorn

Bash
# Development β€” auto-reload on file change
uvicorn main:app --reload --host 0.0.0.0 --port 8000

# Production β€” multiple worker processes
uvicorn main:app --host 0.0.0.0 --port 8000 --workers 4

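# With gunicorn as the process manager (a common production setup).
# Recent uvicorn releases deprecate uvicorn.workers in favor of the separate
# uvicorn-worker package: -k uvicorn_worker.UvicornWorker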
gunicorn main:app -w 4 -k uvicorn.workers.UvicornWorker --bind 0.0.0.0:8000

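Note: with --workers (passed to uvicorn, or -w to gunicorn), each worker is a separate process with its own event loop and its own copy of the application. Anything created at module level, such as an HTTP client or a loaded model, is therefore created once per worker. Use the lifespan pattern (covered in a later lesson) to create and clean up such resources explicitly at startup and shutdown.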

Exploring the Auto-Generated Docs

After starting the server, open:

  • http://localhost:8000/docs (Swagger UI, fully interactive)
  • http://localhost:8000/redoc (ReDoc, clean read-only reference)
  • http://localhost:8000/openapi.json (raw OpenAPI 3.1 schema)

The Swagger UI lets you fill in the request body, click "Execute", and see the full HTTP request and response, including headers. For AI services, this means:

  • Testers can try prompts without writing any code
  • Frontend developers can see exactly what fields to send
  • You get free contract documentation that stays in sync with the code

Key Takeaways

  • FastAPI is built on Starlette (ASGI) and Pydantic, giving you async routing and automatic validation in one package
  • ASGI's event loop model handles concurrent I/O-bound workloads far more efficiently than WSGI threads β€” exactly the profile of AI service calls
  • Flask is simpler but runs on WSGI, so even its async views tie up a worker per request; Django REST Framework is more opinionated and has a steeper setup cost
  • You get Swagger UI at /docs with zero configuration β€” a significant productivity gain for AI teams
  • Start with uvicorn main:app --reload for development, gunicorn with UvicornWorker for production

In the next lesson, you will go deeper into async/await β€” understanding when to use async def, how the event loop works, and how to avoid the most common mistake of blocking the event loop with synchronous code.
