Prompt Engineering · Lesson 4 of 4
Build a Chatbot with Your Prompts
What We're Building
A full-stack AI chatbot with:
- Streaming — tokens appear as they're generated, not all at once
- Conversation history — the model remembers previous messages
- System prompt — gives the bot a persona and rules
- React frontend — a clean chat UI with streaming support
- Rate limiting — prevents abuse and runaway costs
- Token budget — caps maximum spend per session
React (chat UI)
│ SSE / fetch streaming
▼
.NET Minimal API
├── Conversation history management
├── Token budget check
└── OpenAI SDK (streaming)
│
▼
gpt-4o-mini
Backend: .NET Setup
Bash
dotnet new webapi -n AiChatbot
dotnet add package OpenAI
dotnet add package Microsoft.Extensions.Caching.StackExchangeRedis
JSON
// appsettings.json
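// NOTE: don't commit a real key — in development, prefer
//   dotnet user-secrets set "OpenAI:ApiKey" "sk-..."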
{
"OpenAI": {
"ApiKey": "sk-...",
"Model": "gpt-4o-mini",
"MaxTokensPerSession": 10000,
"MaxMessagesInHistory": 20
  },
  "Redis": {
    "ConnectionString": "localhost:6379"
  }
}
C#
// Program.cs
var builder = WebApplication.CreateBuilder(args);

builder.Services.AddSingleton(sp =>
    new OpenAIClient(builder.Configuration["OpenAI:ApiKey"]!));

builder.Services.AddStackExchangeRedisCache(options =>
    options.Configuration = builder.Configuration["Redis:ConnectionString"]);

builder.Services.AddScoped<ConversationStore>(); // ChatService depends on this
builder.Services.AddScoped<ChatService>();
builder.Services.AddRateLimiter(options =>
{
options.AddFixedWindowLimiter("chat", opt =>
{
opt.PermitLimit = 20;
opt.Window = TimeSpan.FromMinutes(1);
});
});
var app = builder.Build();
app.UseRateLimiter();
app.MapChatEndpoints();
app.Run();
Conversation History Model
The model has no memory between API calls — you must replay the conversation history in every request:
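For example, by the second user turn the request already resends everything. A sketch (the dialogue content is invented; only the role and ordering convention matters):
C#
// Turn 2 request: without replaying turn 1, the model cannot resolve "it".
var messages = new[]
{
    new { role = "system",    content = "You are a helpful assistant..." },
    new { role = "user",      content = "What's a C# record?" },                  // turn 1
    new { role = "assistant", content = "A record is a reference type with..." }, // turn 1 reply
    new { role = "user",      content = "How is it different from a class?" },    // turn 2
};
The history itself is modelled like this: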
C#
public record ChatMessage(string Role, string Content); // "user" | "assistant" | "system"
public class ConversationHistory
{
public string SessionId { get; init; } = Guid.NewGuid().ToString("N");
public List<ChatMessage> Messages { get; init; } = [];
public int TotalTokensUsed { get; set; }
}
C#
// ConversationStore.cs — stores history in Redis (survives restarts, works across instances)
public class ConversationStore
{
private readonly IDistributedCache _cache;
private static readonly TimeSpan _ttl = TimeSpan.FromHours(1);
public ConversationStore(IDistributedCache cache) => _cache = cache;
public async Task<ConversationHistory> GetOrCreateAsync(string sessionId, CancellationToken ct)
{
var key = $"chat:{sessionId}";
var json = await _cache.GetStringAsync(key, ct);
return json is null
? new ConversationHistory { SessionId = sessionId }
: JsonSerializer.Deserialize<ConversationHistory>(json)!;
}
public async Task SaveAsync(ConversationHistory history, CancellationToken ct)
{
var json = JsonSerializer.Serialize(history);
var options = new DistributedCacheEntryOptions { SlidingExpiration = _ttl };
await _cache.SetStringAsync($"chat:{history.SessionId}", json, options, ct);
}
}
ChatService — The Core
C#
public class ChatService
{
private readonly OpenAIClient _openAi;
private readonly ConversationStore _store;
private readonly IConfiguration _config;
private readonly ILogger<ChatService> _logger;
private const string SystemPrompt = """
You are a helpful assistant for Learnixo, a developer learning platform.
You specialise in .NET, React, SQL, and AI development.
Rules:
- Be concise — prefer 3–5 sentences over long essays
- Always include a code example when explaining technical concepts
- If asked about a topic outside software development, politely redirect
- Never make up facts — say "I'm not sure" when you don't know
""";
public ChatService(
OpenAIClient client,
ConversationStore store,
IConfiguration config,
ILogger<ChatService> logger)
{
_openAi = client;
_store = store;
_config = config;
_logger = logger;
}
public async IAsyncEnumerable<string> StreamReplyAsync(
string sessionId,
string userMessage,
[EnumeratorCancellation] CancellationToken ct = default)
{
var model = _config["OpenAI:Model"]!;
var maxTokens = _config.GetValue<int>("OpenAI:MaxTokensPerSession", 10_000);
var maxHistory = _config.GetValue<int>("OpenAI:MaxMessagesInHistory", 20);
// Load conversation history
var history = await _store.GetOrCreateAsync(sessionId, ct);
// Token budget guard
if (history.TotalTokensUsed >= maxTokens)
{
yield return "[Session token budget exceeded. Start a new conversation.]";
yield break;
}
// Add user message to history
history.Messages.Add(new ChatMessage("user", userMessage));
// Build messages for the API call
// Keep last N messages to stay within context window
var recentMessages = history.Messages.TakeLast(maxHistory).ToList();
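        // global:: qualifies the SDK's OpenAI.Chat.ChatMessage, which would
        // otherwise collide with our own ChatMessage record above.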
var apiMessages = new List<global::OpenAI.Chat.ChatMessage>
{
global::OpenAI.Chat.ChatMessage.CreateSystemMessage(SystemPrompt),
};
foreach (var msg in recentMessages)
{
apiMessages.Add(msg.Role == "user"
? global::OpenAI.Chat.ChatMessage.CreateUserMessage(msg.Content)
: global::OpenAI.Chat.ChatMessage.CreateAssistantMessage(msg.Content));
}
var options = new ChatCompletionOptions
{
Temperature = 0.7f,
MaxOutputTokenCount = 1024,
};
// Stream the response
var fullReply = new StringBuilder();
var inputTokens = 0;
var outputTokens = 0;
await foreach (var chunk in _openAi
.GetChatClient(model)
.CompleteChatStreamingAsync(apiMessages, options, ct))
{
// Capture usage from the final chunk
if (chunk.Usage is { } usage)
{
inputTokens = usage.InputTokenCount;
outputTokens = usage.OutputTokenCount;
}
foreach (var part in chunk.ContentUpdate)
{
fullReply.Append(part.Text);
yield return part.Text;
}
}
// Persist the assistant reply and update token count
history.Messages.Add(new ChatMessage("assistant", fullReply.ToString()));
history.TotalTokensUsed += inputTokens + outputTokens;
await _store.SaveAsync(history, ct);
_logger.LogInformation(
"Chat session {SessionId} — tokens used: {Tokens} (total: {Total})",
sessionId, inputTokens + outputTokens, history.TotalTokensUsed);
}
public async Task ClearHistoryAsync(string sessionId, CancellationToken ct)
{
var history = new ConversationHistory { SessionId = sessionId };
await _store.SaveAsync(history, ct);
}
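    // Sketch (not part of the original service): TakeLast(n) counts messages,
    // not tokens, so a few very long messages can still overflow the context
    // window. A cheap character budget helps; for real accuracy you'd use a
    // tokenizer library.
    private static List<ChatMessage> TrimToCharBudget(List<ChatMessage> messages, int maxChars = 24_000)
    {
        var kept = new List<ChatMessage>();
        var total = 0;
        for (var i = messages.Count - 1; i >= 0; i--) // walk backwards so recent turns survive
        {
            total += messages[i].Content.Length;
            if (total > maxChars) break;
            kept.Insert(0, messages[i]);
        }
        return kept;
    }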
}
API Endpoints
C#
// ChatEndpoints.cs
public static class ChatEndpoints
{
public static void MapChatEndpoints(this IEndpointRouteBuilder app)
{
var group = app.MapGroup("/api/chat").RequireRateLimiting("chat");
// Stream a reply as Server-Sent Events
group.MapPost("/stream", async (
ChatRequest request,
ChatService chatService,
HttpResponse response,
CancellationToken ct) =>
{
response.Headers.ContentType = "text/event-stream";
response.Headers.CacheControl = "no-cache";
response.Headers.Connection = "keep-alive";
await foreach (var token in chatService.StreamReplyAsync(
request.SessionId, request.Message, ct))
{
// SSE format: "data: {token}\n\n"
var escaped = JsonSerializer.Serialize(token); // handles newlines
await response.WriteAsync($"data: {escaped}\n\n", ct);
await response.Body.FlushAsync(ct);
}
await response.WriteAsync("data: [DONE]\n\n", ct);
});
// Clear conversation history
group.MapDelete("/{sessionId}", async (
string sessionId,
ChatService chatService,
CancellationToken ct) =>
{
await chatService.ClearHistoryAsync(sessionId, ct);
return Results.NoContent();
});
}
}
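// What the client receives over the wire (illustrative):
//
//   data: "Records are reference types"
//   data: " with value-based equality."
//   data: [DONE]
//
// Each data payload is a JSON-encoded string (see the Serialize call above),
// which is why the frontend calls JSON.parse on every frame.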
public record ChatRequest(string SessionId, string Message);
React Frontend
TSX
// ChatWindow.tsx
import { useState, useRef, useEffect } from "react";
interface Message {
role: "user" | "assistant";
content: string;
streaming?: boolean;
}
export function ChatWindow() {
const [messages, setMessages] = useState<Message[]>([]);
const [input, setInput] = useState("");
const [isStreaming, setStreaming] = useState(false);
const sessionId = useRef(crypto.randomUUID());
const bottomRef = useRef<HTMLDivElement>(null);
// Auto-scroll to bottom as tokens arrive
useEffect(() => {
bottomRef.current?.scrollIntoView({ behavior: "smooth" });
}, [messages]);
async function sendMessage() {
if (!input.trim() || isStreaming) return;
const userMsg = input.trim();
setInput("");
// Add user message
setMessages(prev => [...prev, { role: "user", content: userMsg }]);
// Add placeholder for streaming assistant reply
setMessages(prev => [...prev, { role: "assistant", content: "", streaming: true }]);
setStreaming(true);
try {
const response = await fetch("/api/chat/stream", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({ sessionId: sessionId.current, message: userMsg }),
});
      if (!response.ok || !response.body) throw new Error(`HTTP ${response.status}`);
      const reader = response.body.getReader();
      const decoder = new TextDecoder();
      // NOTE: this assumes each read() delivers whole "data: ..." lines;
      // a production client should buffer partial SSE frames across reads.
      let finished = false;
      while (!finished) {
        const { done, value } = await reader.read();
        if (done) break;
        const lines = decoder.decode(value, { stream: true }).split("\n");
        for (const line of lines) {
          if (!line.startsWith("data: ")) continue;
          const data = line.slice(6).trim();
          if (data === "[DONE]") { finished = true; break; } // a bare break would only exit the inner loop
          try {
            const token = JSON.parse(data) as string;
// Append token to the last (streaming) message
setMessages(prev => {
const updated = [...prev];
updated[updated.length - 1] = {
...updated[updated.length - 1],
content: updated[updated.length - 1].content + token,
};
return updated;
});
} catch { /* ignore parse errors */ }
}
}
    } catch {
      // Show a friendly message instead of a broken stream
      setMessages(prev => {
        const updated = [...prev];
        updated[updated.length - 1] = {
          ...updated[updated.length - 1],
          content: updated[updated.length - 1].content || "Something went wrong. Please try again.",
        };
        return updated;
      });
    } finally {
// Mark streaming complete
setMessages(prev => {
const updated = [...prev];
updated[updated.length - 1] = {
...updated[updated.length - 1],
streaming: false,
};
return updated;
});
setStreaming(false);
}
}
return (
<div className="flex flex-col h-screen max-w-2xl mx-auto p-4">
{/* Messages */}
<div className="flex-1 overflow-y-auto space-y-4 mb-4">
{messages.map((msg, i) => (
<div key={i}
className={`flex ${msg.role === "user" ? "justify-end" : "justify-start"}`}>
<div className={`rounded-2xl px-4 py-3 max-w-[80%] text-sm
${msg.role === "user"
? "bg-indigo-600 text-white"
: "bg-muted text-foreground"}`}>
{msg.content}
{msg.streaming && (
<span className="inline-block w-1.5 h-4 ml-0.5 bg-current animate-pulse" />
)}
</div>
</div>
))}
<div ref={bottomRef} />
</div>
{/* Input */}
<div className="flex gap-2">
<input
value={input}
onChange={e => setInput(e.target.value)}
onKeyDown={e => e.key === "Enter" && !e.shiftKey && sendMessage()}
placeholder="Ask anything about .NET, React, or AI..."
disabled={isStreaming}
className="flex-1 rounded-xl border border-border bg-card px-4 py-3 text-sm
focus:outline-none focus:ring-2 focus:ring-primary"
/>
<button
onClick={sendMessage}
disabled={isStreaming || !input.trim()}
className="px-5 py-3 rounded-xl bg-indigo-600 text-white font-semibold text-sm
hover:bg-indigo-700 disabled:opacity-50 transition-colors">
{isStreaming ? "..." : "Send"}
</button>
</div>
</div>
);
}
Cost Controls
C#
// Middleware to track daily spend and cut off expensive sessions
public class TokenBudgetMiddleware
{
private readonly RequestDelegate _next;
private readonly IDistributedCache _cache;
    private const int DailyTokenLimit = 500_000; // per IP

    public TokenBudgetMiddleware(RequestDelegate next, IDistributedCache cache)
    {
        _next = next;
        _cache = cache;
    }
public async Task InvokeAsync(HttpContext context)
{
if (!context.Request.Path.StartsWithSegments("/api/chat"))
{
await _next(context);
return;
}
var ip = context.Connection.RemoteIpAddress?.ToString() ?? "unknown";
var key = $"daily-tokens:{ip}:{DateTime.UtcNow:yyyyMMdd}";
var current = await _cache.GetStringAsync(key) ?? "0";
if (int.Parse(current) >= DailyTokenLimit)
{
context.Response.StatusCode = 429;
await context.Response.WriteAsJsonAsync(new
{
error = "Daily token limit reached. Try again tomorrow."
});
return;
}
await _next(context);
}
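    // Sketch (assumption — nothing in this lesson increments the counter yet):
    // ChatService would call a helper like this after each reply. Note that
    // GetString/SetString is a read-modify-write race under load; an atomic
    // Redis INCR (via StackExchange.Redis directly) would be safer.
    public async Task RecordUsageAsync(string ip, int tokens)
    {
        var key = $"daily-tokens:{ip}:{DateTime.UtcNow:yyyyMMdd}";
        var current = int.Parse(await _cache.GetStringAsync(key) ?? "0");
        await _cache.SetStringAsync(key, (current + tokens).ToString(),
            new DistributedCacheEntryOptions { AbsoluteExpirationRelativeToNow = TimeSpan.FromDays(1) });
    }

    // Register the middleware in Program.cs with
    // app.UseMiddleware<TokenBudgetMiddleware>(), before MapChatEndpoints.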
}
Production Checklist
✅ Stream responses — never make the user wait 10s for a full reply
✅ Store conversation history server-side (Redis) — not in localStorage
✅ Truncate history to last N messages — prevents context window overflow
✅ Token budget per session AND per day — prevents runaway costs
✅ Rate limit the endpoint — 20 requests/minute per IP is a reasonable default
✅ Log token usage per request — you need visibility into costs
✅ Use gpt-4o-mini for chat — it's roughly 16× cheaper than gpt-4o and perfectly capable
✅ Handle streaming errors gracefully — show "Something went wrong" not a broken stream
✅ System prompt validation — test it against adversarial inputs before shipping
Key Takeaways
- Conversation history must be replayed on every request — the API is stateless
- Streaming (SSE) is the correct UX pattern — always stream in chat interfaces
- Store sessions in Redis — in-memory state dies on restart and doesn't scale out; browser storage lets users tamper with history
- Token budgets are essential — a single runaway session can consume your monthly budget
- gpt-4o-mini is the right model for chat — save gpt-4o for tasks that actually need it
- Rate limit aggressively — the OpenAI API charges for every token, so abusers cost you real money
- The system prompt is your product — test it with adversarial inputs before going live
Lesson Checkpoint
Quick Check · Question 1 of 4
Why must you send the full conversation history with every API call?