Prompt Engineering · Lesson 4 of 4
Build a Chatbot with Your Prompts
What We're Building
A full-stack AI chatbot with:
- Streaming — tokens appear as they're generated, not all at once
- Conversation history — the model remembers previous messages
- System prompt — gives the bot a persona and rules
- React frontend — a clean chat UI with streaming support
- Rate limiting — prevents abuse and runaway costs
- Token budget — caps maximum spend per session
React (chat UI)
│ SSE / fetch streaming
▼
.NET Minimal API
├── Conversation history management
├── Token budget check
└── OpenAI SDK (streaming)
│
▼
gpt-4o-mini
Backend: .NET Setup
Bash
dotnet new webapi -n AiChatbot
dotnet add package OpenAI
dotnet add package Microsoft.Extensions.Caching.StackExchangeRedis
JSON
// appsettings.json
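// NOTE: don't commit a real key — in development, prefer
//   dotnet user-secrets set "OpenAI:ApiKey" "sk-..."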
{
"OpenAI": {
"ApiKey": "sk-...",
"Model": "gpt-4o-mini",
"MaxTokensPerSession": 10000,
"MaxMessagesInHistory": 20
  },
  "Redis": {
    "ConnectionString": "localhost:6379"
  }
}
C#
// Program.cs
var builder = WebApplication.CreateBuilder(args);

builder.Services.AddSingleton(sp =>
    new OpenAIClient(builder.Configuration["OpenAI:ApiKey"]!));

builder.Services.AddStackExchangeRedisCache(options =>
    options.Configuration = builder.Configuration["Redis:ConnectionString"]);

builder.Services.AddScoped<ConversationStore>(); // ChatService depends on this
builder.Services.AddScoped<ChatService>();
builder.Services.AddRateLimiter(options =>
{
options.AddFixedWindowLimiter("chat", opt =>
{
opt.PermitLimit = 20;
opt.Window = TimeSpan.FromMinutes(1);
});
});
var app = builder.Build();
app.UseRateLimiter();
app.MapChatEndpoints();
app.Run();
Conversation History Model
The model has no memory between API calls — you must replay the conversation history in every request:
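For example, by the second user turn the request already resends everything. A sketch (the dialogue content is invented; only the role and ordering convention matters):
C#
// Turn 2 request: without replaying turn 1, the model cannot resolve "it".
var messages = new[]
{
    new { role = "system",    content = "You are a helpful assistant..." },
    new { role = "user",      content = "What's a C# record?" },                  // turn 1
    new { role = "assistant", content = "A record is a reference type with..." }, // turn 1 reply
    new { role = "user",      content = "How is it different from a class?" },    // turn 2
};
The history itself is modelled like this: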
C#
public record ChatMessage(string Role, string Content); // "user" | "assistant" | "system"
public class ConversationHistory
{
public string SessionId { get; init; } = Guid.NewGuid().ToString("N");
public List<ChatMessage> Messages { get; init; } = [];
public int TotalTokensUsed { get; set; }
}
C#
// ConversationStore.cs — stores history in Redis (survives restarts, works across instances)
public class ConversationStore
{
private readonly IDistributedCache _cache;
private static readonly TimeSpan _ttl = TimeSpan.FromHours(1);
public ConversationStore(IDistributedCache cache) => _cache = cache;
public async Task<ConversationHistory> GetOrCreateAsync(string sessionId, CancellationToken ct)
{
var key = $"chat:{sessionId}";
var json = await _cache.GetStringAsync(key, ct);
return json is null
? new ConversationHistory { SessionId = sessionId }
: JsonSerializer.Deserialize<ConversationHistory>(json)!;
}
public async Task SaveAsync(ConversationHistory history, CancellationToken ct)
{
var json = JsonSerializer.Serialize(history);
var options = new DistributedCacheEntryOptions { SlidingExpiration = _ttl };
await _cache.SetStringAsync($"chat:{history.SessionId}", json, options, ct);
}
}
ChatService — The Core
C#
public class ChatService
{
private readonly OpenAIClient _openAi;
private readonly ConversationStore _store;
private readonly IConfiguration _config;
private readonly ILogger<ChatService> _logger;
private const string SystemPrompt = """
You are a helpful assistant for Learnixo, a developer learning platform.
You specialise in .NET, React, SQL, and AI development.
Rules:
- Be concise — prefer 3–5 sentences over long essays
- Always include a code example when explaining technical concepts
- If asked about a topic outside software development, politely redirect
- Never make up facts — say "I'm not sure" when you don't know
""";
public ChatService(
OpenAIClient client,
ConversationStore store,
IConfiguration config,
ILogger<ChatService> logger)
{
_openAi = client;
_store = store;
_config = config;
_logger = logger;
}
public async IAsyncEnumerable<string> StreamReplyAsync(
string sessionId,
string userMessage,
[EnumeratorCancellation] CancellationToken ct = default)
{
var model = _config["OpenAI:Model"]!;
var maxTokens = _config.GetValue<int>("OpenAI:MaxTokensPerSession", 10_000);
var maxHistory = _config.GetValue<int>("OpenAI:MaxMessagesInHistory", 20);
// Load conversation history
var history = await _store.GetOrCreateAsync(sessionId, ct);
// Token budget guard
if (history.TotalTokensUsed >= maxTokens)
{
yield return "[Session token budget exceeded. Start a new conversation.]";
yield break;
}
// Add user message to history
history.Messages.Add(new ChatMessage("user", userMessage));
// Build messages for the API call
// Keep last N messages to stay within context window
var recentMessages = history.Messages.TakeLast(maxHistory).ToList();
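        // global:: qualifies the SDK's OpenAI.Chat.ChatMessage, which would
        // otherwise collide with our own ChatMessage record above.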
var apiMessages = new List<global::OpenAI.Chat.ChatMessage>
{
global::OpenAI.Chat.ChatMessage.CreateSystemMessage(SystemPrompt),
};
foreach (var msg in recentMessages)
{
apiMessages.Add(msg.Role == "user"
? global::OpenAI.Chat.ChatMessage.CreateUserMessage(msg.Content)
: global::OpenAI.Chat.ChatMessage.CreateAssistantMessage(msg.Content));
}
var options = new ChatCompletionOptions
{
Temperature = 0.7f,
MaxOutputTokenCount = 1024,
};
// Stream the response
var fullReply = new StringBuilder();
var inputTokens = 0;
var outputTokens = 0;
await foreach (var chunk in _openAi
.GetChatClient(model)
.CompleteChatStreamingAsync(apiMessages, options, ct))
{
// Capture usage from the final chunk
if (chunk.Usage is { } usage)
{
inputTokens = usage.InputTokenCount;
outputTokens = usage.OutputTokenCount;
}
foreach (var part in chunk.ContentUpdate)
{
fullReply.Append(part.Text);
yield return part.Text;
}
}
// Persist the assistant reply and update token count
history.Messages.Add(new ChatMessage("assistant", fullReply.ToString()));
history.TotalTokensUsed += inputTokens + outputTokens;
await _store.SaveAsync(history, ct);
_logger.LogInformation(
"Chat session {SessionId} — tokens used: {Tokens} (total: {Total})",
sessionId, inputTokens + outputTokens, history.TotalTokensUsed);
}
public async Task ClearHistoryAsync(string sessionId, CancellationToken ct)
{
var history = new ConversationHistory { SessionId = sessionId };
await _store.SaveAsync(history, ct);
}
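    // Sketch (not part of the original service): TakeLast(n) counts messages,
    // not tokens, so a few very long messages can still overflow the context
    // window. A cheap character budget helps; for real accuracy you'd use a
    // tokenizer library.
    private static List<ChatMessage> TrimToCharBudget(List<ChatMessage> messages, int maxChars = 24_000)
    {
        var kept = new List<ChatMessage>();
        var total = 0;
        for (var i = messages.Count - 1; i >= 0; i--) // walk backwards so recent turns survive
        {
            total += messages[i].Content.Length;
            if (total > maxChars) break;
            kept.Insert(0, messages[i]);
        }
        return kept;
    }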
}
API Endpoints
C#
// ChatEndpoints.cs
public static class ChatEndpoints
{
public static void MapChatEndpoints(this IEndpointRouteBuilder app)
{
var group = app.MapGroup("/api/chat").RequireRateLimiting("chat");
// Stream a reply as Server-Sent Events
group.MapPost("/stream", async (
ChatRequest request,
ChatService chatService,
HttpResponse response,
CancellationToken ct) =>
{
response.Headers.ContentType = "text/event-stream";
response.Headers.CacheControl = "no-cache";
response.Headers.Connection = "keep-alive";
await foreach (var token in chatService.StreamReplyAsync(
request.SessionId, request.Message, ct))
{
// SSE format: "data: {token}\n\n"
var escaped = JsonSerializer.Serialize(token); // handles newlines
await response.WriteAsync($"data: {escaped}\n\n", ct);
await response.Body.FlushAsync(ct);
}
await response.WriteAsync("data: [DONE]\n\n", ct);
});
// Clear conversation history
group.MapDelete("/{sessionId}", async (
string sessionId,
ChatService chatService,
CancellationToken ct) =>
{
await chatService.ClearHistoryAsync(sessionId, ct);
return Results.NoContent();
});
}
}
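// What the client receives over the wire (illustrative):
//
//   data: "Records are reference types"
//   data: " with value-based equality."
//   data: [DONE]
//
// Each data payload is a JSON-encoded string (see the Serialize call above),
// which is why the frontend calls JSON.parse on every frame.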
public record ChatRequest(string SessionId, string Message);
React Frontend
TSX
// ChatWindow.tsx
import { useState, useRef, useEffect } from "react";
interface Message {
role: "user" | "assistant";
content: string;
streaming?: boolean;
}
export function ChatWindow() {
const [messages, setMessages] = useState<Message[]>([]);
const [input, setInput] = useState("");
const [isStreaming, setStreaming] = useState(false);
const sessionId = useRef(crypto.randomUUID());
const bottomRef = useRef<HTMLDivElement>(null);
// Auto-scroll to bottom as tokens arrive
useEffect(() => {
bottomRef.current?.scrollIntoView({ behavior: "smooth" });
}, [messages]);
async function sendMessage() {
if (!input.trim() || isStreaming) return;
const userMsg = input.trim();
setInput("");
// Add user message
setMessages(prev => [...prev, { role: "user", content: userMsg }]);
// Add placeholder for streaming assistant reply
setMessages(prev => [...prev, { role: "assistant", content: "", streaming: true }]);
setStreaming(true);
try {
const response = await fetch("/api/chat/stream", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({ sessionId: sessionId.current, message: userMsg }),
});
      if (!response.ok || !response.body) throw new Error(`HTTP ${response.status}`);
      const reader = response.body.getReader();
      const decoder = new TextDecoder();
      // NOTE: this assumes each read() delivers whole "data: ..." lines;
      // a production client should buffer partial SSE frames across reads.
      let finished = false;
      while (!finished) {
        const { done, value } = await reader.read();
        if (done) break;
        const lines = decoder.decode(value, { stream: true }).split("\n");
        for (const line of lines) {
          if (!line.startsWith("data: ")) continue;
          const data = line.slice(6).trim();
          if (data === "[DONE]") { finished = true; break; } // a bare break would only exit the inner loop
          try {
            const token = JSON.parse(data) as string;
// Append token to the last (streaming) message
setMessages(prev => {
const updated = [...prev];
updated[updated.length - 1] = {
...updated[updated.length - 1],
content: updated[updated.length - 1].content + token,
};
return updated;
});
} catch { /* ignore parse errors */ }
}
}
    } catch {
      // Show a friendly message instead of a broken stream
      setMessages(prev => {
        const updated = [...prev];
        updated[updated.length - 1] = {
          ...updated[updated.length - 1],
          content: updated[updated.length - 1].content || "Something went wrong. Please try again.",
        };
        return updated;
      });
    } finally {
// Mark streaming complete
setMessages(prev => {
const updated = [...prev];
updated[updated.length - 1] = {
...updated[updated.length - 1],
streaming: false,
};
return updated;
});
setStreaming(false);
}
}
return (
<div className="flex flex-col h-screen max-w-2xl mx-auto p-4">
{/* Messages */}
<div className="flex-1 overflow-y-auto space-y-4 mb-4">
{messages.map((msg, i) => (
<div key={i}
className={`flex ${msg.role === "user" ? "justify-end" : "justify-start"}`}>
<div className={`rounded-2xl px-4 py-3 max-w-[80%] text-sm
${msg.role === "user"
? "bg-indigo-600 text-white"
: "bg-muted text-foreground"}`}>
{msg.content}
{msg.streaming && (
<span className="inline-block w-1.5 h-4 ml-0.5 bg-current animate-pulse" />
)}
</div>
</div>
))}
<div ref={bottomRef} />
</div>
{/* Input */}
<div className="flex gap-2">
<input
value={input}
onChange={e => setInput(e.target.value)}
onKeyDown={e => e.key === "Enter" && !e.shiftKey && sendMessage()}
placeholder="Ask anything about .NET, React, or AI..."
disabled={isStreaming}
className="flex-1 rounded-xl border border-border bg-card px-4 py-3 text-sm
focus:outline-none focus:ring-2 focus:ring-primary"
/>
<button
onClick={sendMessage}
disabled={isStreaming || !input.trim()}
className="px-5 py-3 rounded-xl bg-indigo-600 text-white font-semibold text-sm
hover:bg-indigo-700 disabled:opacity-50 transition-colors">
{isStreaming ? "..." : "Send"}
</button>
</div>
</div>
);
}
Cost Controls
C#
// Middleware to track daily spend and cut off expensive sessions
public class TokenBudgetMiddleware
{
private readonly RequestDelegate _next;
private readonly IDistributedCache _cache;
    private const int DailyTokenLimit = 500_000; // per IP

    public TokenBudgetMiddleware(RequestDelegate next, IDistributedCache cache)
    {
        _next = next;
        _cache = cache;
    }
public async Task InvokeAsync(HttpContext context)
{
if (!context.Request.Path.StartsWithSegments("/api/chat"))
{
await _next(context);
return;
}
var ip = context.Connection.RemoteIpAddress?.ToString() ?? "unknown";
var key = $"daily-tokens:{ip}:{DateTime.UtcNow:yyyyMMdd}";
var current = await _cache.GetStringAsync(key) ?? "0";
if (int.Parse(current) >= DailyTokenLimit)
{
context.Response.StatusCode = 429;
await context.Response.WriteAsJsonAsync(new
{
error = "Daily token limit reached. Try again tomorrow."
});
return;
}
await _next(context);
}
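    // Sketch (assumption — nothing in this lesson increments the counter yet):
    // ChatService would call a helper like this after each reply. Note that
    // GetString/SetString is a read-modify-write race under load; an atomic
    // Redis INCR (via StackExchange.Redis directly) would be safer.
    public async Task RecordUsageAsync(string ip, int tokens)
    {
        var key = $"daily-tokens:{ip}:{DateTime.UtcNow:yyyyMMdd}";
        var current = int.Parse(await _cache.GetStringAsync(key) ?? "0");
        await _cache.SetStringAsync(key, (current + tokens).ToString(),
            new DistributedCacheEntryOptions { AbsoluteExpirationRelativeToNow = TimeSpan.FromDays(1) });
    }

    // Register the middleware in Program.cs with
    // app.UseMiddleware<TokenBudgetMiddleware>(), before MapChatEndpoints.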
}
Production Checklist
✅ Stream responses — never make the user wait 10s for a full reply
✅ Store conversation history server-side (Redis) — not in localStorage
✅ Truncate history to last N messages — prevents context window overflow
✅ Token budget per session AND per day — prevents runaway costs
✅ Rate limit the endpoint — 20 requests/minute per IP is a reasonable default
✅ Log token usage per request — you need visibility into costs
✅ Use gpt-4o-mini for chat — it's roughly 16× cheaper than gpt-4o and perfectly capable
✅ Handle streaming errors gracefully — show "Something went wrong" not a broken stream
✅ System prompt validation — test it against adversarial inputs before shipping
Key Takeaways
- Conversation history must be replayed on every request — the API is stateless
- Streaming (SSE) is the correct UX pattern — always stream in chat interfaces
- Store sessions in Redis — in-memory state dies on restart and doesn't scale out; browser storage lets users tamper with history
- Token budgets are essential — a single runaway session can consume your monthly budget
- gpt-4o-mini is the right model for chat — save gpt-4o for tasks that actually need it
- Rate limit aggressively — the OpenAI API charges for every token, so abusers cost you real money
- The system prompt is your product — test it with adversarial inputs before going live
Lesson Checkpoint
Quick Check · Question 1 of 4
Why must you send the full conversation history with every API call?