Streaming AI Responses in .NET — SSE and Real-Time Output
Stream AI responses to clients in ASP.NET Core: Server-Sent Events (SSE), streaming from OpenAI/Semantic Kernel, IAsyncEnumerable patterns, and building a real-time AI copilot UI.
Why Streaming Matters
Non-streaming AI response:
1. User asks a question
2. API sends the full prompt to OpenAI
3. OpenAI generates 200 tokens (5-10 seconds)
4. API receives the complete response
5. API sends the complete response to the user
User experience: blank screen for 5-10 seconds, then full answer appears
Streaming AI response:
1. User asks a question
2. API sends prompt to OpenAI with stream=true
3. OpenAI starts sending tokens as they are generated
4. API forwards tokens to the client immediately
5. User sees the response appearing token by token (like typing)
User experience: response appears within 200ms, builds gradually
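The two flows above differ in when the first text reaches the screen, not in when the answer is complete. A toy model makes the gap concrete. Every number in it is an illustrative assumption (200 tokens, 40 ms per token, 200 ms to the first token), and `estimateLatency` is a hypothetical helper, not a measurement of any real service:

```typescript
// Toy latency model: all inputs are illustrative assumptions.
interface LatencyEstimate {
  firstVisibleMs: number; // when the user first sees any text
  completeMs: number;     // when the full answer is on screen
}

function estimateLatency(
  tokenCount: number,
  msPerToken: number,
  timeToFirstTokenMs: number,
  streaming: boolean
): LatencyEstimate {
  // Generation takes the same time in both modes.
  const completeMs = timeToFirstTokenMs + (tokenCount - 1) * msPerToken;
  return {
    // Without streaming, nothing is shown until generation has finished.
    firstVisibleMs: streaming ? timeToFirstTokenMs : completeMs,
    completeMs,
  };
}

const buffered = estimateLatency(200, 40, 200, false);
const streamed = estimateLatency(200, 40, 200, true);
console.log(buffered); // firstVisibleMs equals completeMs
console.log(streamed); // firstVisibleMs equals the time to first token
```

Under these assumptions both modes finish at the same moment; only the wait before the first visible text changes.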
Streaming is almost always better for AI copilot UIs.
The perceived latency is dramatically lower even when total time is the same.
Streaming with Semantic Kernel
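// GetStreamingChatMessageContentsAsync returns IAsyncEnumerable<StreamingChatMessageContent>.
// Select/Where over IAsyncEnumerable need the System.Linq.Async package
// (.NET 10 ships equivalent operators in the box).
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;
using Microsoft.SemanticKernel.Connectors.OpenAI;

public sealed class PrescriptionCopilotService
{
    private readonly Kernel _kernel;
    private readonly IChatCompletionService _chat;

    // Both fields must be assigned; here the chat service is resolved from the kernel.
    public PrescriptionCopilotService(Kernel kernel)
    {
        _kernel = kernel;
        _chat = kernel.GetRequiredService<IChatCompletionService>();
    }

    public IAsyncEnumerable<string> StreamResponseAsync(
        string question,
        ChatHistory history,
        CancellationToken ct)
    {
        history.AddUserMessage(question);
        var settings = new OpenAIPromptExecutionSettings
        {
            Temperature = 0.2,
            MaxTokens = 500,
            ToolCallBehavior = ToolCallBehavior.AutoInvokeKernelFunctions
        };
        // Project each streamed chunk to its text and drop empty chunks.
        return _chat
            .GetStreamingChatMessageContentsAsync(history, settings, _kernel, ct)
            .Select(chunk => chunk.Content ?? string.Empty)
            .Where(content => !string.IsNullOrEmpty(content));
    }
}
ASP.NET Core SSE Endpoint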
// Server-Sent Events (SSE): text/event-stream, natively supported by browsers.
// Client-side: the EventSource API only issues GET requests, so this POST
// endpoint is consumed with fetch + ReadableStream.
// SystemPrompt is assumed to be a string constant defined elsewhere.
app.MapPost("/api/copilot/stream", async (
    CopilotQuestion request,
    PrescriptionCopilotService copilot,
    HttpResponse response,
    CancellationToken ct) =>
{
    response.Headers.ContentType = "text/event-stream";
    response.Headers.CacheControl = "no-cache";
    response.Headers.Connection = "keep-alive";
    var history = new ChatHistory(SystemPrompt);
    await foreach (var chunk in copilot.StreamResponseAsync(request.Question, history, ct))
    {
        // SSE format: "data: <content>\n\n"
        // JSON-encoding the chunk escapes any newlines inside it.
        await response.WriteAsync($"data: {JsonSerializer.Serialize(chunk)}\n\n", ct);
        await response.Body.FlushAsync(ct); // important: flush each chunk immediately
    }
    // Signal end of stream
    await response.WriteAsync("data: [DONE]\n\n", ct);
    await response.Body.FlushAsync(ct);
});

public sealed record CopilotQuestion(string Question);
Consuming SSE in React/TypeScript
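Before the client code, it may help to pin down the wire format the endpoint above emits: one `data: <json>` line followed by a blank line per chunk. The sketch below mirrors that framing in TypeScript to show why the chunk is JSON-encoded. `encodeFrame` and `decodeFrame` are illustrative names, not part of any library, and the sample text is made up:

```typescript
// Sketch of the framing used by the endpoint: "data: <json>\n\n".
const DONE_MARKER = '[DONE]';

function encodeFrame(chunk: string): string {
  // JSON.stringify escapes newlines, so a chunk can never contain the
  // blank line ("\n\n") that terminates an SSE event.
  return `data: ${JSON.stringify(chunk)}\n\n`;
}

function decodeFrame(frame: string): string | null {
  // Strip the terminating blank line and the "data: " prefix.
  const payload = frame.replace(/\n\n$/, '').slice('data: '.length);
  // The end-of-stream marker is sent raw, not JSON-encoded.
  return payload === DONE_MARKER ? null : (JSON.parse(payload) as string);
}

// A chunk that itself contains a blank line still travels as one frame.
const frame = encodeFrame('Line one\n\nLine two');
console.log(JSON.stringify(frame));
console.log(decodeFrame(frame));
```

Sending the raw chunk instead would let a model-generated blank line split one chunk into two malformed events on the client.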
// Client-side: consume the SSE stream with fetch + ReadableStream
const streamCopilotResponse = async (
  question: string,
  onChunk: (chunk: string) => void,
  onDone: () => void
) => {
  const response = await fetch('/api/copilot/stream', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ question })
  });
  if (!response.ok || !response.body) {
    throw new Error(`Stream request failed: ${response.status}`);
  }
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = ''; // holds a partial event until its terminating blank line arrives
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // stream: true keeps multi-byte characters intact across reads
    buffer += decoder.decode(value, { stream: true });
    const events = buffer.split('\n\n');
    buffer = events.pop() ?? ''; // the last piece may be incomplete
    for (const event of events) {
      if (!event.startsWith('data: ')) continue;
      const data = event.slice(6); // remove "data: " prefix
      if (data === '[DONE]') { onDone(); return; }
      onChunk(JSON.parse(data)); // append to displayed text
    }
  }
  onDone(); // stream closed without a [DONE] marker
};

// React component:
const [response, setResponse] = useState('');
const [loading, setLoading] = useState(false);
const ask = async (question: string) => {
  setLoading(true);
  setResponse('');
  try {
    await streamCopilotResponse(
      question,
      (chunk) => setResponse(prev => prev + chunk), // append each chunk
      () => setLoading(false)
    );
  } catch {
    setLoading(false); // do not leave the loading state on if the request fails
  }
};
Minimal API with IResult Streaming
// Alternative: use Results.Stream or write directly to response
// For simpler cases without SSE headers
app.MapPost("/api/copilot/stream-simple", (
    CopilotQuestion request,
    PrescriptionCopilotService copilot,
    CancellationToken ct) =>
{
    // Return an IResult that streams to the response
    return Results.Stream(
        async stream =>
        {
            var writer = new StreamWriter(stream, leaveOpen: true);
            await foreach (var chunk in copilot.StreamResponseAsync(request.Question, new ChatHistory(SystemPrompt), ct))
            {
                await writer.WriteAsync(chunk);
                await writer.FlushAsync();
            }
        },
        contentType: "text/plain; charset=utf-8");
});
Handling Streaming with Function Calls
// When function calling is enabled, Semantic Kernel executes functions
// during streaming, and the stream pauses while functions run.
// To handle this gracefully, track when function calls happen.
// Needs: using System.Runtime.CompilerServices; (EnumeratorCancellation)
//        using OpenAI.Chat;                      (StreamingChatCompletionUpdate)
public async IAsyncEnumerable<string> StreamWithStatusAsync(
    string question,
    ChatHistory history,
    [EnumeratorCancellation] CancellationToken ct)
{
    history.AddUserMessage(question);
    var settings = new OpenAIPromptExecutionSettings
    {
        ToolCallBehavior = ToolCallBehavior.AutoInvokeKernelFunctions,
        Temperature = 0.2
    };
    var lastFunctionCalled = string.Empty;
    await foreach (var chunk in _chat.GetStreamingChatMessageContentsAsync(
        history, settings, _kernel, ct))
    {
        // Emit status when a function is being called
        if (chunk.InnerContent is StreamingChatCompletionUpdate update
            && update.ToolCallUpdates?.Any() == true)
        {
            var functionName = update.ToolCallUpdates[0].FunctionName;
            if (!string.IsNullOrEmpty(functionName) && functionName != lastFunctionCalled)
            {
                lastFunctionCalled = functionName;
                yield return $"\n[Checking {FormatFunctionName(functionName)}...]\n";
            }
        }
        if (!string.IsNullOrEmpty(chunk.Content))
            yield return chunk.Content;
    }
}

private static string FormatFunctionName(string name) =>
    name.Replace("_", " ").Replace("get ", "").Trim();
// "get_latest_inr" → "latest inr"
Production issue I've seen: A team implemented streaming but didn't call FlushAsync() after each chunk. The response buffer accumulated all chunks before sending, so the client received the full response at the end, not progressively. The SSE endpoint looked correct in code but behaved identically to non-streaming from the user's perspective. The fix was adding await response.Body.FlushAsync(ct) after each WriteAsync. This is the most common streaming implementation mistake in ASP.NET Core. Buffering middleware (response compression, anti-forgery) can also swallow the flush; disable response buffering for SSE endpoints.
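A regression like this is easy to miss in code review, since the endpoint looks right. One way to catch it in an integration test is to record when each read arrives on the client and check whether the first chunk shows up early or only at the end. The sketch below is a heuristic under stated assumptions: `StreamTiming`, `looksBuffered`, and the 0.8 threshold are hypothetical, not part of any library, and the check is only meaningful for responses long enough to span several chunks.

```typescript
// Hypothetical diagnostic: decide from client-side timestamps whether a
// "streaming" response was actually delivered in one burst at the end.
interface StreamTiming {
  requestStartMs: number;    // when the request was sent
  chunkArrivalsMs: number[]; // time of each reader.read() that returned data
}

function looksBuffered(t: StreamTiming, threshold = 0.8): boolean {
  const first = t.chunkArrivalsMs[0];
  const last = t.chunkArrivalsMs[t.chunkArrivalsMs.length - 1];
  if (first === undefined || last === undefined) return false; // no data to judge
  const total = last - t.requestStartMs;
  if (total <= 0) return false;
  // A streamed response shows its first chunk early in the request;
  // a buffered one shows it only near the end.
  return (first - t.requestStartMs) / total >= threshold;
}

// Illustrative timings, not real measurements.
const progressive: StreamTiming = {
  requestStartMs: 0,
  chunkArrivalsMs: [200, 900, 2400, 4100, 6000],
};
const burstAtEnd: StreamTiming = {
  requestStartMs: 0,
  chunkArrivalsMs: [5900, 5905, 6000],
};
console.log(looksBuffered(progressive));
console.log(looksBuffered(burstAtEnd));
```

The timestamps would come from calling `performance.now()` around each `reader.read()` in a loop like the client code shown earlier.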
Key Takeaway
Streaming AI responses dramatically improves perceived latency: the user sees output within a few hundred milliseconds instead of waiting 5-10 seconds for a complete response. Use GetStreamingChatMessageContentsAsync in Semantic Kernel and pipe chunks to the client via SSE (text/event-stream). Always call FlushAsync() after each chunk; without it, the response buffer accumulates all chunks before sending. When function calling is enabled, streams pause during function execution, so emit status messages to keep the UI responsive. SSE is simple and has native browser support via EventSource or fetch with ReadableStream. EventSource only issues GET requests, so a POST endpoint like the one above is consumed with fetch.