Streaming AI Responses in .NET — SSE and Real-Time Output

Why Streaming Matters

Non-streaming AI response:
  1. User asks a question
  2. API sends the full prompt to OpenAI
  3. OpenAI generates 200 tokens (5-10 seconds)
  4. API receives the complete response
  5. API sends the complete response to the user
  User experience: blank screen for 5-10 seconds, then full answer appears

Streaming AI response:
  1. User asks a question
  2. API sends prompt to OpenAI with stream=true
  3. OpenAI starts sending tokens as they are generated
  4. API forwards tokens to the client immediately
  5. User sees the response appearing token by token (like typing)
  User experience: first tokens typically appear within a few hundred milliseconds, then the answer builds gradually

Streaming is almost always better for AI copilot UIs.
The perceived latency is dramatically lower even when total time is the same.
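
To put rough numbers on the two flows above, here is a back-of-the-envelope sketch. The 200-token answer and the 5-10 second window come from the lists above; the even per-token pacing, the 35 ms per token, and the 200 ms first-token delay are simplifying assumptions for illustration, not measurements.

```typescript
// Rough latency model; every input here is an illustrative assumption.
interface LatencyEstimate {
  timeToFirstOutputMs: number; // when the user first sees anything
  totalMs: number;             // when the full answer is on screen
}

function estimateLatency(
  tokens: number,
  msPerToken: number,
  firstTokenMs: number,
  streaming: boolean
): LatencyEstimate {
  const totalMs = firstTokenMs + (tokens - 1) * msPerToken;
  return {
    // Without streaming, nothing is shown until the last token exists.
    timeToFirstOutputMs: streaming ? firstTokenMs : totalMs,
    totalMs,
  };
}

// 200 tokens at 35 ms each, first token after 200 ms (assumed values).
const buffered = estimateLatency(200, 35, 200, false);
const streamed = estimateLatency(200, 35, 200, true);
console.log(buffered); // { timeToFirstOutputMs: 7165, totalMs: 7165 }
console.log(streamed); // { timeToFirstOutputMs: 200, totalMs: 7165 }
```

Total time is identical in both cases; only the wait before the first visible output changes.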

Streaming with Semantic Kernel

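// GetStreamingChatMessageContentsAsync returns IAsyncEnumerable<StreamingChatMessageContent>.
// Requires: using System.Runtime.CompilerServices; using Microsoft.SemanticKernel;
//           using Microsoft.SemanticKernel.ChatCompletion; using Microsoft.SemanticKernel.Connectors.OpenAI;

public sealed class PrescriptionCopilotService
{
    private readonly Kernel _kernel;
    private readonly IChatCompletionService _chat;

    public PrescriptionCopilotService(Kernel kernel)
    {
        _kernel = kernel;
        _chat   = kernel.GetRequiredService<IChatCompletionService>();
    }

    public async IAsyncEnumerable<string> StreamResponseAsync(
        string question,
        ChatHistory history,
        [EnumeratorCancellation] CancellationToken ct)
    {
        history.AddUserMessage(question);

        var settings = new OpenAIPromptExecutionSettings
        {
            Temperature      = 0.2,
            MaxTokens        = 500,
            ToolCallBehavior = ToolCallBehavior.AutoInvokeKernelFunctions
        };

        // An async iterator avoids Select/Where over IAsyncEnumerable,
        // which need the System.Linq.Async package on most .NET versions.
        await foreach (var chunk in _chat.GetStreamingChatMessageContentsAsync(history, settings, _kernel, ct))
        {
            if (!string.IsNullOrEmpty(chunk.Content))
                yield return chunk.Content;
        }
    }
}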

ASP.NET Core SSE Endpoint

// Server-Sent Events (SSE): text/event-stream
// Client-side: this endpoint is a POST, so consume it with fetch + ReadableStream.
// The browser's EventSource API only issues GET requests.

app.MapPost("/api/copilot/stream", async (
    CopilotQuestion request,
    PrescriptionCopilotService copilot,
    HttpResponse response,
    CancellationToken ct) =>
{
    response.Headers.ContentType = "text/event-stream";
    response.Headers.CacheControl = "no-cache";
    response.Headers.Connection = "keep-alive";

    var history = new ChatHistory(SystemPrompt);

    await foreach (var chunk in copilot.StreamResponseAsync(request.Question, history, ct))
    {
        // SSE format: "data: <content>\n\n"
        await response.WriteAsync($"data: {JsonSerializer.Serialize(chunk)}\n\n", ct);
        await response.Body.FlushAsync(ct);  // important: flush each chunk immediately
    }

    // Signal end of stream
    await response.WriteAsync("data: [DONE]\n\n", ct);
    await response.Body.FlushAsync(ct);
});

public sealed record CopilotQuestion(string Question);
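
The framing used by the endpoint above can be shown in isolation. This sketch mirrors the "data: <json>" plus blank-line format in TypeScript; formatSseData is a hypothetical helper name, not part of any library. JSON-encoding each chunk matters because a raw newline inside a token would otherwise be read as the end of the SSE field.

```typescript
// Hypothetical helper mirroring the server's framing: one JSON-encoded
// payload per event, terminated by a blank line.
function formatSseData(chunk: string): string {
  return `data: ${JSON.stringify(chunk)}\n\n`;
}

const plain = formatSseData('Warfarin');
const multiline = formatSseData('line one\nline two');

console.log(JSON.stringify(plain)); // "data: \"Warfarin\"\n\n"
// The embedded newline is escaped, so the event still occupies one line:
// splitting on "\n" yields the data line plus two empty strings.
console.log(multiline.split('\n').length); // 3
```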

Consuming SSE in React/TypeScript

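// Client-side: consume the SSE stream with fetch + ReadableStream
const streamCopilotResponse = async (
  question: string,
  onChunk: (chunk: string) => void,
  onDone: () => void
) => {
  const response = await fetch('/api/copilot/stream', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ question })
  });
  if (!response.ok || !response.body) {
    throw new Error(`Copilot request failed with status ${response.status}`);
  }

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = ''; // holds a partial event until its terminating blank line arrives

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // stream: true keeps multi-byte characters intact across network chunks
    buffer += decoder.decode(value, { stream: true });
    const events = buffer.split('\n\n');
    buffer = events.pop() ?? ''; // last piece may be incomplete; keep it for the next read

    for (const event of events) {
      if (!event.startsWith('data: ')) continue;
      const data = event.slice(6); // remove "data: " prefix
      if (data === '[DONE]') { onDone(); return; }
      onChunk(JSON.parse(data));  // append to displayed text
    }
  }
  onDone(); // stream closed without a [DONE] marker
};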

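// Inside a React component (requires: import { useState } from 'react'):
const [response, setResponse] = useState('');
const [loading, setLoading] = useState(false);

const ask = async (question: string) => {
  setLoading(true);
  setResponse('');
  try {
    await streamCopilotResponse(
      question,
      (chunk) => setResponse(prev => prev + chunk),  // append each chunk
      () => {}  // completion is handled in finally
    );
  } finally {
    setLoading(false);  // also clears the loading state if the request throws
  }
};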
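
Network reads do not line up with SSE event boundaries: one reader.read() call can deliver half an event or several events at once. The sketch below isolates the buffering needed to cope with that. createSseBuffer is a hypothetical helper, and it only handles the single-line "data:" events this article's endpoint emits, not the full SSE grammar (no event:, id:, or multi-line data).

```typescript
// Hypothetical helper: accumulates text and returns the payload of every
// complete "data: ..." event seen so far. Partial events stay buffered.
function createSseBuffer() {
  let pending = '';
  return (text: string): string[] => {
    pending += text;
    const parts = pending.split('\n\n');
    pending = parts.pop() ?? ''; // trailing piece has no terminator yet
    return parts
      .filter(p => p.startsWith('data: '))
      .map(p => p.slice('data: '.length));
  };
}

const push = createSseBuffer();
// One event split across two reads, then two events in a single read.
console.log(push('data: "Hel'));                         // []
console.log(push('lo"\n\n'));                            // [ '"Hello"' ]
console.log(push('data: " world"\n\ndata: [DONE]\n\n')); // [ '" world"', '[DONE]' ]
```

Each returned payload can then be compared against the [DONE] marker or passed to JSON.parse, exactly as in the loop above.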

Minimal API with IResult Streaming

// Alternative: use Results.Stream or write directly to response
// For simpler cases without SSE headers

app.MapPost("/api/copilot/stream-simple", (
    CopilotQuestion request,
    PrescriptionCopilotService copilot,
    CancellationToken ct) =>
{
    // Return an IResult that streams to the response
    return Results.Stream(
        async stream =>
        {
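            // await using disposes the writer (flushing any remaining text) when the callback ends
            await using var writer = new StreamWriter(stream, leaveOpen: true);
            await foreach (var chunk in copilot.StreamResponseAsync(
                request.Question, new ChatHistory(SystemPrompt), ct))
            {
                await writer.WriteAsync(chunk);
                await writer.FlushAsync();  // send each chunk to the client immediately
            }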
        },
        contentType: "text/plain; charset=utf-8");
});
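
On the client, a plain-text stream like this one has no event framing to parse, but the bytes still need careful decoding: a multi-byte UTF-8 character can be split across two network chunks. Here is a minimal sketch of the decoding step, using the standard TextDecoder with stream: true. The decodeChunks helper and the sample bytes are illustrative.

```typescript
// Decode a sequence of byte chunks as one UTF-8 text stream.
function decodeChunks(chunks: Uint8Array[]): string {
  const decoder = new TextDecoder('utf-8');
  let text = '';
  for (const chunk of chunks) {
    // stream: true makes the decoder hold an incomplete character
    // until the rest of its bytes arrive in a later chunk.
    text += decoder.decode(chunk, { stream: true });
  }
  return text + decoder.decode(); // flush anything still buffered
}

// "µg" is 3 bytes in UTF-8: 0xC2 0xB5 for "µ", then 0x67 for "g".
const firstHalf = new Uint8Array([0xc2]);
const secondHalf = new Uint8Array([0xb5, 0x67]);
console.log(decodeChunks([firstHalf, secondHalf])); // µg
```

Decoding each chunk independently, without the stream option, would turn the split character into replacement characters instead.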

Handling Streaming with Function Calls

// When function calling is enabled, Semantic Kernel executes functions
// during streaming — the stream pauses while functions run

// To handle this gracefully, track when function calls happen:
public async IAsyncEnumerable<string> StreamWithStatusAsync(
    string question,
    ChatHistory history,
    [EnumeratorCancellation] CancellationToken ct)
{
    history.AddUserMessage(question);
    var settings = new OpenAIPromptExecutionSettings
    {
        ToolCallBehavior = ToolCallBehavior.AutoInvokeKernelFunctions,
        Temperature      = 0.2
    };

    var lastFunctionCalled = string.Empty;

    await foreach (var chunk in _chat.GetStreamingChatMessageContentsAsync(
        history, settings, _kernel, ct))
    {
        // Emit status when a function is being called
        if (chunk.InnerContent is StreamingChatCompletionUpdate update
            && update.ToolCallUpdates?.Any() == true)
        {
            var functionName = update.ToolCallUpdates[0].FunctionName;
            if (!string.IsNullOrEmpty(functionName) && functionName != lastFunctionCalled)
            {
                lastFunctionCalled = functionName;
                yield return $"\n[Checking {FormatFunctionName(functionName)}...]\n";
            }
        }

        if (!string.IsNullOrEmpty(chunk.Content))
            yield return chunk.Content;
    }
}

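// Strip only a leading "get "; Replace("get ", "") would also turn "target_dose" into "tardose".
private static string FormatFunctionName(string name)
{
    var words = name.Replace("_", " ").Trim();
    return words.StartsWith("get ", StringComparison.Ordinal) ? words[4..] : words;
}
// "get_latest_inr" → "latest inr"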
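
On the client, the status markers arrive inline with the answer text, so a UI that wants to show them separately (for example as a spinner label) has to pick them out. This is a minimal sketch, assuming the server emits exactly the "[Checking ...]" format from StreamWithStatusAsync above and that each marker arrives as its own chunk; classifyChunk is a hypothetical name. A model response that happens to contain the same bracketed text would be misclassified, so a tagged event type would be more robust in production.

```typescript
type StreamItem =
  | { kind: 'status'; label: string } // e.g. "latest inr"
  | { kind: 'text'; text: string };

// Matches the marker format yielded by the server: "\n[Checking <label>...]\n"
const STATUS_PATTERN = /^\n\[Checking (.+)\.\.\.\]\n$/;

function classifyChunk(chunk: string): StreamItem {
  const match = STATUS_PATTERN.exec(chunk);
  return match
    ? { kind: 'status', label: match[1] }
    : { kind: 'text', text: chunk };
}

console.log(classifyChunk('\n[Checking latest inr...]\n')); // { kind: 'status', label: 'latest inr' }
console.log(classifyChunk('Your INR is'));                  // { kind: 'text', text: 'Your INR is' }
```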

Production issue I've seen: A team implemented streaming but didn't call FlushAsync() after each chunk. The response buffer accumulated every chunk before sending, so the client received the full response at the end instead of progressively. The SSE endpoint looked correct in code but behaved exactly like non-streaming from the user's perspective. The fix was adding await response.Body.FlushAsync(ct) after each WriteAsync. In my experience this is the most common streaming mistake in ASP.NET Core. Even with the flush in place, anything that buffers the response body after your handler, such as response compression middleware or a reverse proxy, can hold chunks back; disable buffering for SSE endpoints at each of those layers.
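
One way to catch this during testing is to record when each chunk reaches the client and check whether the arrivals are spread out or bunched at the end. The sketch below is a rough heuristic under assumed thresholds (looksBuffered and the 0.8 cutoff are illustrative, not from any library): if the first chunk shows up only after most of the total time has elapsed, something between the handler and the client is probably buffering.

```typescript
// arrivalsMs: time of each chunk's arrival, measured from request start.
// Returns true when the first chunk arrived suspiciously late relative
// to the last one, which suggests the response was buffered.
function looksBuffered(arrivalsMs: number[], cutoff = 0.8): boolean {
  if (arrivalsMs.length < 2) return false; // not enough data to judge
  const first = arrivalsMs[0];
  const last = arrivalsMs[arrivalsMs.length - 1];
  return last > 0 && first / last >= cutoff;
}

// Progressive delivery: chunks spread over the whole response.
console.log(looksBuffered([210, 900, 2400, 5100, 7000])); // false
// Buffered delivery: everything lands together at the end.
console.log(looksBuffered([6950, 6960, 6980, 7000]));     // true
```

The timestamps could be collected by calling performance.now() inside the onChunk callback and subtracting the time the request started.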

Key Takeaway

Streaming AI responses dramatically improves perceived latency: the user sees the first tokens within a few hundred milliseconds instead of waiting 5-10 seconds for a complete response. Use GetStreamingChatMessageContentsAsync in Semantic Kernel and pipe chunks to the client via SSE (text/event-stream). Always call FlushAsync() after each chunk; without it, the response buffer accumulates chunks before sending. When function calling is enabled, the stream pauses during function execution, so emit status messages to keep the UI responsive. SSE is simple to consume in the browser: use EventSource for GET endpoints, or fetch with ReadableStream for POST endpoints like the one above.
