Live Coding Interview Prep · Lesson 3 of 16
Sliding Window for Token and Text Problems
Why Sliding Window Matters for AI Engineers
The sliding window pattern is central to how LLMs and RAG pipelines handle text. Chunking a document into overlapping segments is a sliding window. Finding the longest passage where all sentences are relevant is a variable-size window. Processing token streams without buffering the full input is a sliding window.
If you can implement a sliding window cleanly under interview pressure, you demonstrate both algorithmic fluency and domain understanding.
The Pattern in One Mental Model
A sliding window maintains a subset of a sequence — a "window" — and moves it forward. The key insight: instead of recomputing from scratch each time, you maintain state by:
- Adding the new element entering from the right
- Removing the element leaving from the left
- Checking or updating your answer with the current window state
Array: [a, b, c, d, e, f, g]
Window size 3:
Step 0: [a, b, c] → check/update answer
Step 1: [b, c, d] → remove a, add d
Step 2: [c, d, e] → remove b, add e
...Fixed-Size Sliding Window
Classic Problem: Maximum Sum Subarray of Size k
def max_sum_subarray_k(nums: list[int], k: int) -> int:
"""
Find the maximum sum of any k consecutive elements.
Time: O(n) — one pass through the array
Space: O(1) — only track current and max sum
Interview signal: immediately see this as sliding window,
not nested loops (O(n*k)).
"""
if not nums or k <= 0 or k > len(nums):
return 0
# Initialize first window
window_sum = sum(nums[:k])
max_sum = window_sum
# Slide window one step at a time
for i in range(k, len(nums)):
# Add incoming element, remove outgoing element
window_sum += nums[i] - nums[i - k]
max_sum = max(max_sum, window_sum)
return max_sum
assert max_sum_subarray_k([1, 4, 2, 9, 7, 3, 8], 3) == 19 # [7, 3, 8] = 19? No: max is [2,9,7]=18 or [9,7,3]=19
assert max_sum_subarray_k([2, 1, 5, 1, 3, 2], 3) == 9 # [5, 1, 3] — wait: [2,1,5]=8, [1,5,1]=7, [5,1,3]=9 ✓
assert max_sum_subarray_k([], 3) == 0
assert max_sum_subarray_k([1, 2], 5) == 0 # k > lenAI Application: Chunking Text with Overlap
In RAG systems, documents are split into chunks before embedding. A naive split at fixed character/token boundaries loses context — a chunk might start mid-sentence. Overlapping windows preserve context at boundaries.
def chunk_text_fixed_size(
text: str,
chunk_size: int = 512,
overlap: int = 64,
) -> list[str]:
"""
Split text into overlapping chunks of chunk_size characters.
The overlap ensures that information at chunk boundaries
appears in two adjacent chunks — important for RAG because
a relevant sentence might span a boundary.
Time: O(n) where n = len(text)
Space: O(n) for the output chunks
Parameters:
chunk_size: size of each chunk in characters
overlap: number of characters repeated between consecutive chunks
"""
if not text:
return []
if len(text) <= chunk_size:
return [text]
chunks = []
start = 0
stride = chunk_size - overlap # how far to advance each step
if stride <= 0:
raise ValueError(f"overlap ({overlap}) must be less than chunk_size ({chunk_size})")
while start < len(text):
end = start + chunk_size
chunk = text[start:end]
chunks.append(chunk)
# If we've reached the end, stop
if end >= len(text):
break
start += stride # slide forward by stride = chunk_size - overlap
return chunks
def chunk_tokens_fixed_size(
tokens: list[str],
chunk_size: int = 100,
overlap: int = 20,
) -> list[list[str]]:
"""
Token-level chunking — more precise than character-level
since LLM context limits are measured in tokens.
Time: O(n)
Space: O(n * overlap / stride) — overlap causes duplication
"""
if not tokens:
return []
if len(tokens) <= chunk_size:
return [tokens[:]] # return a copy
chunks = []
stride = chunk_size - overlap
if stride <= 0:
raise ValueError(f"overlap ({overlap}) must be less than chunk_size ({chunk_size})")
start = 0
while start < len(tokens):
end = min(start + chunk_size, len(tokens))
chunks.append(tokens[start:end])
if end == len(tokens):
break
start += stride
return chunks
# Test chunking
sample_text = "The quick brown fox jumps over the lazy dog. " * 20
chunks = chunk_text_fixed_size(sample_text, chunk_size=100, overlap=20)
print(f"Input length: {len(sample_text)} chars")
print(f"Number of chunks: {len(chunks)}")
print(f"Chunk 0 ends with: ...{repr(chunks[0][-30:])}")
print(f"Chunk 1 starts with: {repr(chunks[1][:30])}...")
# The overlap should be visible: end of chunk 0 == start of chunk 1
assert chunks[0][-20:] == chunks[1][:20], "Overlap verification failed"
print("Overlap verified: last 20 chars of chunk 0 == first 20 chars of chunk 1")Variable-Size Sliding Window
Variable-size windows expand and shrink based on a condition. The pattern uses two pointers: left and right.
def longest_substring_without_repeating(s: str) -> int:
"""
Classic variable-size window: longest substring with all unique chars.
Expand right to add chars, shrink left when we find a duplicate.
Time: O(n) — each character enters and leaves the window once
Space: O(min(n, alphabet_size))
"""
seen: dict[str, int] = {} # char → last seen index
left = 0
max_length = 0
for right, char in enumerate(s):
# If char was seen and is inside our current window
if char in seen and seen[char] >= left:
# Move left past the previous occurrence
left = seen[char] + 1
seen[char] = right
max_length = max(max_length, right - left + 1)
return max_length
assert longest_substring_without_repeating("abcabcbb") == 3 # "abc"
assert longest_substring_without_repeating("bbbbb") == 1 # "b"
assert longest_substring_without_repeating("pwwkew") == 3 # "wke"
assert longest_substring_without_repeating("") == 0AI Application: Context Window Management
When a user has a multi-turn conversation with an LLM, you must manage the context window. If the history is too long, you need to trim it while preserving as much recent context as possible.
from dataclasses import dataclass, field
@dataclass
class Message:
role: str # "user" | "assistant" | "system"
content: str
token_count: int = 0
def __post_init__(self):
if self.token_count == 0:
# Rough approximation: 1 token ≈ 4 characters
self.token_count = max(1, len(self.content) // 4)
def trim_conversation_to_context_limit(
messages: list[Message],
max_tokens: int = 4096,
system_message: Message | None = None,
) -> list[Message]:
"""
Given a conversation history, return the largest suffix of messages
that fits within max_tokens. Always preserves the system message.
This is a variable-size sliding window from the right:
- Start with all messages
- Remove oldest messages (from the left) until total tokens fit
Time: O(n)
Space: O(n)
"""
reserved_tokens = system_message.token_count if system_message else 0
available_tokens = max_tokens - reserved_tokens
if available_tokens <= 0:
return []
# Start from the right (most recent) and work backward
total_tokens = 0
cutoff_index = len(messages)
for i in range(len(messages) - 1, -1, -1):
if total_tokens + messages[i].token_count > available_tokens:
cutoff_index = i + 1
break
total_tokens += messages[i].token_count
trimmed = messages[cutoff_index:]
if system_message:
return [system_message] + trimmed
return trimmed
# Test
history = [
Message("user", "Tell me about transformers.", 7),
Message("assistant", "Transformers are neural network architectures...", 100),
Message("user", "What is attention?", 5),
Message("assistant", "Attention allows the model to focus on relevant parts...", 150),
Message("user", "How does self-attention work?", 7),
Message("assistant", "In self-attention, each token attends to all other tokens...", 200),
Message("user", "What is the complexity?", 6),
]
system_msg = Message("system", "You are a helpful AI tutor.", 8)
trimmed = trim_conversation_to_context_limit(history, max_tokens=300, system_message=system_msg)
print(f"Original messages: {len(history)}")
print(f"Trimmed to: {len(trimmed)} messages (incl system)")
for msg in trimmed:
print(f" [{msg.role}] ({msg.token_count} tokens): {msg.content[:50]}...")Implementing a Sliding Window Chunker with Overlap
This is a complete implementation that handles edge cases — exactly the kind interviewers expect.
from typing import Iterator
def sliding_window_chunker(
text: str,
chunk_size: int,
overlap: int,
strip_whitespace: bool = True,
) -> Iterator[dict]:
"""
Production-quality sliding window text chunker.
Yields dicts with:
- text: the chunk content
- start: character offset in original text
- end: character offset in original text
- chunk_index: sequential chunk number
Edge cases handled:
- Empty text → yields nothing
- Text shorter than chunk_size → yields the full text as one chunk
- overlap >= chunk_size → raises ValueError
- Last chunk may be shorter than chunk_size
Time: O(n)
Space: O(chunk_size) — generator, doesn't hold all chunks in memory
"""
if overlap >= chunk_size:
raise ValueError(
f"overlap ({overlap}) must be strictly less than chunk_size ({chunk_size})"
)
if not text:
return
if strip_whitespace:
text = text.strip()
n = len(text)
stride = chunk_size - overlap
chunk_index = 0
start = 0
while start < n:
end = min(start + chunk_size, n)
chunk_text = text[start:end]
if strip_whitespace:
chunk_text = chunk_text.strip()
if chunk_text: # skip empty chunks after stripping
yield {
"text": chunk_text,
"start": start,
"end": end,
"chunk_index": chunk_index,
"is_last": end >= n,
}
chunk_index += 1
if end >= n:
break
start += stride
# Test cases
print("=== Test 1: Normal chunking ===")
text = "ABCDEFGHIJ"
for chunk in sliding_window_chunker(text, chunk_size=4, overlap=1):
print(chunk)
# Expected: ABC D, BCD E, CDE F... no wait: stride = 4-1 = 3
# start=0: ABCD, start=3: DEFG, start=6: GHIJ
print("\n=== Test 2: Empty text ===")
chunks = list(sliding_window_chunker("", chunk_size=10, overlap=2))
assert chunks == [], f"Expected [], got {chunks}"
print("Empty text test passed")
print("\n=== Test 3: Text shorter than chunk_size ===")
chunks = list(sliding_window_chunker("short", chunk_size=100, overlap=10))
assert len(chunks) == 1
assert chunks[0]["text"] == "short"
print("Short text test passed")
print("\n=== Test 4: Overlap verification ===")
text = "ABCDEFGHIJKLMNOP"
chunk_size, overlap = 6, 2
chunks = list(sliding_window_chunker(text, chunk_size=chunk_size, overlap=overlap))
for i in range(len(chunks) - 1):
current_end = chunks[i]["text"][-overlap:]
next_start = chunks[i+1]["text"][:overlap]
# Note: overlap at text level, not always exact due to stripping
print(f"Chunk {i} end: {repr(current_end)}, Chunk {i+1} start: {repr(next_start)}")
print("\n=== Test 5: ValueError on bad input ===")
try:
list(sliding_window_chunker("test", chunk_size=5, overlap=5))
assert False, "Should have raised ValueError"
except ValueError as e:
print(f"Correctly raised ValueError: {e}")Sentence-Aware Chunking
A smarter chunker respects sentence boundaries. It's still a sliding window, but the window unit is sentences rather than characters.
import re
from typing import Generator
def sentence_tokenize(text: str) -> list[str]:
"""
Simple sentence splitter using regex.
For production use nltk.sent_tokenize() or spacy.
"""
# Split on period/exclamation/question mark followed by space and capital letter
pattern = r'(?<=[.!?])\s+(?=[A-Z])'
sentences = re.split(pattern, text.strip())
return [s.strip() for s in sentences if s.strip()]
def sentence_aware_chunker(
text: str,
max_tokens_per_chunk: int = 200,
overlap_sentences: int = 1,
tokens_per_word_estimate: float = 1.3,
) -> list[dict]:
"""
Chunk text by sentences so no sentence is split mid-way.
The window slides over sentences, not characters.
Overlap is measured in sentences, not characters.
Time: O(n) — linear in number of sentences
Space: O(n) for output
"""
sentences = sentence_tokenize(text)
if not sentences:
return []
# Estimate token count per sentence (rough: words * 1.3)
def estimate_tokens(sentence: str) -> int:
return max(1, int(len(sentence.split()) * tokens_per_word_estimate))
token_counts = [estimate_tokens(s) for s in sentences]
chunks = []
chunk_index = 0
i = 0
while i < len(sentences):
# Build a chunk starting at sentence i
chunk_sentences = []
chunk_tokens = 0
j = i
while j < len(sentences):
if chunk_tokens + token_counts[j] > max_tokens_per_chunk and chunk_sentences:
# This sentence would exceed the limit — stop here
break
chunk_sentences.append(sentences[j])
chunk_tokens += token_counts[j]
j += 1
if not chunk_sentences:
# Single sentence exceeds limit — include it anyway to avoid infinite loop
chunk_sentences = [sentences[i]]
j = i + 1
chunks.append({
"text": " ".join(chunk_sentences),
"sentence_start": i,
"sentence_end": j,
"chunk_index": chunk_index,
"estimated_tokens": chunk_tokens,
})
chunk_index += 1
# Advance, keeping overlap_sentences from the end of this chunk
advance = max(1, (j - i) - overlap_sentences)
i += advance
return chunks
# Test
doc = (
"The transformer architecture was introduced in 2017. "
"It uses self-attention mechanisms to process sequences. "
"Unlike RNNs, transformers process tokens in parallel. "
"This makes them much faster to train on modern hardware. "
"The key innovation is the attention mechanism. "
"Attention allows each token to directly interact with any other token. "
"This removes the bottleneck of sequential processing. "
"BERT and GPT are both based on the transformer architecture. "
"BERT uses bidirectional attention while GPT uses causal attention. "
"Modern LLMs like GPT-4 are scaled-up transformers."
)
chunks = sentence_aware_chunker(doc, max_tokens_per_chunk=30, overlap_sentences=1)
print(f"Document sentences: {len(sentence_tokenize(doc))}")
print(f"Chunks produced: {len(chunks)}")
for c in chunks:
print(f"\nChunk {c['chunk_index']} (sentences {c['sentence_start']}-{c['sentence_end']}, ~{c['estimated_tokens']} tokens):")
print(f" {c['text'][:100]}...")Interview Talking Points
When you see a chunking or windowing problem, say this before coding:
"This looks like a sliding window. I'll identify whether it's fixed-size or variable-size. Fixed-size windows move by a constant stride; variable-size windows use two pointers and expand/shrink based on a condition. The key invariant I'll maintain is [state in the window], and I'll update it in O(1) per step to keep the overall algorithm O(n)."
Then ask: "Should I handle overlapping chunks? What should happen at the last chunk if the text doesn't divide evenly?" These questions show you've thought about the problem, not just the happy path.