AI Systems
End-to-end AI engineering — from LLM integration and RAG pipelines to prompt engineering and production AI systems.
Beginner
Leaky ReLU and ReLU Variants
Why Leaky ReLU, ELU, PReLU, and GELU were invented, what problem each solves, and when to use them over plain ReLU.
ReLU Activation
Why ReLU became the default hidden-layer activation — its gradient, the dead neuron problem, and variants like ELU and GELU.
Sigmoid Activation
The sigmoid function — its formula, gradient, saturation problem, and when to use it (output layer for binary classification) vs when to avoid it (hidden layers).
Early Stopping
Stopping training when validation loss stops improving — one of the most broadly applicable regularisation techniques, and how to implement it.
The Forward Pass
What happens when data flows through a neural network — layer by layer computation, tensor shapes at each step, and how PyTorch's autograd tracks the graph.
MLP Architecture
Multi-layer perceptrons from scratch — hidden layers, activation functions, parameter counting, and building MLPs for clinical tabular data.
Tensors Explained
What tensors are, how they generalise scalars, vectors, and matrices, tensor shapes in deep learning, and common PyTorch tensor operations.
Bayes' Theorem
Bayes' theorem from first principles — what it says, why it matters, how to apply it to medical diagnosis, and its role in machine learning.
Convolutional Neural Networks: Introduction
What CNNs are, why convolution works for images, the key components, and how they compare to fully connected networks.
Compute Requirements for Deep Learning
What hardware deep learning needs, why GPUs matter, memory calculations, training time estimates, and cost-effective approaches.
Feature Engineering vs Deep Learning
How manual feature engineering differs from deep learning's automatic feature learning, when each approach is better, and how they can be combined.
Neural Network Layers Explained
What layers are in a neural network, how information flows through them, the role of each layer type, and how to build a simple network in PyTorch.
Matrix Operations in Deep Learning
The core matrix operations that power neural networks — matrix multiplication, broadcasting, batch operations, and how they map to PyTorch.
Anatomy of a Neuron
The mathematical structure of an artificial neuron — inputs, weights, bias, the dot product, and the activation function — with implementation.
Overfitting in Neural Networks
What overfitting is, how to detect it, why deep networks are prone to it, and the primary techniques to prevent it.
PyTorch vs TensorFlow
The practical differences between PyTorch and TensorFlow — syntax, ecosystem, debugging, deployment, and which to choose for different use cases.
Deep Learning vs Machine Learning
The distinction between machine learning and deep learning, what makes deep learning different, and when each approach is appropriate.
When to Use Deep Learning
A practical decision framework for choosing deep learning vs simpler approaches — data requirements, problem types, and the cost of complexity.
Bernoulli, Binomial, and Poisson Distributions
The three core discrete distributions — what they model, their parameters, when to use each, and their roles in machine learning.
Probability Distributions
What probability distributions are, the difference between discrete and continuous distributions, and the key properties that define them.
Probability Fundamentals
The axioms of probability, sample spaces, events, and the rules that govern all probabilistic reasoning in statistics and machine learning.
Independence and Dependence
What it means for events to be independent or dependent, how to test for independence, and why independence assumptions matter in ML models.
Explaining Probability in Interviews
How to explain joint, marginal, and conditional probability to a non-technical interviewer — and common probability interview questions with clear answers.
Joint, Marginal, and Conditional Probability
The three types of probability and how they relate — joint P(A,B), marginal P(A), and conditional P(A|B) — with medical examples and ML applications.
The Normal Distribution
The bell curve in depth — its parameters, the 68-95-99.7 rule, the Central Limit Theorem, z-scores, and why the normal distribution is everywhere in ML.
Probability in Action: Spam Filter
A complete worked example applying joint, conditional, and Bayesian probability to build a spam classifier — showing all the calculations step by step.
Law of Total Probability
The law of total probability, how to use it to decompose complex probability computations, and its connection to Bayes' theorem and ML.
Chunk Overlap and Boundary Handling
Why chunk overlap exists, how much to use, how it affects storage and retrieval, and strategies for handling boundaries in clinical RAG.
Cosine vs Dot Product Similarity
When to use cosine similarity versus dot product in vector search, how they differ mathematically, and which embedding models require which metric.
Fixed-Size Chunking
How fixed-size chunking works, its parameters, trade-offs, and when it is the right default strategy for RAG document ingestion.
Recursive Chunking
How recursive character text splitting respects document structure by cascading through separators, and when it outperforms fixed-size chunking.
Similarity Search in Vector Databases
How vector similarity search works, the difference between exact and approximate search, and how to implement retrieval with filtering in Chroma and FAISS.
Correlation vs Causation
Why correlation does not imply causation, the types of relationships that produce spurious correlations, and how to think about causality in ML systems.
Correlation: Measuring Relationships
What correlation measures, the difference between Pearson, Spearman, and Kendall correlation, how to interpret correlation coefficients, and applications in ML.
Statistics Inside AI Models
Where descriptive statistics appear inside neural networks and training pipelines — from batch normalisation to loss surfaces to gradient statistics.
IQR and Outlier Detection
How the interquartile range identifies outliers using Tukey's fences, why it's robust to extreme values, and how to apply it to ML feature engineering.
Mean, Median, and Mode
The three measures of central tendency — what they are, how to compute them, when each is most appropriate, and how they appear in ML.
Population vs Sample Statistics
The difference between population and sample, why it matters for formulas, and how sampling appears throughout machine learning.
Range and Dispersion
Measures of spread beyond standard deviation — range, IQR, mean absolute deviation, and coefficient of variation — and when each is appropriate in ML contexts.
Spurious Correlations and When to Worry
What makes a correlation spurious, how to spot them in ML datasets, and why they're particularly dangerous in healthcare AI.
Standard Deviation and Variance
What variance and standard deviation measure, how to compute them, population vs sample formulas, and their role in machine learning.
Standard Deviation in Plain English
An intuitive, non-mathematical explanation of standard deviation — what it means, how to interpret values, and why it matters in practice.
Getting Started with OpenAI Codex for .NET Developers
Use OpenAI Codex as an agentic coding assistant — CLI setup, workflows, prompting, safety checks, and how it fits alongside Copilot and Claude Code on real .NET projects.
What Is a Large Language Model?
What LLMs are, how they work at a high level, what 'large' means, and how they differ from earlier NLP approaches.
Anatomy of a Prompt
The structural components of a production prompt — system message, role, task, context, constraints, format, and examples — and how each shapes model behaviour.
Role-Playing and Persona Prompting
How assigning a role or persona shapes LLM behaviour, why it works, when it helps most, and the safety limits of role-based prompting.
What Is Prompt Engineering?
What prompt engineering is, why it matters for production AI systems, and the core mental model for thinking about prompts as a communication interface to LLMs.
Why Prompts Matter in Production
Why prompt quality has outsized impact on LLM output quality, reliability, cost, and safety — and why prompt engineering is a core engineering discipline for AI systems.
Zero-Shot vs Few-Shot Prompting
How zero-shot and few-shot prompting differ, when each works, how to write effective few-shot examples, and the impact of example order and selection.
Embeddings for RAG
What text embeddings are, how they enable semantic search, how to choose an embedding model for RAG, and the key dimensions of performance.
RAG Pipeline Overview
The full RAG pipeline from document ingestion to answer generation — indexing, retrieval, augmentation, and generation phases with implementation examples.
RAG vs Fine-Tuning
When to choose RAG over fine-tuning and vice versa — the decision framework based on knowledge type, update frequency, cost, and latency requirements.
What Is RAG?
What Retrieval-Augmented Generation is, why it exists, and how it mitigates the hallucination and knowledge cutoff problems of standalone LLMs.
Python Setup for AI Engineering
Set up a professional Python environment for AI projects: pyenv, venv, pip, requirements.txt, .env files, python-dotenv, and a clean project structure.
Python Types for AI Code
Master Python type hints: str, int, float, list, dict, Optional, Union, Any, TypedDict, Literal, and dataclasses — and understand why types are essential for AI engineering.
AI-Assisted Development: Copilot, Prompt Engineering, and AI Workflows
Use AI tools productively as an engineer — GitHub Copilot patterns, effective prompting for code generation, AI-assisted debugging and refactoring, and the workflows that make AI a genuine force multiplier.
AI Engineering Roadmap (2026): A Practical Path from LLM Basics to Production Systems
Follow this practical AI engineering roadmap with a structured learning path across RAG, agent workflows, multimodal apps, security, and production evaluation.
AI for Developers Course Orientation: Roadmap, Prerequisites, and Study Plan
Start the AI for Developers course with a complete orientation: prerequisites, chapter-by-chapter outcomes, tools, weekly pacing, and project expectations.
AI/ML/NLP Research Track Orientation: Full Beginner-to-Advanced Guide
Complete orientation for the AI/ML/NLP Research Track: prerequisites, module map, project sequence, pacing, research workflow, and portfolio expectations.
How to Read AI Papers (Beginner Guide): Practical Method
Learn a practical way to read AI and NLP papers without getting lost in heavy math: what to read first, how to extract value, and how to reproduce experiments.
Jupyter Notebook Detailed Tutorial for Data Science and AI Workflows
Learn Jupyter Notebook in depth: setup, cells, kernels, markdown, debugging, reproducibility, notebook structure, and production best practices.
Kaggle Python Course: Practical and Fast Track for AI Beginners
A practical Kaggle-first Python learning path focused on fast execution: notebooks, Pandas workflows, mini tasks, and portfolio-ready outputs.
Machine Learning Basics (3-4 Weeks): Supervised Learning and 3 Core Projects
Learn ML fundamentals quickly with scikit-learn: classification, regression, train-test split, overfitting, metrics, and three must-build beginner projects.
Matplotlib Detailed Tutorial: From Basic Plots to Professional Visualizations
Learn Matplotlib with practical examples: line, bar, scatter, histogram, subplots, styling, annotations, and publication-quality chart design.
Project: Movie Sentiment Analysis with scikit-learn
Build a practical sentiment analysis classifier with scikit-learn: data cleaning, TF-IDF vectorization, multiple model comparison, evaluation, error analysis, and interpretation of model behaviour on noisy real-world reviews.
NumPy Detailed Tutorial: Beginner to Advanced with Real Examples
Master NumPy step by step: ndarrays, indexing, broadcasting, vectorization, linear algebra, random sampling, performance, and practical exercises.
Pandas Detailed Tutorial: Data Cleaning, Analysis, and Real Workflows
Learn Pandas from beginner to advanced with DataFrame fundamentals, cleaning, joins, groupby, time series, and practical end-to-end analysis workflows.
Prompt Engineering Course Orientation: Detailed Learning Path
A detailed orientation for the Prompt Engineering course with learning outcomes, module sequence, exercises, evaluation criteria, and project guidance.
Project: Spam Detection with scikit-learn (Step-by-Step)
Build a complete spam detection classifier with scikit-learn: data loading, preprocessing, TF-IDF vectorization, model training, evaluation metrics, error analysis, threshold tuning, and deployment considerations.
Project 3: Tabular Prediction Model with scikit-learn
Build a production-style tabular ML pipeline with preprocessing, train/validation strategy, model training, metrics, and feature importance.
AI Agents — What They Are and How Semantic Kernel Implements Them
Understand what AI agents are, how they differ from chatbots, and how to build agents with Semantic Kernel: plugins, planners, memory, and multi-agent orchestration.
OpenAI SDK in .NET — Chat, Streaming, and Function Calling
Use the official OpenAI .NET SDK to build AI features: chat completions, streaming responses, structured outputs with function calling, and token management.
Ollama: Run Powerful AI Models Locally — No API Keys, No Cost
The complete developer guide to Ollama — install and run Llama 3, Mistral, Gemma, and Phi-4 locally, build .NET and Python apps against local models, and understand when local AI beats cloud AI.
RAG — Retrieval-Augmented Generation Architecture
Understand how RAG works: chunk documents, generate embeddings, store in a vector database, retrieve relevant context, and augment LLM prompts to ground answers in your own data.
How AI & LLMs Actually Work: A Developer's Guide
Understand what's really happening inside ChatGPT and LLMs — tokens, embeddings, attention, the transformer architecture, and how to use the OpenAI API in .NET and Python with real code examples.
Prompt Engineering: Zero to Hero
Master every prompt engineering technique — zero-shot, few-shot, chain-of-thought, ReAct, structured output, system prompts, and building reliable AI pipelines. With real OpenAI API examples.
Intermediate
Activation Functions — Interview Q&A
Five key interview questions on sigmoid, ReLU, softmax, dead neurons, and choosing activations for different architectures.
Softmax Activation
How softmax converts logits to a probability distribution over classes, its gradient, numerical stability, and when to use temperature scaling.
Adam Optimizer
How Adam combines momentum and adaptive learning rates — the math behind m_t, v_t, bias correction, and when to use Adam vs SGD.
Backpropagation
How backprop computes gradients layer by layer using the chain rule — the algorithm that makes deep learning trainable.
Batch Normalisation
How BatchNorm normalises activations mid-network, its learnable parameters, the difference between train and eval modes, and Layer Norm for transformers.
The Chain Rule in Deep Learning
How the calculus chain rule enables backpropagation — derivative composition, the Jacobian, and why this makes neural network training feasible.
CNN Architecture
How modern CNNs are structured — input normalisation, conv blocks, downsampling strategies, global pooling, and the classification head.
CNN — Interview Q&A
Six key interview questions on CNN architecture, filters, pooling, skip connections, transfer learning, and medical imaging applications.
CNN Kernels and Feature Maps
What learned kernels detect, depthwise separable convolutions, dilated convolutions, and visualising what a CNN has learned.
CNN Object Detection
From image classification to detection — anchors, YOLO, Faster R-CNN, non-maximum suppression, and applying detection to medical imaging.
CNNs in Real-World Clinical AI
Deploying CNN-based medical image models — DICOM preprocessing, clinical validation, bias detection, explainability with Grad-CAM, and regulatory considerations.
Transfer Learning with CNNs
Fine-tuning pre-trained ImageNet models for medical imaging — freezing strategies, learning rate schedules, and when to use transfer learning vs training from scratch.
Data Augmentation
Increasing training-data variety with transforms, time-series augmentation for clinical signals, Mixup, CutMix, and test-time augmentation for better inference.
DataLoader and Data Pipeline
Building efficient PyTorch data pipelines — Dataset, DataLoader, transforms, and handling clinical data with proper train/val/test splits.
Depth vs Width in Neural Networks
Why deeper networks learn hierarchical features, why wider networks have more capacity, and how to choose architecture dimensions for your task.
Deep Learning Interview Strategy
How to approach deep learning interviews — structuring answers, handling unknown questions, key frameworks for system design, and common traps to avoid.
Exploding Gradients
Why gradients can grow exponentially in deep networks, how to detect explosion, and gradient clipping as the standard fix.
GPU Training in Practice
Setting up GPU training in PyTorch, multi-GPU strategies, monitoring GPU utilisation, and common pitfalls.
Gradient Descent
How gradient descent minimises a loss function by following the negative gradient — the core algorithm behind all neural network training.
Learning Rate
The most important hyperparameter — how to choose it, the learning rate range test, warmup strategies, and what happens when it's too high or too low.
Loss Functions
MSE, MAE, BCE, cross-entropy, focal loss — how each loss measures prediction error and which to use for regression, binary classification, and multi-class problems.
The Loss Landscape
Local minima, saddle points, flat regions, and sharp vs flat minima — visualising what gradient descent traverses and why it matters for generalisation.
Learning Rate Schedulers
Cosine annealing, step decay, warmup, OneCycleLR, and ReduceLROnPlateau — when and why to reduce the learning rate during training.
Network Capacity and Expressivity
What capacity means, how to measure it, signs of too little or too much capacity, and how to tune architecture size to your dataset.
Neural Networks — Interview Q&A
Six key interview questions on MLP architecture, capacity, the forward pass, loss functions, optimisers, and debugging training failures.
Optimisers — Interview Q&A
Six key interview questions on gradient descent, Adam, SGD, learning rate scheduling, and choosing optimisers for clinical AI systems.
Regularisation — Interview Q&A
Five key interview questions on Dropout, BatchNorm, L1/L2, early stopping, and choosing the right regularisation strategy for clinical AI.
L1 and L2 Regularisation
How L1 and L2 weight penalties prevent overfitting, their probabilistic interpretation as priors, and when to use each.
ResNet and VGG
The VGG design philosophy, why ResNet's skip connections solved the degradation problem, and how these architectures shaped modern deep learning.
RNNs and LSTMs
How recurrent networks process sequences, the LSTM gate mechanism that mitigates vanishing gradients, and when to use RNNs vs Transformers for clinical time series.
SGD vs Batch vs Mini-Batch
The three gradient descent variants — when to use each, their noise profiles, and why mini-batch SGD is the standard in deep learning.
Transformers Introduction
The architecture that changed AI — self-attention, multi-head attention, positional encoding, and why transformers replaced RNNs for sequence modelling.
Universal Approximation Theorem
What the universal approximation theorem says, what it doesn't say, and why it matters (and doesn't matter) for practical deep learning.
Vanishing Gradients
Why gradients shrink toward zero in deep networks with sigmoid/tanh activations, how this blocks learning in early layers, and the fixes that made deep learning work.
Bayesian Thinking in AI
How Bayesian reasoning appears throughout AI and ML — from Naive Bayes to Bayesian neural networks, Gaussian processes, and uncertainty quantification.
Naive Bayes Classifier
A complete guide to Naive Bayes — the conditional independence assumption, variants (Gaussian, Multinomial, Bernoulli), when it works despite the assumption, and implementation.
Prior, Likelihood, and Posterior
The three components of Bayesian inference — what each term means, how to choose priors, and how the posterior combines prior belief with evidence.
CNN Filters and Pooling
How convolutional filters detect features, stride and padding, max vs average pooling, and the feature map hierarchy from edges to objects.
Dropout Regularisation
How dropout works, the inverted dropout implementation, MC dropout for uncertainty estimation, and when to use it.
Generalisation Techniques in Deep Learning
The full toolkit for improving deep learning generalisation — data augmentation, label smoothing, mixup, weight decay, early stopping, and cross-validation.
Deep Learning vs ML: Interview Q&A
Interview questions comparing deep learning and traditional ML — when to use each, how to justify the choice, and common gotchas.
Weight Initialisation
Why weight initialisation matters for training, the Xavier and Kaiming schemes, what happens with bad initialisation, and PyTorch defaults.
Chain Rule of Probability
The chain rule of probability — how joint probabilities factorise into conditionals, its connection to language models, and how to apply it.
Conditional Probability
Conditional probability in depth — the definition, computing it from tables, Bayes' theorem derivation, and applications in ML classifiers.
Probability Distributions in Machine Learning
How specific probability distributions appear inside ML models — loss functions, outputs, regularisation, and generative models.
BM25: Keyword Search for RAG
How BM25 works, why it complements semantic search, how to implement it, and when it outperforms dense retrieval in clinical RAG.
RAG Chunking Strategy — Interview Q&A
Senior-level interview questions and answers on RAG chunking strategies: chunk size, overlap, splitting methods, parent-document retrieval, clinical RAG design, and production tuning.
RAG Evaluation Metrics
The metrics used to evaluate RAG systems — retrieval quality (precision, recall, MRR, NDCG) and generation quality (faithfulness, answer relevance, context utilisation).
HNSW: The Vector Index Powering RAG
How Hierarchical Navigable Small World graphs enable fast approximate nearest neighbour search in vector databases, and the parameters that matter for RAG.
Maximal Marginal Relevance in RAG
How MMR balances relevance and diversity when selecting chunks, the lambda parameter, and when to use it instead of plain top-k retrieval.
RAG in Production — Senior Interview Q&A
Senior-level interview questions on production RAG systems: system design, reliability, latency optimisation, hallucination prevention, multi-turn conversations, monitoring, and clinical safety.
Query Expansion for RAG
Techniques for expanding queries before retrieval — synonym injection, LLM rewriting, HyDE, and multi-query — to improve recall when the user's phrasing differs from the knowledge base.
RAGAS: RAG Evaluation Framework
How to use RAGAS to evaluate RAG pipelines — the four core metrics, what each measures, how to run evaluations, and interpreting results.
Reciprocal Rank Fusion
How Reciprocal Rank Fusion combines results from multiple retrieval systems without requiring score normalisation, and how to implement it for hybrid RAG.
Semantic Chunking
How semantic chunking uses embedding similarity to find natural topic boundaries, when it outperforms structural chunking, and its computational cost.
Pearson Correlation Deep Dive
The mathematical derivation of Pearson correlation, its assumptions, when it fails, and how it connects to linear regression and cosine similarity.
Sampling in Machine Learning
How sampling strategies — random, stratified, systematic, and bootstrap — affect model training and evaluation, with practical implementation.
Sampling Interview Traps
Common interview gotchas about sampling — data leakage, temporal splits, test set contamination, and how to discuss them fluently.
Variance and Model Stability
How variance in model outputs, predictions, and training runs reveals instability — and techniques to reduce it.
RAG Ablation Studies
How to systematically test which RAG components matter most — ablation methodology, what to test, and how to interpret results to guide architectural decisions.
Contextual Compression
How contextual compression extracts only the relevant portions of retrieved documents before passing them to the LLM — reducing noise and saving context window space.
GraphRAG
How GraphRAG uses a knowledge graph of entities and relationships to enable multi-hop reasoning beyond what vector retrieval can handle — architecture, implementation, and when to use it.
Hybrid Retrieval
Combining dense (vector) and sparse (BM25) retrieval — why hybrid often outperforms either alone, how to implement it, and fusion strategies.
HyDE: Hypothetical Document Embeddings
How HyDE improves RAG retrieval by embedding a hypothetical answer instead of the query — bridging the query-document embedding gap.
Advanced RAG Interview Q&A
Common senior interview questions on advanced RAG techniques — hybrid retrieval, reranking, query transformation, evaluation, and production design decisions.
Maximal Marginal Relevance (MMR)
How MMR balances relevance and diversity in RAG retrieval, the algorithm, when to use it, and implementation with embeddings.
Multi-Query Retrieval
How generating multiple query variants and merging their results improves RAG recall — the algorithm, implementation, and when it helps most.
Parent Document Retrieval
How parent document retrieval combines fine-grained chunk search with full-context retrieval — the algorithm, implementation, and when to use it over flat chunking.
Query Rewriting
How rewriting user queries before retrieval improves RAG recall — expanding abbreviations, correcting spelling, converting to keyword form, and step-back prompting.
RAGAS: RAG Evaluation Framework
How RAGAS evaluates RAG pipelines across faithfulness, answer relevancy, context precision, and context recall — with implementation and interpretation.
Reranking in RAG
How cross-encoder rerankers improve retrieval precision, the bi-encoder vs cross-encoder trade-off, and implementing reranking with Cohere and sentence-transformers.
Small-to-Big Retrieval
The small-to-big RAG pattern — searching with sentence-level precision but returning paragraph or section-level context — and how it compares to parent document retrieval.
Step-Back Prompting
How step-back prompting improves RAG by first retrieving high-level concept documents before the specific answer — the algorithm and clinical applications.
AI Agent Memory — Giving Agents Context Over Time
Implement memory for AI agents in .NET: in-session chat history, short-term conversation context, long-term memory with vector search, semantic memory with Semantic Kernel, and memory safety for clinical systems.
Multi-Agent Systems — Coordinating Specialised AI Agents
Build multi-agent systems in .NET: agent orchestration, specialised agents communicating via messages, handoff patterns, parallel agent execution, and safety boundaries for clinical multi-agent workflows.
AI Agent Planning — Decomposing Goals into Actions
Build planning AI agents in .NET: how agents decompose complex goals, sequential vs parallel planning, Semantic Kernel step-by-step plans, goal-directed reasoning, and safety constraints for clinical AI.
AI Agent Tools — Giving Agents Access to Systems
Build AI agent tools in .NET: defining tool functions, tool schemas, tool execution, error handling from tools, and designing safe tool sets for clinical AI agents.
Function Calling — Letting the AI Call Your .NET Code
Implement function calling with OpenAI and Semantic Kernel in .NET: defining tools, handling tool calls, multi-step function execution, type-safe parameter mapping, and safety considerations for clinical AI.
Ollama — Running Local LLMs in .NET Development
Use Ollama to run local large language models in .NET development: setup, integration with Semantic Kernel, model selection for clinical tasks, and when local models are appropriate vs. cloud APIs.
AI in Production — Reliability, Cost, and Safety
Deploy AI features in production .NET applications reliably: rate limiting, cost management, output validation, fallback strategies, prompt injection prevention, and observability for LLM calls.
Semantic Kernel — Building AI Features in .NET
Use Microsoft Semantic Kernel in .NET to build AI-powered features: kernel setup, plugins, prompt functions, native functions, chat history, and integrating LLMs into clinical .NET applications.
Streaming AI Responses in .NET — SSE and Real-Time Output
Stream AI responses to clients in ASP.NET Core: Server-Sent Events (SSE), streaming from OpenAI/Semantic Kernel, IAsyncEnumerable patterns, and building a real-time AI copilot UI.
AI Interview Important Basics — Your Fast 14-Day Plan
Structured 14-day GenAI interview prep: LLMs, prompts, agents, LangChain, Semantic Kernel, MCP, RAG, Azure AI Search, and what to explain in senior AI interviews.
Evaluating Agentic AI Systems
Why agent evaluation is hard, and how to do it anyway. Task completion rate, step efficiency, trajectory evaluation, and human review sampling with Python examples.
Agent Failure Modes
The five most common ways AI agents fail in production: infinite loops, hallucinated tool calls, context poisoning, goal drift, and output quality issues. Plus mitigations for each.
Stopping Conditions and Max Iterations
Every agent loop needs a way to stop. Learn four stopping mechanisms — hard stop, soft stop, budget stop, and timeout — with Python implementations.
Interview: Multi-Agent Pattern Questions
10 Q&A pairs covering multi-agent patterns for AI engineering interviews. Topics: supervisor vs peer, pipeline, when to use which, failure modes, and evaluation.
Peer-to-Peer Multi-Agent Pattern
Agents communicate directly without a central coordinator. Learn the debate and adversarial patterns where one agent proposes and another critiques, with Python examples.
Pipeline Multi-Agent Pattern
A linear chain where each agent processes the output of the previous one. Build typed, fault-tolerant pipelines with Pydantic interfaces between stages.
Aspire Components — Integrating SQL, Redis, and Messaging
Use .NET Aspire components for database, cache, and messaging integration: Aspire SQL Server, Redis, RabbitMQ, and Azure Service Bus components — health checks, connection resilience, and telemetry included.
Aspire Observability — Traces, Metrics, and Logs in One Dashboard
Use .NET Aspire's built-in observability: OpenTelemetry auto-instrumentation, the Aspire Dashboard for distributed traces and structured logs, custom metrics, and exporting telemetry to production backends.
Aspire Resilience — Polly Retry and Circuit Breaker Patterns
Add resilience to .NET Aspire services: Polly retry policies, circuit breakers, hedging, rate limiters, and how Aspire's AddServiceDefaults wires resilience for all HTTP clients automatically.
Aspire Service Discovery — Wiring Services in Local Development
Use .NET Aspire's service discovery to wire microservices and dependencies in local development: AppHost project, resource references, named endpoints, and how Aspire replaces manual connection string management.
Azure App Service — Deploying and Configuring ASP.NET Core
Deploy and configure ASP.NET Core applications on Azure App Service: deployment slots, app settings, connection strings, scaling, managed identity, and production-ready configuration patterns.
Azure Blob Storage — Storing and Retrieving Files in .NET
Use Azure Blob Storage in .NET: uploading patient documents, generating SAS tokens for secure access, streaming large files, lifecycle management policies, and Managed Identity authentication.
Azure Functions Deployment — CI/CD and Hosting Plans
Deploy Azure Functions in .NET: Consumption vs Premium vs Dedicated hosting plans, GitHub Actions CI/CD pipeline, deployment slots, environment configuration, and production-ready packaging.
Durable Functions — Stateful Workflows in Azure Functions
Implement stateful workflows with Azure Durable Functions: orchestration patterns (fan-out/fan-in, human approval, monitor), activity functions, durable entities, and clinical workflow examples.
Azure Functions Monitoring — Application Insights and Alerting
Monitor Azure Functions in production: Application Insights integration, custom metrics, structured logging, performance monitoring, live metrics, and alerting for failed executions and cold starts.
Azure Functions Triggers — HTTP, Timer, Service Bus, and Blob
Use Azure Functions triggers in .NET: HTTP triggers for APIs, Timer triggers for scheduled jobs, Service Bus triggers for event processing, and Blob triggers for file processing workflows.
Azure Key Vault — Secrets, Keys, and Certificates in .NET
Use Azure Key Vault in .NET applications: storing secrets, injecting Key Vault into IConfiguration, Managed Identity authentication, key rotation, and certificate management for clinical APIs.
Azure Service Bus — Reliable Messaging Between Services
Use Azure Service Bus in .NET: publishing and consuming messages, queues vs topics, dead-letter queues, message sessions, Managed Identity authentication, and reliable delivery patterns for clinical events.
Azure SQL Database — Production Configuration for .NET Applications
Configure and use Azure SQL Database in .NET applications: connection resilience, Managed Identity authentication, geo-replication, elastic pools, and performance monitoring with Query Performance Insight.
AAA Pattern — Arrange, Act, Assert for Clean, Readable Tests
How to apply the Arrange-Act-Assert pattern consistently in Clean Architecture .NET tests: structure, naming conventions, FluentAssertions, parameterized tests with Theory, and the common mistakes that make tests hard to maintain.
API Layer — Controllers, Minimal APIs, and Request/Response Mapping
How the API layer works in Clean Architecture: controllers as thin orchestrators, request/response DTOs, mapping to commands and queries, Problem Details for errors, and the production mistakes that happen when business logic leaks into controllers.
Application Layer — Use Cases, Interfaces, and Orchestration
What the Application layer is responsible for in Clean Architecture: orchestrating use cases via command and query handlers, defining interfaces for external services, and keeping core business rules out of it (they belong in the Domain layer).
Architecture Tests — Enforcing Layer Boundaries With NetArchTest
How to write architecture tests with NetArchTest in .NET: testing layer dependencies, naming conventions, encapsulation rules, and why these tests prevent the codebase from silently drifting from Clean Architecture principles.
.NET Aspire — Orchestrating Services, Databases, and Observability Locally
How .NET Aspire simplifies local development in a Clean Architecture project: AppHost orchestration, automatic connection string injection, the Aspire Dashboard, ServiceDefaults, and what changes between local and production.
Cache Strategies — Cache-Aside, Stampede Protection, and Invalidation
Practical caching strategies for .NET APIs: Cache-Aside pattern, write-through vs write-behind, stampede protection with HybridCache, cache invalidation approaches, and the production bugs that come from getting them wrong.
Manual CQRS — Commands, Queries, and Handlers Without MediatR
How to implement CQRS manually in Clean Architecture without MediatR: command and query records, typed handlers, DI-based dispatch, and why skipping the mediator keeps the code simpler and more navigable.
The Dependency Rule — Enforcing It With Architecture Tests
What the Dependency Rule means in practice, how to verify it with NetArchTest, the nine tests included in the Clean Architecture template, and why CI enforcement is the only reliable form.
Domain Events — Raising, Dispatching, and Handling Side Effects
Domain events in Clean Architecture: how to raise them from entities, collect them after persistence, dispatch with a simple publisher, and handle side effects like emails and audit logs without coupling the domain.
Domain Layer — Entities, Value Objects, and Zero External Dependencies
What the Domain layer contains, why it has zero NuGet dependencies, how to design entities with private setters, and the patterns that keep domain logic pure and testable.
EF Core Setup — DbContext, Configurations, and Migrations in Clean Architecture
How to set up EF Core correctly in Clean Architecture: DbContext with IUnitOfWork, IEntityTypeConfiguration per entity, strongly-typed ID converters, owned entities for value objects, and migrations in the right project.
Entities and Aggregate Roots — Design, Identity, and Invariants
How to design entities in Clean Architecture: strongly-typed IDs, private setters, factory methods, aggregate roots, invariant enforcement, and the patterns that make domain models trustworthy.
Error Handling — Problem Details, Global Exception Middleware, and Result Mapping
How to handle errors consistently in Clean Architecture: Problem Details RFC 7807, global exception middleware for unexpected failures, Result pattern for expected failures, and the production issues that come from inconsistent error responses.
FluentValidation — Validating Commands and Queries in the Application Layer
How to use FluentValidation in Clean Architecture: validators for commands, async rules, integration with the handler pipeline, error mapping to Result, and the production pitfalls of validating too late.
Microsoft HybridCache — L1 In-Memory Plus L2 Redis in One API
How HybridCache works in .NET 9+: the two-layer architecture, stampede protection, tag-based invalidation, and the production problems it solves compared to IMemoryCache and IDistributedCache separately.
ASP.NET Identity — Users, Roles, and Refresh Token Storage
How to configure ASP.NET Identity in Clean Architecture: custom AppUser, refresh token entity, Identity DbContext integration, role-based authorization, and the production pitfalls of token storage.
Infrastructure Layer — Persistence, External Services, and Dependency Injection
What the Infrastructure layer contains in Clean Architecture, how to implement repository interfaces, wire up DI, and the production mistakes that happen when infrastructure concerns leak into other layers.
Clean Architecture — Layers, the Dependency Rule, and Why It Matters
Clean Architecture fundamentals: the four layers, the Dependency Rule, what belongs where, and why the architecture makes large .NET codebases maintainable over years.
JWT Authentication — Access Tokens, Refresh Tokens, and Endpoint Security
How to implement JWT authentication in Clean Architecture: login/register endpoints, short-lived access tokens, refresh token rotation, revocation, role-based authorization, and the production security mistakes to avoid.
No Repository Pattern — Using EF Core as Your Abstraction
Why the Clean Architecture template skips the repository pattern, how EF Core's DbSet and IQueryable already provide a sufficient abstraction, and the production problems the extra layer introduces.
OpenTelemetry — Traces, Metrics, and Logs for Distributed Systems
How to configure OpenTelemetry in a Clean Architecture .NET project: distributed traces with Activity, metrics with Meter, auto-instrumentation for EF Core and HTTP clients, exporting to Jaeger or OTLP, and the observability gaps it fills.
Six Opinionated Choices — Why This Template Deviates From Defaults
The six deliberate architectural decisions in the Clean Architecture template: no MediatR, Scalar over Swagger, HybridCache over IDistributedCache, Result pattern over exceptions, no repository pattern, and .slnx format — with the reasoning behind each.
Project Structure — 8 Projects, the .slnx Format, and What Goes Where
The exact project layout of a Clean Architecture .NET solution: what each of the 8 projects contains, why the .slnx format replaces .sln, and the conventions that keep the solution navigable.
Redis Setup With .NET Aspire — Connection Strings, Health Checks, and Configuration
How to set up Redis as the L2 cache backend in a Clean Architecture .NET project: StackExchange.Redis configuration, .NET Aspire integration, health checks, connection resilience, and production configuration patterns.
Result Pattern — Returning Errors Without Exceptions
The Result pattern in Clean Architecture: why exceptions are wrong for business rule failures, how to implement Result and Result<T>, how to use Match for mapping, and the production bugs this pattern prevents.
Scalar API Docs — Replacing Swagger With a Modern Developer Experience
How to configure Scalar as the API documentation UI in a .NET Clean Architecture project, why it replaces Swagger/Swashbuckle, and how to annotate endpoints for meaningful API documentation.
Serilog — Structured Logging, Enrichers, and Sinks in Clean Architecture
How to configure Serilog in a Clean Architecture .NET project: structured logging, request logging middleware, enrichers for correlation and user context, multiple sinks, and the production mistakes that make logs unsearchable.
Strict Analyzers and Code Style — Enforcing Consistency Across the Solution
How to configure Roslyn analyzers, .editorconfig, TreatWarningsAsErrors, and nullable reference types in a Clean Architecture .NET project to enforce code quality and consistency automatically.
Testing Strategy — What to Unit Test, What to Integration Test, What to Skip
A practical testing strategy for Clean Architecture .NET projects: the test pyramid, what belongs in each level, Testcontainers for integration tests, and the production quality signals that come from the right test mix.
Unit Testing Application Handlers — xUnit v3 and FluentAssertions
How to write unit tests for Clean Architecture command and query handlers: xUnit v3, FluentAssertions, fake implementations vs mocking, in-memory EF Core, and the tests that actually catch production bugs.
Value Objects — Immutability, Equality, and When to Use Them
Value objects in Clean Architecture: definition, implementation with C# records, equality semantics, validation inside value objects, strongly-typed IDs, and when an entity is the better choice.
When NOT to Use Clean Architecture — Trade-offs, Complexity, and Alternatives
Honest assessment of when Clean Architecture adds overhead without value: small projects, tight deadlines, CRUD-heavy APIs, and the alternative patterns (Vertical Slice, Minimal API, Modular Monolith) that fit those contexts better.
Async Task Execution in CrewAI
Run independent CrewAI tasks concurrently with async_execution=True. Understand when tasks can be parallelized and how to synchronize results.
Hierarchical Process: Manager Agent
Use Process.hierarchical to give CrewAI a manager agent that dynamically delegates tasks to specialist agents. Learn when hierarchical beats sequential and how to configure manager_llm.
Running and Monitoring a Crew
Call kickoff(), kickoff_async(), and kickoff_for_each() to run CrewAI crews. Use callbacks to monitor task and step progress in real time.
Structured Output and Pydantic Models
Force CrewAI task output into Pydantic models with output_pydantic, or get raw JSON with output_json. Access typed results for downstream processing.
Interview: CrewAI in Production
10 Q&A pairs on running CrewAI in production: sequential vs hierarchical, async execution, memory, error handling, cost control, and system design questions.
Sequential Process: How Tasks Flow Between Agents
Understand how Process.sequential chains task outputs through a crew. Learn task ordering, output inheritance, and when sequential is the right process choice.
Task Dependencies and Context Passing
Pass output from one CrewAI task to another using the context parameter. Learn when to use context vs sequential dependency and how to avoid information loss between tasks.
Dapper Multi-Mapping — Joining Related Data into Object Graphs
Map JOIN query results to related objects in Dapper: splitOn for one-to-one, collecting one-to-many relationships manually, nested multi-mapping, and when to use multi-mapping versus separate queries.
Dapper Multi-Result Sets — QueryMultiple for Batch Queries
Use Dapper's QueryMultiple to execute multiple SELECT statements in a single database round trip: reading multiple result sets, correlating parent-child data, and patterns for dashboard queries.
Dapper Parameters — Safe Parameterization and Dynamic SQL
Pass parameters safely in Dapper: anonymous objects, DynamicParameters, IN clause with lists, dynamic WHERE building, preventing SQL injection, and handling nullable parameters.
Dapper Queries — QueryAsync, QueryFirstOrDefaultAsync, and QuerySingleAsync
Execute Dapper queries in ASP.NET Core: QueryAsync for lists, QueryFirstOrDefaultAsync for single rows, QuerySingleAsync for exactly-one results, async patterns, and connection management.
Dapper Stored Procedures — Calling and Mapping Stored Procedure Results
Call SQL Server stored procedures with Dapper: CommandType.StoredProcedure, input/output parameters, return values, multi-result-set stored procedures, and when stored procedures make sense versus inline SQL.
Dapper Transactions — Coordinating Multiple Operations
Use transactions in Dapper: BeginTransaction, passing transactions to queries, nested operations, savepoints, transaction scope with ambient transactions, and integrating with the Unit of Work pattern.
Aggregates and Aggregate Roots — DDD Design Rules
Design EF Core aggregates correctly: what qualifies as an aggregate root, the rules for aggregate boundaries, transactional consistency, invariant enforcement, and clinical examples with Prescription and Patient aggregates.
Bounded Contexts and Context Mapping in DDD
Design bounded contexts in DDD: identifying context boundaries, the ubiquitous language, context mapping patterns (shared kernel, anti-corruption layer, customer-supplier), and implementing context boundaries in .NET.
Domain Events — Raising, Dispatching, and Handling
Implement domain events in DDD: raising events from aggregate roots, collecting and dispatching them after SaveChanges, MediatR notification handlers, and domain events vs integration events for cross-service communication.
Repositories in DDD — Contracts and Implementations
Implement the Repository pattern in DDD: interface contracts in the domain layer, EF Core implementations in infrastructure, generic vs specific repositories, and when to skip the repository pattern.
Tactical DDD Patterns — Specifications, Policies, and Domain Services
Implement tactical DDD patterns in C#: the Specification pattern for query rules, Policy objects for business rules, Domain Services for cross-aggregate logic, and Factory methods for complex construction.
Value Objects in C# — Immutability and Structural Equality
Implement DDD value objects in C#: records vs classes, structural equality, factory methods with validation, common value objects (Money, Address, PatientMrn), and persisting value objects with EF Core.
Docker Compose Health Checks — Startup Order and Readiness
Configure Docker Compose health checks for .NET services: defining health checks, controlling startup order with depends_on conditions, liveness vs readiness, and debugging unhealthy containers.
Docker Compose Networking — Service Communication in Containers
Configure Docker Compose networking for .NET applications: default bridge networks, custom networks, service discovery by name, DNS resolution, and multi-network setups for security isolation.
Docker Compose in Production — Patterns and Limitations
Use Docker Compose in production for small deployments: resource limits, restart policies, environment variable injection, secrets management, and when to move beyond Docker Compose to Kubernetes.
Docker Compose Volumes — Persisting Data and Sharing Files
Configure Docker Compose volumes for .NET applications: named volumes for database persistence, bind mounts for development, read-only configuration mounts, and volume management strategies.
External Authentication Providers — Google, Microsoft, Azure AD
Integrate external OAuth providers (Google, Microsoft, Azure AD) with ASP.NET Core Identity: provider setup, claim mapping, linking external logins to local accounts, and multi-tenant Azure AD.
Roles and Claims Management with ASP.NET Core Identity
Manage roles and claims with ASP.NET Core Identity: RoleManager, assigning roles to users, role-based vs claim-based authorization, seeding roles on startup, and hierarchical role patterns.
ASP.NET Core Identity Setup — Users, Passwords, and Stores
Configure ASP.NET Core Identity correctly: custom user entity, password hashing, EF Core store, token providers, and the identity pipeline in a Clean Architecture project.
JWT Claims — Designing the Payload for Your Application
How to design JWT claims correctly: standard vs custom claims, claim-based authorization, reading claims in handlers, and avoiding the over-stuffed token anti-pattern.
Authentication in Minimal APIs — Endpoints, Filters, and Patterns
Apply JWT authentication and authorization to ASP.NET Core Minimal APIs: endpoint-level requirements, route groups with shared auth, endpoint filters for pre-authorization logic, and production patterns.
OAuth 2.0 and OpenID Connect — The Concepts Every .NET Developer Needs
OAuth 2.0 and OIDC demystified: authorization code flow, tokens, scopes, the difference between authentication and authorization, and how ASP.NET Core integrates with external identity providers.
Policy-Based Authorization in ASP.NET Core
Build flexible, testable authorization policies in ASP.NET Core: requirements, handlers, resource-based authorization, and the patterns that replace role-check spaghetti in production systems.
Refresh Tokens — Keeping Users Logged In Safely
Implement refresh token rotation in ASP.NET Core: storing refresh tokens securely, the rotation pattern, detecting token reuse attacks, and why refresh tokens must be treated like passwords.
Security Headers and API Hardening in ASP.NET Core
Secure your ASP.NET Core API with security headers, CORS policy, HTTPS enforcement, rate limiting, and the defense-in-depth patterns that harden APIs against common web attacks.
HybridCache — The Best of Both Caches in .NET 9
Microsoft.Extensions.Caching.Hybrid combines L1 in-process and L2 distributed caching with built-in stampede protection, tag-based invalidation, and a simpler API than managing IMemoryCache and IDistributedCache separately.
IDistributedCache — Shared Caching with Redis in ASP.NET Core
IDistributedCache with Redis: setup, serialization, expiration, cache-aside pattern, and the distributed caching patterns that ensure consistency across multiple API instances.
IMemoryCache — In-Process Caching in ASP.NET Core
IMemoryCache in depth: registration, absolute and sliding expiration, cache entry options, size limits, eviction callbacks, and the production patterns for safe in-process caching.
Cache Invalidation — The Hard Part of Caching
Cache invalidation strategies: event-driven invalidation, TTL-based expiry, tag-based bulk invalidation, write-through caching, and the patterns that prevent stale data in clinical systems.
Output Caching in ASP.NET Core — Caching HTTP Responses
ASP.NET Core output caching: caching full HTTP responses, vary-by rules, cache policies, tag-based invalidation from handlers, and when to use output cache vs data cache.
Caching Patterns — Cache-Aside, Read-Through, and Production Design
Common caching design patterns: cache-aside, read-through, write-through, write-behind, and how to choose the right pattern for different data types in a production ASP.NET Core system.
Distributed Tracing and Correlation IDs in .NET
Implement correlation IDs for distributed tracing in ASP.NET Core: propagating trace IDs across services, W3C Trace Context, Activity API, correlation middleware, and connecting logs to traces in Application Insights.
Serilog Enrichers — Adding Context to Every Log Entry
Enrich Serilog log entries with contextual properties: machine name, environment, request ID, user ID, tenant ID, custom enrichers, LogContext.PushProperty, and Destructurama for complex objects.
ILogger in ASP.NET Core — Structured Logging Patterns
Use ILogger effectively in ASP.NET Core: log levels, message templates, structured properties, LoggerMessage source generators, high-performance logging, and avoiding common logging anti-patterns.
Request Logging — HTTP Traffic Observability in ASP.NET Core
Log HTTP requests and responses in ASP.NET Core: Serilog's UseSerilogRequestLogging, HttpLogging middleware, custom request logging middleware, performance logging, and what to include versus exclude.
Serilog Sinks — Routing Logs to the Right Destinations
Configure Serilog sinks: Console for development, Seq for structured querying, Application Insights for Azure, file rolling sinks, sub-loggers for routing by level, and async sink wrapper for performance.
Testing With EF Core — In-Memory vs Real Database
Test EF Core queries, configurations, and migrations: when to use in-memory vs Testcontainers, testing global query filters, migration testing, and the approach that finds real bugs.
FluentAssertions — Readable Assertions and Error Messages
FluentAssertions v7 in .NET: collection assertions, object graph comparison, exception assertions, custom assertion messages, and the patterns that make test failures self-explanatory.
Mocking with NSubstitute — Fakes, Stubs, and Spies
Use NSubstitute to isolate units under test: creating substitutes, configuring return values, verifying calls, argument matchers, and the mocking anti-patterns that make tests brittle.
Test Strategy — Pyramid, Coverage, and What to Skip
Build an effective test strategy: the test pyramid for Clean Architecture, what coverage metrics actually tell you, which tests to write first, and the signals that tell you when tests are wrong.
Test Doubles — Mocks, Fakes, Stubs, and When to Use Each
The five types of test doubles: dummies, stubs, spies, mocks, and fakes — what each is for, how to implement them in .NET, and the rules for choosing the right one.
Testcontainers — Real Databases in Docker for Tests
Use Testcontainers to run real SQL Server and Redis containers in your .NET integration tests: setup, lifetime management, migration, connection string wiring, and why in-memory databases lie to you.
Theory and InlineData — Parameterized Tests in xUnit
Write parameterized tests with xUnit Theory: InlineData, MemberData, ClassData, TheoryData, and the data-driven testing patterns that eliminate repetitive test boilerplate.
WebApplicationFactory — Real HTTP Tests Without a Server
Test ASP.NET Core APIs end-to-end with WebApplicationFactory: in-memory HTTP client, service replacement, authentication setup, and the patterns for fast integration tests that catch real bugs.
EF Core Concurrency — Optimistic Locking and Conflict Handling
Handle concurrent updates in EF Core: optimistic concurrency with row version and concurrency tokens, handling DbUpdateConcurrencyException, pessimistic locking with UPDLOCK, and clinical workflow patterns.
EF Core Entity Configurations — Fluent API and IEntityTypeConfiguration
Configure EF Core entities using IEntityTypeConfiguration, fluent API, value converters, owned entity configuration, table naming conventions, and applying configurations automatically.
EF Core Interceptors — Hooking into Database Operations
Use EF Core interceptors to add cross-cutting concerns: audit logging on SaveChanges, soft delete automation, query tagging, command interception for performance monitoring, and transaction interceptors.
EF Core Migrations — Managing Schema Changes
Manage EF Core migrations in production: creating and applying migrations, migration bundles, idempotent scripts, rollback strategies, data seeding, and multi-environment migration patterns.
EF Core N+1 Problem — Detection and Resolution
Identify and fix the N+1 query problem in EF Core: how it manifests with navigation properties, detection with logging and profiling tools, and the patterns to prevent it using Include, projection, and batch loading.
EF Core Owned Entities — Mapping Value Objects
Map domain value objects as EF Core owned entities: OwnsOne, OwnsMany, table splitting, JSON column storage, and the patterns for keeping value objects in the domain while persisting them correctly.
EF Core Performance — Query Optimization and Benchmarking
Optimize EF Core performance: compiled queries, AsNoTracking, connection pooling, bulk operations with ExecuteUpdate/ExecuteDelete, change tracker overhead, and profiling with MiniProfiler and query logs.
EF Core Querying — LINQ to SQL, Projections, and Filtering
Write efficient EF Core queries: LINQ operators that translate to SQL, projection with Select, global query filters, split queries for large includes, query tags, and avoiding common N+1 patterns.
EF Core Raw SQL — FromSqlRaw, ExecuteSqlRaw, and Dapper Integration
Execute raw SQL in EF Core: FromSqlRaw for entity queries, ExecuteSqlRaw for commands, SqlQuery for arbitrary projections, safe parameterization to prevent SQL injection, and when to drop to Dapper.
EF Core Relationships — One-to-Many, Many-to-Many, and Navigation Properties
Configure EF Core relationships with fluent API: one-to-many, one-to-one, many-to-many with join entities, cascade delete, shadow properties, and loading strategies for navigation properties.
A/B Testing LLM Applications
Design and analyze A/B tests for LLM changes: prompt updates, model versions, and retrieval improvements. Use traffic splitting, statistical significance, and guardrail metrics.
BERTScore: Semantic Similarity for Text Evaluation
Use BERTScore to measure semantic similarity between generated and reference text. Understand how contextual embeddings improve on surface-level metrics like BLEU.
CI/CD Evaluation: Automated Evals in Your Pipeline
Run LLM evaluations automatically on every code change. Catch regressions before they reach production with eval suites, thresholds, and GitHub Actions integration.
Interview: LLM Evaluation
12 Q&A pairs on LLM evaluation: choosing metrics, RAGAS, LLM-as-judge, CI evals, A/B testing, benchmark interpretation, and system design questions.
LLM Judge Bias and Reliability
Identify and mitigate systematic biases in LLM-as-judge evaluation: position bias, verbosity bias, self-enhancement bias, and calibration problems.
LLM-as-Judge: Using AI to Evaluate AI
Use a stronger LLM to evaluate the quality of another model's outputs. Design effective judge prompts, score on multiple dimensions, and understand the limitations.
Pointwise vs Pairwise Evaluation
Understand the difference between scoring individual responses (pointwise) and comparing two responses directly (pairwise). Learn when each approach is more reliable.
Popular LLM Benchmarks Explained
Understand MMLU, HellaSwag, HumanEval, MT-Bench, Chatbot Arena, and other standard benchmarks. Learn what each measures and how to use them for model selection.
RAGAS: Evaluating RAG Pipelines
Use the RAGAS framework to measure RAG pipeline quality across four dimensions: faithfulness, answer relevancy, context precision, and context recall.
Event Sourcing with CQRS — Commands Write Events, Queries Read Projections
Combine event sourcing with CQRS in .NET: commands append events to the event store, queries read from denormalised projections, and the two models evolve independently.
The Event Store — Persisting Events as the Source of Truth
Build and use an event store in .NET: appending events, reading streams, optimistic concurrency with expected version, and why events are the source of truth instead of current state.
Marten — Event Sourcing and Document Storage on PostgreSQL
Use Marten for event sourcing in .NET: configuring the document store, appending events, loading aggregates with live aggregation and snapshots, building projections, and integrating with ASP.NET Core.
Projections — Building Read Models from Events
Build and maintain event sourcing projections in .NET: synchronous inline projections, asynchronous background projections, projection rebuilding, and handling projection failures in a clinical system.
Snapshots — Avoiding Long Event Stream Replay
Use snapshots in event sourcing to avoid replaying thousands of events: snapshot storage, rehydration with a snapshot baseline, snapshot frequency strategies, and when snapshots are worth the complexity.
Adapter Layers: How PEFT Works
Understand how adapter layers insert small trainable modules into a frozen LLM. Learn the architecture of adapters, how they differ from LoRA, and when to use each.
Benchmarking Fine-Tuned Models
Use standard benchmarks and domain-specific evals to measure fine-tuned model quality. Understand MMLU, HellaSwag, TruthfulQA, and how to build custom benchmark suites.
Training Data Formats for Fine-Tuning
Format training data correctly for instruction fine-tuning and chat fine-tuning. Understand prompt templates, chat templates, and how to structure JSONL datasets.
Data Quality for Fine-Tuning
What makes fine-tuning data high quality. Learn how to audit, clean, and score training examples to maximize model improvement per training example.
How Much Data Do You Need to Fine-Tune?
Understand the relationship between dataset size and fine-tuning effectiveness. Learn minimum data requirements for different fine-tuning goals and how to estimate what you need.
Evaluating Fine-Tuned Models
Measure whether fine-tuning actually improved your model. Use task-specific metrics, LLM-as-judge evaluation, and A/B comparison against the base model.
Interview: Fine-Tuning LLMs
12 Q&A pairs on fine-tuning: LoRA vs full fine-tuning, rank selection, data requirements, DPO, catastrophic forgetting, evaluation, and production deployment.
LoRA Rank Selection: How to Choose r
Understand how LoRA rank r controls the parameter count and expressiveness of fine-tuning. Learn heuristics for choosing r, alpha, and target modules for different tasks.
RLHF and DPO for Alignment Fine-Tuning
Align a fine-tuned LLM with human preferences using RLHF or DPO. Understand the preference dataset format, the DPO loss function, and when each method applies.
Synthetic Data Generation for Fine-Tuning
Use a stronger LLM to generate training data for fine-tuning a smaller model. Learn seed-based generation, quality filtering, and the self-instruct approach.
GitHub Actions Deployment — CD Pipelines to Azure
Deploy .NET applications to Azure with GitHub Actions: Blue/Green deployment to App Service, deployment slots, environment approval gates, rollback strategies, and production deployment workflows.
GitHub Actions for .NET — CI Pipeline for ASP.NET Core
Build a production-grade CI pipeline for .NET with GitHub Actions: build, test, code coverage, linting, Dockerfile builds, and caching for fast feedback loops.
GitHub Actions Matrix — Parallel Builds and Multi-Environment Testing
Use GitHub Actions matrix strategy to run jobs across multiple .NET versions, operating systems, and test configurations in parallel — reducing total CI time for multi-target libraries and clinical platform modules.
GitHub Actions Secrets — Managing Credentials Securely in CI/CD
Manage secrets in GitHub Actions: repository secrets, environment secrets, OIDC-based keyless authentication to Azure, secret scanning, and preventing accidental secret exposure in logs.
Bidirectional Streaming RPC — Full-Duplex gRPC
Implement bidirectional streaming in gRPC ASP.NET Core: reading and writing concurrently, chat-style protocols, real-time collaborative workflows, and the production patterns for managing concurrent streams.
Client Streaming RPC — Uploading Data Flows with gRPC
Implement client-streaming gRPC in ASP.NET Core: receiving streams from clients, reading observations in order, processing with back-pressure, and when client streaming fits your data ingestion patterns.
gRPC Interceptors — Cross-Cutting Concerns in gRPC
Build gRPC interceptors in ASP.NET Core: logging interceptors, authentication validation, error handling, retry policies on the client, and applying interceptors globally vs per-service.
Protocol Buffers — Defining gRPC Contracts
Write .proto files for gRPC services: message types, field types and numbers, repeated fields, oneofs, enums, nested messages, and the proto3 conventions used in .NET gRPC projects.
Server Streaming RPC — Pushing Data Flows with gRPC
Implement server-streaming gRPC in ASP.NET Core: streaming multiple responses for one request, handling cancellation, real-time data feeds, and when server streaming beats repeated polling.
Unary RPC — Request-Response gRPC in ASP.NET Core
Implement unary gRPC endpoints in ASP.NET Core: service implementation, error handling with StatusCode, authentication, dependency injection, and calling gRPC services from .NET clients.
gRPC vs REST — Choosing the Right Protocol
Decide between gRPC and REST for your .NET services: performance comparison, browser compatibility, tooling, streaming support, and the decision framework for internal vs external APIs.
Testing Authentication and Authorisation in ASP.NET Core
Write integration tests for secured ASP.NET Core APIs: fake JWT authentication, custom test auth handlers, testing role-based and policy-based authorisation, and WebApplicationFactory patterns for clinical APIs.
Testing External Dependencies — Mocking HTTP and Third-Party Services
Test .NET services that depend on external HTTP APIs, FHIR servers, and third-party integrations: WireMock.NET for HTTP stubbing, Polly resilience testing, and contract testing patterns.
Test Isolation — Preventing Test Interference
Ensure integration tests don't interfere with each other: database cleanup strategies, transaction rollback, test data builders, unique identifiers per test, and avoiding shared mutable state.
Testcontainers — Real Databases in Integration Tests
Use Testcontainers in .NET to run real SQL Server, PostgreSQL, and Redis instances in integration tests: setup, shared containers, lifecycle management, and testing EF Core against a real database.
Agents: How LangChain Agents Work
Understand LangChain agent internals: the reasoning loop, thought-action-observation cycle, how tool calls work, and the difference between ReAct and tool-calling agents.
AgentExecutor: Running Agents Safely
Configure AgentExecutor for production: iteration limits, error handling, streaming agent output, early stopping, verbose logging, and async execution.
Implement LLM Response Caching
Build a semantic cache for LLM responses. Cache exact matches by hash and semantically similar queries by embedding similarity, reducing API costs and latency.
Implement a Text Chunker
Build a recursive text chunker for RAG pipelines. Implement fixed-size, sentence-aware, and recursive chunking with overlap to preserve context at chunk boundaries.
Implement Cosine Similarity
Implement cosine similarity from scratch. Understand why it measures semantic closeness, how it relates to vector search, and how to use it efficiently with NumPy.
Dot Product and Attention Scores
Implement dot product attention from scratch. Understand why transformers use scaled dot product attention and how query-key-value attention works step by step.
Implement k-Nearest Neighbors Search
Implement k-NN search from scratch for vector retrieval. Understand brute-force vs approximate methods, and how k-NN underlies semantic search in RAG systems.
Mock Live Coding Interview
Full mock live coding interview for AI engineers: 4 problems with interviewer notes, expected approach, common mistakes, and follow-up questions.
Implement a Rate Limiter
Implement a token bucket rate limiter from scratch. Handle the core algorithm, then extend to async and Redis-backed implementations for production use.
Implement Softmax and Temperature Scaling
Implement softmax from scratch, handle numerical stability, and understand temperature scaling. See how softmax converts logits to probabilities in LLM token sampling.
Parse Streaming LLM Output
Implement a streaming parser for Server-Sent Events from OpenAI and Anthropic APIs. Handle partial JSON, tool call streaming, and real-time display.
Implement TF-IDF from Scratch
Implement TF-IDF (Term Frequency-Inverse Document Frequency) in Python from scratch. Understand the math, code it step by step, and see how it powers keyword search.
ConversationBufferMemory: Simple History
Implement ConversationBufferMemory in LangChain for multi-turn conversations. Manage history, integrate with LCEL chains, persist across sessions, and handle context limits.
Callbacks: Hooking into LangChain Events
Use LangChain callbacks for logging, cost tracking, streaming progress, and custom observability. Implement BaseCallbackHandler for chain, LLM, and tool events.
Conditional Routing with RunnableBranch
Route queries to different chains based on content with RunnableBranch. Build classifier-router patterns, query complexity routing, and if-else logic in LCEL.
Models, Prompts, Chains, Tools: The Four Primitives
Understand the four core LangChain abstractions: language models, prompt templates, chains, and tools. How they compose to build AI applications.
Document Loaders: Ingesting Data into LangChain
Load PDFs, web pages, CSVs, databases, and custom sources into LangChain Document objects. Learn batch loading, metadata enrichment, and error-resilient ingestion pipelines.
FewShotPromptTemplate in LangChain
Implement few-shot prompting in LangChain with FewShotPromptTemplate, dynamic example selection, SemanticSimilarityExampleSelector, and LengthBasedExampleSelector.
When to Use LangChain vs Raw OpenAI SDK
Make the right choice: LangChain vs raw OpenAI/Anthropic SDK. Understand the tradeoffs, when abstraction helps, when it hinders, and how to decide for your use case.
LangSmith: Tracing and Debugging LangChain Apps
Set up LangSmith tracing, inspect chain runs, add custom metadata, compare prompt versions in the playground, and run automated evaluations against test datasets.
LCEL: LangChain Expression Language Overview
Master LCEL — LangChain's pipe-based composition syntax. Build chains with |, understand Runnable interface, and use invoke, stream, batch, and async methods.
LLMChain: The Building Block
Understand LLMChain, LangChain's foundational chain. Learn prompt formatting, output parsing, variable injection, and how LCEL superseded it.
Types of Memory in LangChain
Survey LangChain's memory types: buffer, window, summary, entity, and vector-store memory. When to use each and how memory integrates with conversational chains.
HumanMessage, AIMessage, SystemMessage
Understand LangChain's message types: HumanMessage, AIMessage, SystemMessage, ToolMessage, and FunctionMessage. How they map to provider APIs and flow through chains.
Parallel Chains with RunnableParallel
Run multiple LangChain chains simultaneously with RunnableParallel. Reduce latency by parallelizing independent steps, merge outputs, and handle fan-out patterns.
Composing Complex Prompts from Parts
Build modular, reusable prompts in LangChain. Combine system instructions, few-shot examples, context blocks, and format requirements into composable prompt components.
PromptTemplate and ChatPromptTemplate
Master LangChain prompt templates: PromptTemplate vs ChatPromptTemplate, partial variables, template composition, format instructions, and prompt version control.
Building a RAG Chain with LangChain
Build retrieval-augmented generation chains with LCEL. Covers basic RAG, conversational RAG with history, source citation, streaming, and production patterns.
Retrievers: Advanced Retrieval Strategies
Go beyond basic vector search. Build multi-query, contextual compression, BM25 hybrid, parent-document, and self-querying retrievers for production RAG pipelines.
The Runnable Interface: pipe(), invoke(), stream()
Deep dive into LangChain's Runnable protocol. Understand invoke, stream, batch, async methods, config injection, and how to build custom Runnables.
Sequential Chains: Chaining Multiple Steps
Build multi-step LangChain pipelines where each step's output feeds into the next. Covers RunnableSequence, RunnablePassthrough.assign, and patterns for complex sequential workflows.
Streaming: Real-Time Output in LangChain
Stream LLM tokens in real-time with LCEL stream(), astream(), and astream_events(). Build streaming RAG, streaming agents, and Server-Sent Events for web UIs.
ConversationSummaryMemory: Compressed History
Use ConversationSummaryMemory and ConversationSummaryBufferMemory to handle long conversations by compressing older turns into LLM-generated summaries.
Text Splitters: Chunking Documents for RAG
Chunk documents effectively for retrieval. Compare recursive, semantic, token-based, and code splitters. Tune chunk size and overlap for your use case.
Tool Calling Agent vs ReAct Agent
Compare LangChain's tool calling agent and ReAct agent. Understand the underlying mechanics, when to use each, and how to configure parallel tool calls.
Defining Custom Tools with @tool
Create LangChain tools with @tool decorator, StructuredTool, BaseTool class, and Pydantic input schemas. Build validated, type-safe tools for clinical AI agents.
VectorStoreRetrieverMemory: Semantic History
Build long-term semantic memory with VectorStoreRetrieverMemory in LangChain. Store conversation history as embeddings and retrieve relevant past exchanges by similarity.
Vector Stores: Storing and Searching Embeddings
Store and retrieve document embeddings with Chroma, FAISS, and Pinecone. Learn similarity search, metadata filtering, MMR retrieval, and vector store management.
Annotated State and Type Safety
Use Python's Annotated type to add metadata and reducers to LangGraph state fields. Design clear, type-safe state schemas for complex agent workflows.
Checkpointing: Persistent State in LangGraph
Use LangGraph checkpointers to persist agent state across runs. Compare MemorySaver, SqliteSaver, and PostgresSaver. Enable multi-session conversations and crash recovery.
Cycles and Loops in LangGraph
Build graphs with cycles for iterative agent behavior. Use conditional edges to loop until a condition is met, and understand how LangGraph prevents infinite loops.
Entry Points, Finish Points, and Graph Compilation
Configure entry and finish points in LangGraph. Understand set_entry_point, set_finish_point, multiple entry points, and what graph compilation does.
Interview: LangGraph Fundamentals
8 Q&A pairs on LangGraph core concepts: StateGraph vs DAG, conditional edges, cycles, checkpointing, human-in-the-loop, and when to use LangGraph.
Human-in-the-Loop Workflows
Pause LangGraph execution for human review, approval, or correction. Use interrupt_before and interrupt_after to build workflows where humans and agents collaborate.
Interview: LangGraph in Production
10 senior-level questions and answers on deploying and operating LangGraph agents in production: checkpointing, error handling, scaling, cost, and system design.
State Updates and Reducers
Control how LangGraph state is updated. Use default replacement semantics, operator.add for accumulation, and custom reducer functions for complex merge logic.
Subgraphs: Composing Complex Agent Systems
Build modular LangGraph agents by composing subgraphs. Use compiled subgraphs as nodes in a parent graph to create hierarchical, reusable agent components.
Supervisor Pattern: Multi-Agent Coordination
Build a supervisor agent that routes work to specialized subagents. Implement the supervisor pattern in LangGraph for dynamic multi-agent orchestration.
Time Travel: Replaying and Branching Graph Execution
Use LangGraph's time travel feature to replay execution from any past checkpoint, branch into alternative continuations, and debug complex agent behavior.
Custom LINQ Operators — Extending the Query Pipeline
Build reusable custom LINQ extension methods: pagination, soft-delete filtering, ordering helpers, and the patterns that remove query boilerplate from handlers without leaking EF Core concerns.
Deferred Execution in LINQ — How Queries Actually Run
Understand LINQ's deferred execution model: when queries evaluate, how to force immediate execution, the N+1 problem it can cause in EF Core, and the production bugs that result from misunderstanding it.
Expression Trees — How LINQ Queries Become SQL
Understand how LINQ expression trees work: the difference between Func and Expression, how EF Core translates expressions to SQL, building dynamic queries with PredicateBuilder, and common translation failures.
LINQ Filtering and Projection — Where, Select, and Efficient Queries
Master LINQ's Where and Select operators: compound predicates, null-safe filtering, projection to DTOs, SelectMany for nested collections, and avoiding the projection pitfalls that cause N+1 queries.
LINQ GroupBy and Aggregates — Summarizing Data Efficiently
GroupBy, Count, Sum, Average, Min, Max in LINQ and EF Core: how they translate to SQL GROUP BY, when to group in memory vs SQL, and the aggregation patterns used in clinical reporting.
LINQ Join Types — Inner, Left, Cross, and GroupJoin
LINQ join operations: inner join, left outer join with DefaultIfEmpty, cross join, GroupJoin for hierarchical results, and how these map to SQL in EF Core.
LINQ Performance — Writing Queries That Scale
LINQ performance patterns: avoiding N+1, efficient pagination, AsNoTracking, compiled queries, chunking, parallel LINQ, and the profiling approach that finds query bottlenecks before production.
LLM-as-Judge
Using a capable LLM to evaluate other LLM outputs — single-answer grading, pairwise comparison, the MT-Bench framework, and reliability considerations.
Attention in LLMs: Deep Dive
Multi-query, grouped-query, and multi-head attention variants in modern LLMs — how they differ, their KV cache implications, and the FlashAttention implementation.
LLM Batching Strategies
How static, dynamic, and continuous batching work for LLM serving — why batching matters for throughput, and the implementation behind vLLM's continuous batching.
LLM Benchmarks
The key benchmarks used to evaluate LLMs — MMLU, HumanEval, GSM8K, HellaSwag, TruthfulQA — what they test, their limitations, and how to interpret leaderboard claims.
BLEU and ROUGE
How BLEU and ROUGE scores work, what they measure, their formulas, implementation, and why they fall short for evaluating modern LLM outputs.
Constitutional AI and RLHF
How Constitutional AI uses a set of principles to self-critique and refine outputs, how it relates to RLHF, and how alignment pipelines are structured in practice.
Direct Preference Optimisation (DPO)
How DPO aligns LLMs from human preferences without reinforcement learning — the objective, how it compares to RLHF, and practical implementation details.
Interview Q&A: LLM Alignment
Common interview questions on RLHF, DPO, Constitutional AI, hallucination, and safety — framed for senior ML engineering and AI systems roles.
Interview Q&A: LLM Architecture
Common senior interview questions on LLM architecture — the decoder stack, attention variants, training stability, and how modern improvements built on the original Transformer.
Interview Q&A: LLM Inference Optimisation
Common interview questions on making LLM inference faster and cheaper — quantisation, KV cache, speculative decoding, batching, and production serving trade-offs.
Interview Q&A: LLM Training
Common interview questions on LLM pretraining, fine-tuning, instruction tuning, and LoRA — covering data, objectives, hardware, and practical optimisation choices.
KV Cache
How the KV cache works in autoregressive generation, its memory cost, and techniques to manage it — GQA, quantisation, paged attention, and streaming eviction.
Perplexity
What perplexity measures, how it's computed from a language model's log-likelihood, what its values indicate, and why it's both useful and limited as an evaluation metric.
Positional Encoding in LLMs
How positional encoding in production LLMs differs from the original Transformer — RoPE details, context length extension, and practical limits of each approach.
LLM Pretraining
How LLMs are pretrained — the data pipeline, next-token prediction objective, training infrastructure, and how pretraining shapes what the model knows.
LLM Quantisation
How quantisation reduces LLM memory and compute requirements — INT8, INT4, GPTQ, AWQ, and the quality/size trade-offs at each precision level.
Scaling Laws
How model performance scales with parameters, data, and compute — the Kaplan and Chinchilla laws, the compute-optimal frontier, and practical implications for model development.
Speculative Decoding
How speculative decoding uses a small draft model to speed up generation from a large model, the acceptance criterion, and the latency gains achievable in practice.
Tokenisation and Byte-Pair Encoding
How text is split into tokens, how BPE builds its vocabulary, why the choice of tokeniser matters, and how to inspect tokenisations in practice.
Transformer Architecture Overview for LLMs
How modern decoder-only LLMs extend the original Transformer — the architectural changes from GPT-1 to LLaMA, and the components of a production LLM block.
vLLM and TensorRT-LLM
How the two leading LLM serving frameworks work, their architectural choices, when to use each, and key configuration decisions for production deployment.
Build and Consume MCP Servers in .NET
Model Context Protocol in .NET — create Stdio and HTTP MCP servers with tools, resources, and prompts, then connect them from AI clients and your own chat app.
API Gateway Pattern — Routing, Auth, and Rate Limiting
Implement the API Gateway pattern for microservices: YARP as a .NET reverse proxy, request routing, centralized authentication, rate limiting, request aggregation, and when to use BFF vs API Gateway.
Distributed Data — Database per Service Pattern
Manage data in microservices: database-per-service ownership, eventual consistency, the Saga pattern for distributed transactions, data duplication strategies, and the CQRS read model pattern.
Distributed Observability — Tracing Across Microservices
Implement observability in .NET microservices: distributed tracing with OpenTelemetry, centralized structured logging with correlation IDs, health checks, metrics with Prometheus, and building a production monitoring stack.
Authentication and Authorization in Minimal APIs
Apply JWT authentication and policy-based authorization to Minimal API endpoints: RequireAuthorization, route groups with shared auth, resource-based authorization, and the patterns that secure clinical APIs.
Dependency Injection in Minimal APIs — Services, Scopes, and Lifetime
How DI works in Minimal API endpoints: service injection, parameter binding order, keyed services, service lifetimes, and patterns for organizing DI in large Minimal API projects.
Endpoint Filters — Cross-Cutting Concerns in Minimal APIs
Build reusable endpoint filters in ASP.NET Core Minimal APIs: validation filters, logging filters, rate limit filters, filter pipelines, and how they replace action filters from MVC.
OpenAPI and Scalar — API Documentation in Minimal APIs
Generate OpenAPI documentation for Minimal APIs: .NET 9 built-in OpenAPI, Scalar UI, describing endpoints with WithSummary and WithOpenApi, request/response schemas, and producing documentation CI can validate.
Route Groups — Organizing Minimal APIs at Scale
Use MapGroup to structure Minimal API endpoints: shared prefixes, shared middleware, auth policies per group, nested groups, and the endpoint organization patterns that replace controllers.
Routing in Minimal APIs — Patterns, Constraints, and Parameters
Master Minimal API routing: route parameters, query strings, route constraints, regex routes, catch-all segments, and the patterns that build a clean URL structure for REST APIs.
Minimal APIs vs Controllers — When to Choose Which
Honest comparison of Minimal APIs and MVC Controllers: performance, testability, organization at scale, team familiarity, and the framework signals for choosing one over the other in new and existing .NET projects.
Why Accuracy Alone Isn't Enough
Why accuracy is misleading for imbalanced datasets: the majority-class baseline trap, class imbalance examples, and which metrics to use instead — with clinical ML examples.
Which Algorithms Need Feature Scaling?
A definitive guide to which ML algorithms require feature scaling, which don't, and why — with code demonstrating the impact, scaling recommendations per algorithm, and a quick reference table.
What AUC Really Means
AUC demystified: the probabilistic interpretation, why it's threshold-independent, AUC-ROC vs AUC-PR, partial AUC, and how to communicate AUC to non-technical clinical stakeholders.
How to Balance Bias and Variance in Practice
Practical guide to balancing bias and variance: learning curves, validation curves, regularization tuning, ensemble methods, and a step-by-step decision framework for real ML projects.
The Bias-Variance Tradeoff Explained
The bias-variance tradeoff: why reducing one typically increases the other, the total error decomposition, intuition with the bullseye analogy, and practical strategies for finding the sweet spot.
What is Bias in Machine Learning?
Understand bias in ML: systematic error from wrong assumptions, underfitting, high-bias models, sources of algorithmic bias, and the difference between statistical bias and societal bias.
Categorical Encoding
Convert categorical variables to numeric form: one-hot encoding, ordinal encoding, target encoding, binary encoding, and when to use each with clinical ML examples.
Classification Threshold Tuning
Classification threshold explained: why 0.5 is rarely optimal, how to move the threshold to trade off precision and recall, and how to pick the right threshold for clinical and safety-critical ML.
What is Classification?
Understand classification in machine learning: binary vs multi-class vs multi-label tasks, common algorithms, probability outputs, decision thresholds, and real applications in clinical AI and LLM evaluation.
Reading a Confusion Matrix
Step-by-step guide to reading confusion matrices: binary and multi-class, row vs column orientation, normalization, identifying systematic errors, and what each quadrant reveals about model behavior.
The Confusion Matrix
Confusion matrix explained: reading TP/TN/FP/FN, computing all derived metrics, multi-class confusion matrices, interpreting class-level errors, and common visualization patterns.
Cross-Validation: When to Use It and Why
Master cross-validation: k-fold, stratified k-fold, leave-one-out, time-series cross-validation — when each is appropriate and how to use scikit-learn's cross_val_score for reliable model evaluation.
Data Drift and Concept Drift
Data drift and concept drift explained: definitions, how to detect each with statistical tests, practical monitoring code, and how to respond — retrain vs recalibrate vs update features.
Why Do We Split Data?
Understand why splitting data into train, validation, and test sets is essential: preventing data leakage, measuring generalization, enabling honest evaluation, and the critical time-split rule for temporal data.
Debugging: Model Not Learning
Systematic approach to diagnosing a model that won't learn: sanity checks, data issues, target leakage, learning rate problems, architecture mistakes, and a debug-first protocol for ML.
Systematic ML Debugging
A reproducible, step-by-step framework for debugging ML models: error taxonomy, the debugging ladder, tools for each layer, and a checklist for both development and production failures.
What is a Decision Boundary?
Understand decision boundaries in machine learning: the line (or surface) that separates predicted classes, how different algorithms draw different boundaries, and why non-linearity matters in real AI problems.
How to Detect Overfitting
Practical techniques to detect overfitting: training vs validation curves, learning curves, performance gap analysis, validation loss monitoring, and automated early warning checks for ML models.
The F1 Score
F1 score explained: formula, why the harmonic mean penalizes imbalance between precision and recall, F-beta for asymmetric costs, macro vs micro vs weighted averaging, and when to use F1 vs other metrics.
What is Feature Engineering?
Feature engineering fundamentals: transforming raw data into model-ready inputs, types of feature engineering, domain-driven feature creation, and why it often matters more than model choice.
What is Feature Scaling?
Feature scaling explained: why raw feature magnitudes mislead distance-based and gradient-based models, what scaling does, and which algorithms require it.
Feature Selection
Feature selection methods: filter methods (correlation, mutual information), wrapper methods (RFE), embedded methods (L1 regularization, tree importance), and how to choose and validate feature selection.
What is a Feature and a Label?
Clear definitions of features and labels in machine learning: raw vs engineered features, target variables for regression and classification, and how they map to real AI use cases like drug prediction and clinical NLP.
How to Fix Overfitting: Dropout, Regularization, Data
Practical techniques for fixing overfitting: L1/L2 regularization, Dropout, early stopping, data augmentation, cross-validation, and ensemble methods — with code and trade-off analysis.
Grid Search for Hyperparameter Tuning
Grid search explained: exhaustive hyperparameter search with cross-validation, how to set up GridSearchCV, when it works and when it doesn't, and how to interpret results.
Hyperparameters vs Parameters
The distinction between model parameters (learned from data) and hyperparameters (set before training): examples, how each is optimized, and why this matters for model selection and evaluation.
L1 Regularization (Lasso)
L1 regularization explained: the absolute-value penalty, why it drives weights to exactly zero, feature selection effect, the Lasso path, and when to prefer L1 over L2.
L1 vs L2 Regularization
Side-by-side comparison of L1 and L2 regularization: formulas, sparsity, correlated features, geometric interpretation, Elastic Net, and a practical decision guide for when to use each.
L2 Regularization (Ridge)
L2 regularization explained: the squared-weight penalty, why it shrinks but never zeros weights, how it handles correlated features, coefficient interpretation after scaling, and Ridge regression examples.
Linear vs Logistic Regression
Understand the key differences between linear and logistic regression: output type, loss function, activation, decision boundary, and when to use each — with code and interview-ready explanations.
Why False Negatives Matter More in Clinical ML
Why false negatives are disproportionately dangerous in clinical ML: missed diagnoses, the asymmetry of medical errors, threshold selection for safety-critical systems, and how to design for recall.
Min-Max Scaling
Min-Max scaling in depth: formula, implementation, behavior with outliers, when to use it, and a clinical example showing how to apply it correctly in an ML pipeline.
Handling Missing Values in ML
Complete guide to missing data: MCAR/MAR/MNAR mechanisms, imputation strategies (mean, median, mode, model-based, KNN), when to use each, and how to avoid data leakage in imputation.
How Does a Model Actually Learn?
Understand the mechanics of model learning: loss functions, gradient descent, weight updates, the training loop, and why learning is fundamentally an optimization problem.
Normalization vs Standardization
Compare normalization (Min-Max) and standardization (Z-score): formulas, when to use each, how they handle outliers, and which to choose for different algorithms and data distributions.
What is Overfitting?
Understand overfitting: why models memorize training noise, how to detect it from learning curves, common causes, and the first-line fixes — with code examples and interview-ready explanations.
Precision and Recall
Precision and recall explained: formulas, the precision-recall tradeoff, how to compute them, when to prioritize each, and clinical examples where one matters more than the other.
Random Search for Hyperparameter Tuning
Random search for hyperparameter tuning: why it often outperforms grid search, how to configure RandomizedSearchCV, sampling distributions, and practical examples with budget control.
What is Regression?
Understand regression in machine learning: predicting continuous values, linear and polynomial regression, loss functions (MSE, MAE, RMSE), evaluation metrics, and real AI applications like dose prediction and outcome forecasting.
What is Regularization?
Regularization fundamentals: why models overfit, what regularization adds to the loss function, how it constrains model complexity, and the intuition behind the bias-variance tradeoff it controls.
What is Reinforcement Learning?
Understand reinforcement learning: agents, environments, rewards, policies, and the connection to RLHF in LLMs — with clear intuition for AI engineering interviews.
The ROC Curve
ROC curve explained: what it plots, how to read it, how to compute it, why AUC is a threshold-independent metric, and when the ROC curve can be misleading for imbalanced data.
What is Semi-Supervised Learning?
Understand semi-supervised learning: how combining a small amount of labeled data with large amounts of unlabeled data can train better models than the labeled data alone, with real examples in clinical NLP and drug classification.
Sensitivity and Specificity
Sensitivity (recall) and specificity: clinical definitions, formulas, the sensitivity-specificity tradeoff, Youden's J, and why medical tests prioritize sensitivity for screening and specificity for confirmation.
What is Supervised Learning?
A complete explanation of supervised learning: how labeled data trains models, the two main tasks (regression and classification), common algorithms, and real-world AI applications.
ML Terminology Quick-Reference for Interviews
A comprehensive ML vocabulary reference for AI engineering interviews: every term from features and loss to regularization, ensembles, and production concepts — with concise, interview-ready definitions.
The Test Set: One Shot, Final Score
Understand the test set's role: final, unbiased evaluation, why it must be used exactly once, test set contamination risks, and how to report honest model performance for AI systems.
How to Select a Classification Threshold
Systematic methods for selecting the classification threshold: F1-optimal, recall-constrained, precision-constrained, cost-sensitive, and Youden's J — with clinical examples and validation procedure.
TP, TN, FP, FN Explained
True positive, true negative, false positive, false negative: precise definitions, intuitions, how they relate to precision and recall, and worked clinical examples for each scenario.
Training, Validation, and Testing — What Each Does
Understand the three dataset splits in machine learning: training set for learning, validation set for tuning, and test set for final evaluation — with the critical rules that prevent data leakage.
The Training Set: What It Does and Doesn't Do
Understand the role of the training set: what the model learns from it, why high training accuracy is meaningless alone, and what the training set tells and doesn't tell you about real-world performance.
What is Underfitting?
Understand underfitting in machine learning: high bias, why models fail to learn, how to detect it, and the fixes — more complexity, better features, less regularization, more training.
What is Unsupervised Learning?
Understand unsupervised learning: clustering, dimensionality reduction, and anomaly detection — with practical examples using patient clustering, embedding visualization, and drug similarity search.
The Validation Set: Tuning Without Cheating
Understand the validation set's role in ML: hyperparameter tuning, model selection, early stopping, and the validation leakage problem — with practical code and interview-ready explanations.
What is Variance in Machine Learning?
Understand variance in ML: sensitivity to noise in the training data, high-variance models, the link to overfitting, and how to measure and reduce variance with regularization, ensembles, and more data.
What is Machine Learning?
A clear, interview-ready definition of machine learning: learning from data instead of explicit rules, the three types of ML, and why ML is the foundation of modern AI systems.
Z-Score Standardization
Z-score standardization in depth: formula, implementation, why it works for gradient-based models, how to handle outliers, and a clinical example with correct pipeline usage.
Data Isolation Between Modules — Schema-per-Module Strategies
Enforce data isolation between modules in a modular monolith: schema-per-module in SQL Server, separate DbContext per module, preventing cross-schema queries, and managing module-specific migrations.
Inter-Module Communication — Contracts and Events
Enable communication between modules in a modular monolith: synchronous module APIs, in-process domain events, the Module Event Bus pattern, and preventing the distributed monolith trap through loose coupling.
Module Structure and Enforcing Boundaries in a Modular Monolith
Structure and enforce module boundaries in a modular monolith: folder conventions, namespace enforcement, module APIs, dependency analysis with NDepend or ArchUnitNET, and preventing cross-module coupling.
Shared Kernel — What Belongs Between Modules
Design the Shared Kernel in a modular monolith: what to include (Result, Error, module event contracts), what to exclude, and how to prevent the Shared Kernel from becoming a dumping ground.
Testing a Modular Monolith — Module-Level and Integration Tests
Test a modular monolith effectively: in-process module tests, cross-module integration tests with real databases, testing module boundaries, and validating architecture constraints with ArchUnitNET.
A/B Testing Prompts
How to run A/B tests on prompt versions in production — traffic splitting, measuring quality metrics, statistical significance, and gradual rollout strategies.
Building Prompt Evaluations
How to build an evaluation framework for LLM prompts — test sets, metrics, automated grading, and the evaluation-driven development workflow.
Context Injection
How to inject relevant context into prompts at runtime — RAG context, user state, tool outputs — and best practices for formatting, ordering, and managing context length.
Context Stuffing: Maximizing What the Model Knows
Techniques for packing the right information into a context window. Covers document selection, truncation strategies, context ordering, and the lost-in-the-middle problem.
Defence in Depth for LLM Applications
A layered security architecture for production LLM applications — input validation, prompt hardening, output filtering, minimal permissions, and monitoring.
Detecting Prompt Injection
Methods for detecting prompt injection attempts in production LLM systems — rule-based, embedding-based, LLM-as-classifier, and anomaly detection approaches.
Domain-Specific Prompting Patterns
Prompt engineering patterns for medical, legal, financial, and code domains. Each domain has distinct accuracy requirements, liability considerations, and output formats.
Eval-Driven Prompt Development
Build prompt engineering workflows around evaluation datasets. Measure prompt quality systematically, iterate with evidence, and catch regressions in CI.
Function Calling: LLMs as Orchestrators
Use OpenAI function calling to let LLMs invoke typed tools. Define function schemas, handle multi-turn tool use, parallelize calls, and build reliable tool-using agents.
Hard Rules and Constraints in Prompts
How to write hard constraints in system prompts, the priority ordering of instructions, what can and can't be enforced through prompting, and when to use output classifiers instead.
Prompt Injection Attacks
What prompt injection is, how it works, the main attack vectors against LLM applications, and why it's the most critical security threat for production AI systems.
Interview: Prompt Engineering (Part 1)
10 senior-level questions on prompt engineering fundamentals: chain-of-thought, few-shot learning, output format, system prompts, and reliability techniques.
Interview: Prompt Engineering (Part 2)
10 more senior-level questions on advanced prompting: function calling, structured output, evaluation, multimodal, cost optimization, and system design.
Prompt Engineering Interview Scenarios
Common prompt engineering interview scenarios and model answers — designing prompts for extraction, handling adversarial inputs, debugging failures, and production safety.
Jailbreaks and Model Manipulation
Common jailbreak techniques used to bypass LLM safety guardrails, why they sometimes work, and the distinction between jailbreaks and legitimate adversarial testing.
Getting Reliable JSON Output
Techniques for reliably extracting structured JSON from LLMs — prompt design, JSON mode, schema enforcement, and handling malformed output.
Meta-Prompting: Prompts That Generate Prompts
Use LLMs to generate, optimize, and critique prompts. Automate prompt engineering with meta-prompts that create specialist prompts, test cases, and evaluation criteria.
Multimodal Prompting: Vision and Images
Prompt LLMs with images, screenshots, and documents using vision APIs. Extract structured data from visual content, analyze charts, and process medical images.
Negative Prompting: What Not to Do
Use explicit negative instructions to prevent unwanted behaviors. Constrain outputs by describing what to avoid, when to refuse, and what format to reject.
Output Format Control
Constrain and shape LLM outputs with format instructions, examples, and schema definitions. Get more consistent JSON, structured text, and typed responses.
Prompt Chaining: Decomposing Complex Tasks
Break complex tasks into sequential prompts where each output feeds the next. Build reliable pipelines with validation, branching, and error recovery between steps.
Prompt Injection Defense
Detect and prevent prompt injection attacks where user input attempts to override system instructions. Build defenses for LLM applications handling untrusted input.
Building a Prompt Library
Organize, version, and reuse prompts across your team. Build a prompt library with templates, variables, composition patterns, and a management system.
ReAct Prompting: Reason and Act
Combine reasoning and tool use with the ReAct pattern. Build agents that think before acting, observe results, and iterate to complete complex tasks.
Role Prompting: Persona and Expertise Framing
Use role prompting to prime models for specific expertise domains, communication styles, and reasoning patterns. Design effective personas for different deployment contexts.
Schema Definition in Prompts
How to communicate the desired output schema to an LLM — TypeScript-style schemas, JSON Schema, inline examples, and how schema clarity affects reliability.
Self-Consistency: Majority Voting for Reasoning
Sample multiple reasoning paths and select the most consistent answer. Self-consistency improves accuracy on complex reasoning tasks without requiring human labels.
Structured Output Interview Q&A
Common interview questions on getting structured outputs from LLMs — JSON extraction, schema enforcement, validation, and production reliability patterns.
Structured Output: Reliable JSON from LLMs
Guarantee parseable structured output from LLMs using JSON mode, Pydantic schemas, grammar-constrained generation, and validation with retry logic.
System Prompts: Setting Model Behavior
Design effective system prompts that shape model persona, constrain behavior, set output format, and establish context for your application.
Temperature and Sampling Parameters
Control LLM output diversity with temperature, top-k, top-p, and repetition penalties. Learn when to use deterministic vs stochastic sampling for different task types.
Tree of Thought Prompting
Use Tree of Thought (ToT) prompting to explore multiple reasoning paths simultaneously. Break complex problems into branches, evaluate each, and select the best solution.
Validation and Retry Loops
How to validate LLM outputs against schemas and business rules, and how to build retry loops that correct the model when it fails — with truncation, schema errors, and factual checks.
*args and **kwargs Explained
Master *args and **kwargs in Python: collecting variable positional and keyword arguments, unpacking in function calls, and real uses in AI frameworks like LangChain.
What are Python's built-in data types?
Master Python's built-in types: int, float, str, bool, list, tuple, dict, set, and None. Understand their behavior, memory model, and how they appear in AI and ML code.
Classes and Objects
Build Python classes for AI engineering: instance attributes, class attributes, methods, properties, encapsulation, and real-world patterns from LangChain and ML codebases.
Dataclasses: Clean Data Containers for AI
Use Python dataclasses to define structured data without boilerplate: auto-generated __init__, __repr__, __eq__, field defaults, frozen instances, and Pydantic comparison for AI applications.
Default Arguments and Keyword Arguments
Master Python default and keyword arguments: positional vs keyword calls, keyword-only parameters, the mutable default bug, argument ordering rules, and patterns used in LangChain and ML APIs.
Dictionary Comprehensions
Build and transform dicts with comprehensions: basic syntax, filtering, inverting mappings, grouping, and patterns used in AI data pipelines and LangChain metadata handling.
Dictionary Methods for AI Engineers
Master Python dictionary methods: get, setdefault, update, pop, items, keys, values, and merge patterns — with practical examples for caching, config management, and RAG metadata filtering.
What is a Dictionary?
Master Python dicts: creation, access, mutation, iteration, merging, comprehensions, defaultdict, Counter, and common patterns in AI/ML code.
What is dynamic typing?
Understand Python's dynamic type system: how variables hold references, how type() and isinstance() work, when dynamic typing helps and when it causes bugs, and how type hints add clarity.
What is the difference between == and is?
Understand Python's equality operator (==) vs identity operator (is): when to use each, common bugs with None checks, integer caching, and string interning.
Defining and Calling Functions
Master Python function syntax: def, return, docstrings, type hints, multiple return values, first-class functions, and patterns used throughout AI/ML codebases.
Generators and yield for Memory-Efficient AI
Understand Python generators: yield syntax, lazy evaluation, generator expressions, send/throw, and why streaming document processing and LLM token streaming both use generators.
Inheritance and Method Overriding
Understand Python inheritance: single and multiple inheritance, super(), method overriding, abstract base classes, and how LangChain uses inheritance for its Runnable and Tool hierarchies.
__init__ and self Explained
Understand __init__ as Python's constructor and self as the instance reference. Learn how object initialization works, common patterns, and how they appear in LangChain and ML class hierarchies.
int vs float in Python
Understand Python's int and float types: precision, representation, common pitfalls with floating-point arithmetic, and practical patterns for AI and ML code.
Lambda Functions and When to Use Them
Understand Python lambda functions: syntax, limitations, when to use them vs def, and practical applications in sorting, map/filter, and LangChain LCEL pipelines.
List Comprehensions
Write concise, readable list comprehensions in Python: basic syntax, filtering, nested comprehensions, when to use them vs loops, and patterns in AI/ML data processing.
List Methods for AI Engineers
Master Python list methods: append, extend, insert, remove, pop, sort, reverse, index, count, and copy — with real patterns for managing document batches, scores, and AI pipeline queues.
What is the difference between List and Tuple?
Compare Python lists and tuples: mutability, memory, hashability, unpacking, use cases in AI/ML code, and when to choose each.
Magic Methods: __str__, __len__, __eq__
Master Python magic methods (dunder methods): __str__, __repr__, __len__, __eq__, __hash__, __contains__, __iter__, and how they make custom classes feel native.
map(), filter(), and zip() in Practice
Master Python's built-in map(), filter(), and zip() functions. Understand when to use them vs list comprehensions, and practical patterns for AI/ML data preprocessing.
What is Mutability?
Understand Python mutability: which types are mutable vs immutable, why it matters for function arguments and shared state, and how to safely copy objects in AI/ML code.
NumPy Arrays vs Python Lists
Understand why NumPy arrays are fundamental to AI/ML: dtype, shape, memory layout, vectorized operations, and performance comparison with Python lists.
Broadcasting: NumPy's Superpower
Understand NumPy broadcasting: how arrays of different shapes operate together, the broadcasting rules, common patterns for embeddings normalization and similarity computation, and pitfalls to avoid.
NumPy Math Operations for ML
Master NumPy mathematical operations for machine learning: linear algebra, matrix multiplication, statistical functions, random generation, and ML-specific patterns like dot products and eigenvectors.
NumPy Slicing and Indexing
Master NumPy array indexing: basic slicing, multi-dimensional indexing, boolean masking, fancy indexing, and common patterns in ML data preprocessing.
Scope and the LEGB Rule
Understand Python variable scope: Local, Enclosing, Global, and Built-in lookup order. Covers closures, nonlocal/global keywords, common bugs, and patterns used in LangChain callbacks and AI pipelines.
What is a Set and when should we use it?
Master Python sets: average-case O(1) membership testing, set operations (union, intersection, difference), frozenset, and practical use cases in AI pipelines for deduplication and fast lookup.
What is Type Casting?
Understand Python type casting: explicit conversion between int, float, str, bool, list, and tuple — with safe patterns, common pitfalls, and AI/ML use cases like parsing API responses and preparing data for NumPy.
What is Python? Why is it widely used in AI?
Understand why Python became the dominant language for AI and ML: syntax simplicity, the scientific ecosystem, community size, and how it connects to C-speed libraries under the hood.
RAG Caching: Semantic and Exact-Match Strategies
Reduce latency and cost in RAG systems with semantic caching, exact-match Redis caching, TTL strategies, GPTCache, and cache invalidation patterns.
RAG Citations and Source Attribution
Attribute answers to source documents in RAG systems. Inline citations, span-level grounding, citation verification, and trust indicators for clinical AI.
Conversational RAG: Multi-Turn Dialogue
Build RAG systems that maintain conversation history, resolve coreference, rewrite follow-up queries, and manage context across multi-turn clinical dialogues.
RAG Cost Optimization
Reduce RAG system costs with model routing, caching strategies, embedding cost reduction, chunking optimization, and batch processing. Build cost-efficient clinical AI.
Embedding Models for RAG
How to choose and use embedding models for retrieval-augmented generation: OpenAI text-embedding-ada-002 vs the text-embedding-3 models, open-source alternatives, and fine-tuning for domain-specific retrieval.
Embeddings — Turning Text into Vectors
Understand and generate text embeddings in .NET: what embeddings are, generating embeddings with Azure OpenAI and Semantic Kernel, batching for efficiency, embedding clinical documents, and choosing embedding models.
RAG Evaluation — Measuring Retrieval and Answer Quality
Evaluate RAG systems in .NET: retrieval metrics (precision, recall, MRR), answer quality metrics (faithfulness, relevance, groundedness), building evaluation datasets, automated testing with LLM-as-judge, and clinical safety evaluation.
Hybrid Search in RAG
Combine dense (embedding) and sparse (BM25) retrieval for better RAG results. Reciprocal Rank Fusion, weighted combination, and when hybrid beats pure semantic search.
RAG Ingestion Pipeline
Build a production document ingestion pipeline: loading, parsing, chunking, embedding, and indexing. Handle updates, deletions, and incremental ingestion at scale.
Metadata Filtering in RAG
Filter retrieved documents by metadata before or after vector search. Pre-filtering, post-filtering, and combining semantic similarity with structured data constraints.
PDF Parsing for RAG
Extract clean, structured text from PDFs for RAG ingestion. Handle tables, multi-column layouts, headers, footers, and scanned documents with OCR.
Query Rewriting and Expansion in RAG
Improve RAG retrieval quality by transforming user queries before search. HyDE, multi-query generation, query decomposition, and step-back prompting.
Reranking Retrieved Documents
Improve RAG precision with reranking: cross-encoders, Cohere Rerank, LLM-as-judge reranking, and when the two-stage retrieval pipeline outperforms direct search.
RAG Retrieval — Finding and Injecting Context into the AI
Build RAG retrieval pipelines in .NET: query embedding, similarity search, context assembly, prompt injection of retrieved documents, re-ranking, hybrid retrieval, and hallucination prevention for clinical AI.
RAG Troubleshooting Guide
Diagnose and fix common RAG failures: poor retrieval, hallucinations, irrelevant answers, slow performance, and context window issues. A systematic debugging guide.
Vector Search — Finding Relevant Documents by Meaning
Implement vector search in .NET RAG systems: SQL Server vector search, pgvector on PostgreSQL, Azure AI Search, similarity metrics (cosine vs dot product), filtering, and performance tuning for clinical document retrieval.
Vector Stores for RAG
Compare vector databases for RAG: Chroma, Pinecone, Weaviate, Qdrant, pgvector. When to use each, indexing options, and production deployment patterns.
What Can Go Wrong in RAG
The main failure modes in RAG systems — from retrieval misses to faithful hallucination — and practical mitigations for each.
Constitutional AI
Anthropic's approach to AI alignment: a constitution of principles guides the model to critique and revise its own outputs, reducing reliance on human labeling.
Content Moderation APIs
When and how to use OpenAI Moderation, Azure Content Safety, and AWS Comprehend for AI output screening. Includes Python integration examples and a cost/latency comparison.
Defense in Depth for AI Systems
Layer five safety controls to protect your LLM application: input filtering, system prompt hardening, output classification, human review, and audit logging — with Python examples.
DPO: Direct Preference Optimization
DPO aligns LLMs with human preferences without a separate reward model or RL training loop. Learn how it works, when to use it over RLHF, and its practical limitations.
Interview: AI Safety and Guardrails Questions
12 Q&A pairs covering hallucinations, jailbreaks, prompt injection, alignment, defense in depth, and content moderation for AI engineering interviews.
Building Output Safety Classifiers
Build a binary and multi-label safety classifier to screen LLM outputs before they reach users. Covers threshold tuning, Pydantic integration, and performance vs cost trade-offs.
Rate Limiting and Abuse Prevention
Implement rate limiting for AI APIs to control costs and prevent abuse: token-bucket and Redis-backed sliding-window limiters, per-user and per-IP limits, and graduated responses.
RLHF: Reinforcement Learning from Human Feedback
How RLHF works to align LLM behavior with human preferences — the three-stage process of SFT, reward model training, and PPO, with practical implications for AI safety.
System Design: Pharmaceutical Chatbot
Full system design answer for designing a pharmaceutical information chatbot — components, data flow, scalability, cost, latency, and what to cut for MVP.
System Design: AI Code Review Assistant
Design an AI code review tool that automatically reviews pull requests — from GitHub webhook to LLM reviewer to posted comments. Covers diff parsing, chunking large PRs, and quality control.
System Design: Document Q&A Platform
Design a document Q&A system where users ask questions over uploaded PDF reports. Covers ingestion pipeline, multi-document retrieval, permission model, and scale to 100,000 documents.
Scenario: Model Generates Harmful Medical Advice
Your pharmaceutical chatbot tells users to self-medicate with dangerous drug combinations. Learn how to diagnose the root cause and implement multi-layer safety controls.
SignalR Authentication and Authorization
Secure SignalR hubs with JWT authentication: token delivery via query string, hub-level and method-level authorization, the WebSocket JWT challenge problem, and production auth patterns.
SignalR Groups and Connection Management
Manage SignalR connections and groups: adding/removing from groups, user-based routing, connection tracking, broadcasting to subsets of clients, and patterns for ward-based clinical subscriptions.
Hub Methods — Calling Between Clients and Server in SignalR
SignalR hub methods in depth: strongly-typed hubs, calling clients from the server, calling the server from clients, hub context injection, and the invocation patterns for real-time clinical dashboards.
SignalR JavaScript Client — Connecting, Reconnecting, and Handling Events
Use the @microsoft/signalr JavaScript client: connection lifecycle, automatic reconnection, invoking hub methods, handling disconnects, and the patterns for a resilient clinical dashboard frontend.
SignalR Production Patterns — Scale, Reliability, and Monitoring
Production SignalR: connection lifecycle management, heartbeats, fallback transports, monitoring connection counts, graceful shutdown, and the operational patterns for real-time systems at hospital scale.
SignalR Redis Backplane — Scaling Real-Time to Multiple Instances
Scale SignalR across multiple API instances with a Redis backplane: how the backplane works, setup, sticky sessions vs backplane, monitoring backplane health, and the production patterns for high-availability real-time.
SignalR Streaming — Real-Time Data Feeds
Server-to-client and client-to-server streaming in SignalR: IAsyncEnumerable for server streaming, ChannelReader for channel-based streaming, client streaming patterns, and production use cases.
Dependency Inversion Principle — Depend on Abstractions
Apply the Dependency Inversion Principle in C#: high-level modules depending on interfaces, DI container wiring, avoiding the new keyword for dependencies, and the difference between DIP and dependency injection.
SOLID in Real .NET Projects — Violations and Fixes
Apply all five SOLID principles together in a real .NET project: recognizing violations in existing code, refactoring to SOLID step by step, the cost-benefit analysis of SOLID, and when NOT to apply a principle.
Interface Segregation Principle — Lean Interfaces
Apply ISP in C#: splitting fat interfaces into focused ones, role interfaces for test doubles, identifying ISP violations via NotImplementedException and empty methods, and the connection between ISP and LSP.
Liskov Substitution Principle — Subtype Contracts
Apply the Liskov Substitution Principle in C#: what subtypes must guarantee, classic LSP violations (square-rectangle), precondition weakening and postcondition strengthening, and LSP in interface design.
Open/Closed Principle — Extension Without Modification
Apply the Open/Closed Principle in C#: designing for extension with interfaces and composition, the strategy pattern as OCP in action, extension points for reporting and notification logic, and what OCP is not.
Architecture Decision Records — Documenting the Why
Use Architecture Decision Records (ADRs) to document key technical decisions, the context behind them, the options considered, and the consequences — so future engineers understand why the system is built the way it is.
C4 Model — Communicating Architecture at the Right Level
Use the C4 model to communicate software architecture: System Context, Container, Component, and Code diagrams — when to use each level, how to draw them, and which tools work best for .NET teams.
Selecting Architecture Patterns — Matching Patterns to Problems
Match architecture patterns to real problems: layered architecture, vertical slice, modular monolith, microservices, event-driven, and CQRS — when each applies and which forces drive the choice.
Capturing Requirements as Architecture Drivers
Translate stakeholder needs into architecture drivers: functional requirements, quality attributes (NFRs), constraints, and how they directly shape technology and structural decisions in .NET systems.
Architecture Trade-offs — There Are No Perfect Decisions
Analyse architectural trade-offs systematically: consistency vs availability, coupling vs autonomy, simplicity vs flexibility — and how to make defensible decisions on a clinical .NET platform.
ALiBi: Attention with Linear Biases
How ALiBi adds a static linear penalty to attention scores based on distance, why it extrapolates to longer sequences at inference, and how it compares to RoPE.
Decoder-Only Models (GPT-Style)
How decoder-only transformers work, why causal masking enables autoregressive generation, how GPT differs from BERT, and when to choose decoder-only architectures.
The Transformer Decoder Block
What makes the decoder different from the encoder: masked self-attention, cross-attention, causal masking, and the autoregressive generation process.
Encoder-Decoder Models (T5-Style)
How full encoder-decoder transformers work, why they suit seq2seq tasks, how cross-attention connects the two halves, and when to choose them over encoder-only or decoder-only.
Encoder-Only Models (BERT-Style)
What encoder-only transformers are, why bidirectional context makes them powerful for understanding, masked language modelling, and when to choose them over decoder-only models.
The Transformer Encoder Block
What an encoder block contains, how multi-head self-attention and feed-forward layers combine, the role of residual connections and layer norm, and what the encoder outputs.
Feed-Forward Networks in Transformers
The role of the position-wise FFN in each transformer block, the expand-and-contract design, activation functions, SwiGLU, and why FFN parameters dominate model size.
Interview Q&A: Attention Mechanism
Common interview questions and model answers about attention — the mechanism, scaling, multi-head, KV cache, and complexity — framed for senior ML and systems engineering roles.
Interview Q&A: Encoder, Decoder, and Architecture Variants
Common interview questions on encoder vs decoder blocks, encoder-only vs decoder-only vs encoder-decoder models, and when to choose each architecture.
Interview Q&A: Attention Heads and Scaling
Common interview questions on multi-head attention design choices, head pruning, grouped-query attention, and how scaling affects head count and model capacity.
Interview Q&A: Positional Encoding
Common interview questions on why transformers need positional encoding, sinusoidal vs learned vs RoPE vs ALiBi, and long-context challenges.
Layer Normalisation in Transformers
What layer norm does, how it differs from batch norm, why it's used in transformers, Pre-LN vs Post-LN, and RMSNorm used in LLaMA.
Learned Positional Embeddings
How BERT and GPT-2 learn position embeddings from data, the trade-offs vs sinusoidal encodings, and why learned embeddings were widely adopted despite their fixed maximum length.
Multi-Head Attention
Why multi-head attention uses parallel heads, how heads are split and concatenated, what different heads learn, and the full architecture with code.
Query, Key, and Value Matrices
What Q, K, and V are in attention: how they're computed from input embeddings, what each represents conceptually, and why this decomposition works.
Residual Connections
Why residual (skip) connections are essential for deep transformers, how they solve the vanishing gradient problem, and what the identity shortcut provides architecturally.
Rotary Positional Encoding (RoPE)
How RoPE encodes position by rotating query and key vectors, why relative distance falls out naturally, and why it's become the standard in LLaMA and Mistral.
Scaled Dot-Product Attention
The complete attention computation: dot products, scaling, masking, softmax, and value aggregation. Step-by-step with shapes and code.
Sinusoidal Positional Encoding
How the original Transformer injects position with sine and cosine functions, why that design encodes relative distance, and what its limitations are.
Softmax and Temperature in Attention
How softmax converts attention scores to weights, what temperature does to the distribution, and how sharp vs flat attention affects model behaviour.
Transformer Training Objectives
The three main pretraining objectives — causal LM, masked LM, and seq2seq — how they differ, what tasks they suit, and how they translate to loss functions.
What Is Attention?
The attention mechanism explained: why it was invented, what it computes, how it differs from RNNs, and the core intuition for understanding transformers.
Why Positional Encoding?
Why transformers are position-agnostic by default, what breaks without positional information, and the design space for injecting position into attention-based models.
Writing Your First Test — Red, Green, Refactor
Start test-driven development in .NET: the Red-Green-Refactor cycle, xUnit test anatomy, writing the first failing test for a clinical domain rule, and making it pass with minimal code.
TDD with Legacy Code — Adding Tests to Untested Systems
Apply TDD techniques to legacy .NET code: characterisation tests, seam identification, dependency injection for testability, the Strangler Fig pattern, and safely adding behaviour to untested clinical systems.
Outside-In TDD — Start from the API, Drive Down to the Domain
Apply outside-in (London School) TDD in .NET: start with a failing acceptance test at the API level, mock collaborators, drive the design downward through handlers to the domain, and finish with unit tests at each layer.
TDD Pitfalls — Common Mistakes and How to Avoid Them
Avoid the most common TDD antipatterns in .NET: testing implementation details, brittle mocks, over-mocking, slow test suites, and the false confidence of low-value tests.
Refactoring Under Test — Changing Code Without Changing Behaviour
Refactor safely in .NET using TDD: extract method, replace conditional with polymorphism, introduce value objects, and use the test suite as a safety net throughout — with clinical domain examples.
Interfaces and Dependency Injection — Making Code Testable
Design testable .NET code using interfaces and dependency injection: injecting dependencies instead of creating them, avoiding new-ing up collaborators, and the difference between DI as a tool and testability as the goal.
Avoiding Static State — Why Static Kills Testability
Understand how static state and static methods undermine testability in .NET: hidden dependencies, shared mutable state, static service locators, and how to replace them with injectable alternatives.
Pure Functions — The Most Testable Code You Can Write
Design .NET code as pure functions for maximum testability: referential transparency, side-effect-free computation, extracting pure logic from impure orchestration, and clinical domain examples.
Testing Time-Dependent Code — Clock Injection and Deterministic Tests
Make time-dependent .NET code testable: inject IClock instead of using DateTime.UtcNow, freeze time in tests, test expiry logic, scheduled jobs, and audit timestamps with full control over the clock.
Interview: Transformer Architecture (Part 2)
10 more senior-level questions: KV cache, quantization, speculative decoding, scaling laws, MoE, and system design with transformer-based models.
Interview: Transformer Architecture (Part 1)
10 senior-level questions on transformer internals: attention mechanics, positional encodings, normalization, and architectural design choices.
BERT vs GPT: Encoder vs Decoder Architectures
Compare BERT's bidirectional encoder and GPT's causal decoder. Understand masked language modeling vs next-token prediction, and which architecture fits which task.
Context Window: Limits, Tradeoffs, and Extensions
Why context windows are limited, the quadratic attention bottleneck, how modern models extend context, and practical strategies for working within limits.
Embeddings: Token and Positional Representations
How transformers convert token IDs into dense vectors. Token embeddings, positional encodings (sinusoidal and learned), and how they combine to form the model's input.
Feed-Forward Networks in Transformers
Understand the position-wise feed-forward network (FFN) in transformer layers: its role, architecture, activation functions, and how it differs from attention.
Flash Attention: IO-Aware Attention Algorithm
How Flash Attention reformulates self-attention to minimize GPU memory I/O, enabling 2-4x speedups and linear memory scaling for long sequences.
Instruction Tuning: From Predictor to Assistant
How supervised fine-tuning (SFT) on instruction-response pairs transforms a pretrained language model into an assistant that follows directions and completes tasks.
KV Cache: Accelerating Autoregressive Inference
How the key-value cache eliminates redundant attention computation during text generation. Understand cache structure, memory cost, and when caching breaks down.
LLaMA Architecture: Modern Decoder Design
How LLaMA and its derivatives (Mistral, Qwen, Phi) improve on the original transformer: RoPE, RMSNorm, SwiGLU, and grouped-query attention (GQA).
Mixture of Experts: Sparse Scaling
How Mixture of Experts (MoE) scales model capacity without proportionally scaling compute. Covers router mechanisms, load balancing, expert collapse, and models like Mixtral.
Pretraining: How LLMs Learn from Raw Text
The next-token prediction objective, training data curation, curriculum design, and what a model actually learns during pretraining on trillions of tokens.
Quantization: Compressing Model Weights
How quantization reduces LLM memory and speeds up inference by representing weights in fewer bits. Covers INT8, INT4, GPTQ, AWQ, and bitsandbytes QLoRA.
RoPE and ALiBi: Relative Position Encodings
How Rotary Position Embeddings (RoPE) and Attention with Linear Biases (ALiBi) encode relative position, enabling length generalization beyond training context.
Scaling Laws: Predicting Model Performance
How Chinchilla and OpenAI scaling laws relate model parameters, training tokens, and compute budget to loss. Use scaling laws to make optimal training decisions.
Speculative Decoding: Faster Inference
How speculative decoding uses a small draft model to propose tokens that a large model verifies in parallel, achieving 2-3x speedups with identical output distribution.
Tokenization: From Text to Tokens
How tokenizers convert raw text into token IDs that transformers consume. Covers BPE, WordPiece, SentencePiece, vocabulary design, and tokenizer gotchas.
Building a Feature Slice — End to End
Build a complete vertical slice from endpoint to database: command, validator, handler, domain logic, persistence, and response — a full CreatePrescription feature as a worked example.
Vertical Slice Folder Structure — Organizing by Feature
Structure a Vertical Slice Architecture project by feature: co-locating command, handler, validator, and endpoint in one folder, shared kernel placement, and how to scale the structure as features grow.
MediatR in Vertical Slice Architecture — Commands, Queries, and Pipeline Behaviors
Use MediatR as the backbone of Vertical Slice Architecture: IRequest, IRequestHandler, pipeline behaviors for cross-cutting concerns, notifications for domain events, and registering MediatR in ASP.NET Core.
Shared Kernel in Vertical Slice — What to Share and What Not To
Design the Shared Kernel in Vertical Slice Architecture: Result type, Error type, MediatR behaviors, domain primitives, what belongs there versus in feature folders, and avoiding the SharedKernel dumping-ground anti-pattern.
Testing Vertical Slices — Handler Tests, Integration Tests, and Test Isolation
Test Vertical Slice features effectively: unit-testing handlers in isolation, integration testing with WebApplicationFactory, test data builders for domain objects, and the testing strategy that matches the architecture.
Vertical Slice vs Clean Architecture — Choosing the Right Approach
Compare Vertical Slice Architecture and Clean Architecture: organizational model, coupling patterns, team fit, scalability, when each excels, and how to choose between them for your project context.
AssistantAgent vs UserProxyAgent
Deep dive into AutoGen's two core agent types: how AssistantAgent generates responses and how UserProxyAgent executes code and manages human input.
Interview: AutoGen vs LangGraph — When Would You Choose?
A structured Q&A covering 8 senior-level interview questions: AutoGen internals, code execution risks, agent loops, testing strategies, production limitations, and multi-agent system design.
GroupChatManager: Selecting the Next Speaker
How GroupChatManager orchestrates multi-agent conversations, speaker selection strategies including custom routing functions, and a domain-routing medical specialist example.
Code Execution: Agents That Write and Run Code
AutoGen's code generation and execution pipeline: LocalCommandLineCodeExecutor vs DockerCommandLineCodeExecutor, security implications, and a real data analysis example.
Conversation-First Architecture
Why AutoGen uses conversations as the primary primitive, how conversation history tracks state, and how this compares to LangGraph's state-based approach.
Registering Functions as Agent Tools
How to register Python functions as tools agents can call, using AutoGen's decorator-based tool registration with real stock price and database query examples.
GroupChat: Multiple Agents in One Conversation
Using AutoGen's GroupChat class for 3+ agents, speaker ordering, and a real researcher-coder-reviewer workflow with complete code and conversation history access.
Human Input Mode: When to Ask the User
The three human_input_mode options — NEVER, TERMINATE, ALWAYS — when each is appropriate, how to set max_turns, and designing workflows with human approval checkpoints.
Termination Conditions
How AutoGen conversations end: the TERMINATE keyword, max_turns, custom is_termination_msg functions, timeout handling, and best practices for production systems.
Two-Agent Chat: Hello, AutoGen
A complete working AutoGen example with AssistantAgent and UserProxyAgent, including real task execution, conversation output, and code execution results.
What is AutoGen?
AutoGen's conversation-centric approach to multi-agent AI, how it differs from LangChain, the two core agent types, and a minimal working example.
Agent Memory Types
Understand the four memory types available to agents — in-context, episodic, semantic, and procedural — and learn when to use each one.
Managing Context Window in Agents
Keep long-running agents effective as message history grows — using rolling windows, hierarchical summarization, and selective memory strategies with token budget tracking.
Interview: Agent Memory and Context Questions
Ten Q&A pairs covering agent memory types, context window strategies, and state persistence — the questions interviewers actually ask for agentic AI engineering roles.
Plan-and-Execute Pattern
Separate planning from execution to build agents that can parallelize independent steps and reason more clearly about complex multi-step tasks.
The ReAct Pattern
Implement the Reasoning + Acting pattern from scratch using the raw OpenAI API — no frameworks — with a drug information agent as a worked example.
Self-Reflection Pattern
Build agents that evaluate and improve their own outputs through a generator-critic-refiner loop — essential for high-stakes domains like medical, legal, and code generation.
Supervisor-Worker Multi-Agent Pattern
Build multi-agent systems where a supervisor delegates tasks to specialist worker agents — enabling parallelism, specialization, and cleaner separation of concerns.
Tool Use in Agentic Systems
Build a robust tool registry for your agents — including dynamic tool selection, tool composition, and a worked example with web search, calculator, and database lookup tools.
What Is Agentic AI?
Understand what makes an AI system agentic — perception, decision-making, and action in a loop — and when to use agents versus simpler retrieval approaches.
Building a Custom Tool End-to-End
Full walkthrough of building a production-ready custom tool: schema design, implementation, input validation, structured output, FastAPI integration, and testing.
How the LLM Decides Which Tool to Call
Understand the mechanism behind tool selection — how descriptions, context, and tool_choice settings influence which function gets called and when.
Least Privilege for Tool Access
Apply the principle of least privilege to AI agent tools — scoped DB users, per-tool API keys, role-based tool sets, and runtime access control in Python.
Parallel Tool Calls
When the LLM requests multiple tools in one response, run them concurrently with asyncio.gather() to cut latency. Learn the complete pattern with real examples.
Handling Tool Errors Gracefully
Tools fail. Learn how to catch exceptions, return structured error results the LLM can reason about, implement retry logic, and build resilient agent loops.
Observability for Tool Calls
What to log, how to trace tool call chains with OpenTelemetry, which metrics to collect, and how to alert on tool anomalies in production AI agents.
Returning Tool Results to the LLM
Master the message flow for feeding tool results back to the LLM — correct role, format, ID matching, large result handling, and the full execution loop.
Defining Tool Schemas in JSON
Learn how to write precise JSON Schema definitions for LLM tools. Clear schemas are one of the biggest factors in whether the model calls your tool correctly.
Tool Security: Attack Vectors
Understand the real attack vectors in tool-calling agents — prompt injection, confused deputy, data exfiltration, indirect injection — and how to detect them.
Validating Tool Inputs and Outputs
LLMs can hallucinate invalid arguments. Learn to validate tool inputs with Pydantic, validate outputs against expected schemas, and re-prompt on failure.
Interview: Tool Calling Scenario Questions
12 realistic interview Q&A pairs covering tool schema design, parallel calls, error handling, security, validation, and system design for tool-calling agents.
What Is Tool Calling?
Understand how LLMs decide to invoke functions instead of generating text, and why tool calling is the foundation of every useful AI agent.
Defining Agents in CrewAI
A complete guide to the Agent class in CrewAI — every constructor parameter explained with real examples, including a multi-agent pharmaceutical content pipeline.
Interview: CrewAI Agent Design Questions
Eight interview-style Q&A pairs on CrewAI agent design — role vs goal vs backstory, tool assignment, memory, delegation, and multi-agent architecture decisions.
Agent Memory in CrewAI
How CrewAI's memory system works — short-term, long-term, and entity memory — when to enable it, what it costs, and how to configure it for production use.
Giving Agents Tools
How to equip CrewAI agents with built-in tools, custom tools using the @tool decorator, and structured Pydantic input schemas — with examples including database search and web search.
Core Concepts: Agent, Task, Crew, Process
A deep dive into the four fundamental building blocks of every CrewAI system — Agent, Task, Crew, and Process — with complete annotated examples.
Installing and Configuring CrewAI
Step-by-step guide to installing CrewAI, configuring API keys for OpenAI and Azure, setting up a project structure, and running your first crew.
Defining Tasks in CrewAI
A complete guide to the Task class in CrewAI — every constructor parameter explained, with emphasis on writing effective expected_output definitions and a full research-to-writing pipeline example.
CrewAI vs LangChain Agents
Where CrewAI fits in the AI agent landscape compared to LangChain LCEL, LangGraph, and AutoGen — with side-by-side code showing the same task in each framework.
What Is CrewAI?
An introduction to CrewAI: the framework for orchestrating multiple AI agents as a crew, with role-based agents, task assignments, and sequential or hierarchical workflows.
BLEU Score for Text Generation
Learn how BLEU score works, what it measures, when to use it, and why it fails for many modern NLP tasks.
Building a Golden Dataset
Learn how to create a high-quality golden dataset of prompt/response pairs for LLM evaluation — the foundation of any reliable automated eval system.
Human Evaluation vs Automated Evaluation
When to use human evaluators, when to use automated metrics, and how to combine both for reliable, scalable LLM quality assurance.
Perplexity as a Language Model Metric
Understand what perplexity measures, how to compute it, and when it is — and isn't — a useful signal for evaluating language models.
ROUGE Score for Summarization
Learn how ROUGE-N, ROUGE-L, and ROUGE-S work, when to use them, and how to implement summarization evaluation with the rouge-score library.
Evaluation by Task Type
Match the right evaluation metric to the right LLM task: classification, generation, RAG, code, and conversation each demand a different approach.
Why Evaluating LLMs Is Hard
Understand the fundamental challenges of LLM evaluation: non-determinism, no single ground truth, task diversity, and why traditional ML metrics fall short.
Async/Await in FastAPI
Master Python's async/await model for FastAPI routes. Learn when to use async def vs def, how to await OpenAI calls, run parallel tasks with asyncio.gather, and safely offload blocking code.
Background Tasks in FastAPI
Use FastAPI's BackgroundTasks to fire-and-forget work after the response is sent. Covers audit logging, email notifications, cache invalidation, and when to reach for Celery or Azure Service Bus instead.
Deploying FastAPI to Azure Container Apps
Deploy a FastAPI AI service to Azure Container Apps with containerapp.yaml, Key Vault secret references, health probes, scaling rules, and az CLI deployment commands.
Dependency Injection in FastAPI
Master FastAPI's Depends() system. Build dependency chains for auth, DB sessions, and OpenAI clients. Override dependencies cleanly in tests. Includes real async DI examples.
Dockerising a FastAPI AI Service
Write a production Dockerfile for FastAPI with multi-stage builds, non-root user, uvicorn configuration, .dockerignore, and environment variable injection for AI services.
Health Check Endpoints in FastAPI
Build production-grade liveness, readiness, and startup probes for FastAPI AI services. Covers dependency checks with timeouts, 503 responses, and Kubernetes/Azure Container Apps probe configuration.
Interview: FastAPI and Async Python Questions
12 interview Q&A pairs covering FastAPI and async Python for AI engineering roles. Topics include async vs sync, Pydantic validation, streaming, dependency injection, health checks, and a RAG system design question.
Application Lifespan: Startup and Shutdown
Use FastAPI's lifespan pattern (an asynccontextmanager) to initialise DB pools, Redis, and embedding models at startup, then clean up on shutdown. Covers app.state for resource sharing.
Path, Query, and Body Parameters
Master how FastAPI handles path parameters, query strings, JSON bodies, and headers. Includes complete CRUD examples for a drug information API with parameter validation.
Pydantic v2 Request and Response Models
Learn how Pydantic v2 powers FastAPI's validation, serialization, and OpenAPI generation. Covers BaseModel, Field, model_validator, field_validator, nested models, and custom validators for AI service payloads.
Server-Sent Events for LLM Streaming
Stream LLM tokens to the browser using FastAPI's StreamingResponse and Python AsyncGenerator. Covers SSE format, Azure OpenAI streaming, JavaScript consumption, and mid-stream error handling.
Why FastAPI for AI Services
Understand what FastAPI is, why it suits AI and LLM workloads, and how it compares to Flask and Django REST Framework. Build your first endpoint and run it with uvicorn.
Catastrophic Forgetting
Learn what catastrophic forgetting is, why it happens during fine-tuning, how to detect it with benchmarks, and how to prevent it using LoRA, replay data, and careful hyperparameters.
Full Fine-Tuning vs PEFT
Compare full fine-tuning against Parameter-Efficient Fine-Tuning methods — LoRA, QLoRA, adapters, and prefix tuning — and understand when each approach is appropriate.
LoRA Explained
A deep dive into Low-Rank Adaptation — the math behind it, what rank and alpha control, which layers to target, and a full working example with the PEFT library on Llama 3.
QLoRA: Fine-Tuning on Consumer Hardware
Learn how QLoRA combines 4-bit NF4 quantization with LoRA adapters to enable fine-tuning of massive models on a single GPU — with a complete working example.
What Is Fine-Tuning?
Understand fine-tuning at the conceptual level — what it changes, what it costs, and how it fits into the LLM adaptation toolkit alongside prompting and RAG.
When to Fine-Tune vs Prompt Engineer
A practical decision framework for choosing between prompting, RAG, and fine-tuning — with a real pharmaceutical case study showing when fine-tuning wins.
Arrays and Hash Maps in AI Interview Problems
Two Sum, sliding window, and frequency counting with AI context: counting token frequencies, deduplicating documents, and finding the k most frequent tokens in a corpus.
Implement BPE Tokenization
Byte Pair Encoding step by step: initialize character vocabulary, merge most frequent pairs iteratively, apply merges to new text, and complete Python implementation.
Time and Space Complexity for AI Engineers
Big O notation review with AI-specific examples: O(n) embedding lookup, O(n²) attention, chunking pipelines, and when complexity actually matters in production RAG systems.
Heaps for Top-K Retrieval
Min-heap and max-heap operations for AI systems: top-k most similar embeddings without sorting all results, heapq module, and complete implementations for vector search.
Sliding Window for Token Processing
Fixed-size and variable-size sliding windows for AI problems: chunking text with overlap, context window management, and implementing production-quality text chunkers.
Implement a Basic Tokenizer
Build a word tokenizer from scratch: whitespace splitting, vocabulary building, encoding and decoding, OOV handling, and comparison with HuggingFace tokenizer output.
Chat Models in LangChain
ChatOpenAI, ChatAnthropic, AzureChatOpenAI, model parameters, invoke() vs stream() vs batch(), and HumanMessage/AIMessage/SystemMessage in depth.
What Is LangChain?
Framework overview, core abstractions (LLMs, chains, agents, memory, tools), when to use vs raw API, installation, and your first hello-world chain.
Writing Node Functions
Master the LangGraph node function signature — reading state, returning partial updates, calling LLMs inside nodes, error handling, and complete real-world examples.
Conditional Edges and Routing
Build intelligent routing logic in LangGraph using conditional edges — router functions, mapping return values to nodes, routing to END, and a full medical query routing example.
Graphs, Nodes, and Edges
Understand the directed graph model at the heart of LangGraph — nodes as functions, edges as transitions, and how execution flows from START to END.
Defining State Schema
Learn how to design the state that flows through your LangGraph — TypedDict schemas, what to store, immutability rules, and a real drug-information agent state.
Creating a StateGraph
Walk through every step of building a LangGraph StateGraph — initialization, registering nodes, connecting edges, and compiling to a runnable Pregel graph.
Why LangGraph?
Understand what LangGraph adds over LangChain's linear chains, when to reach for it, and how its graph-based control flow enables true agentic systems.
Setting Up Alerts: Rate Limits, Latency Spikes
Configure Azure Monitor alerts that wake you up before users complain. Learn the right thresholds for LLM latency, error rate, token cost, and rate limit alerts.
Azure Monitor and Application Insights for LLMs
Set up Azure Monitor and Application Insights to track LLM latency, token usage, error rates, and cost for production AI services running on Azure Container Apps.
Blue-Green Deployment for LLM Services
Deploy new LLM service versions with zero downtime using blue-green deployments on Azure Container Apps. Learn traffic splitting, canary releases, and how to validate before cutting over.
Docker Compose for Local AI Development
Build a complete local development environment for an AI service using Docker Compose — FastAPI, Redis, PostgreSQL, and a mock Azure OpenAI server — with hot reload, health checks, and env file management.
Container Registries: ACR and ECR
Learn how to store, tag, scan, and distribute Docker images using Azure Container Registry and Amazon ECR — including a complete push workflow, image scanning, and geo-replication.
Cost Optimization: Caching, Batching, Model Routing
Cut your LLM API costs by 60–80% using semantic caching, request batching, and intelligent model routing. Real techniques used in production AI services.
Dockerising an AI API: Best Practices
Learn why AI APIs have unique Docker considerations — model weights, GPU drivers, large images — and build a production-grade Dockerfile for a FastAPI + Azure OpenAI service from scratch.
GitHub Actions Pipeline: Test → Build → Deploy
Build a complete CI/CD pipeline for an LLM service using GitHub Actions — automated tests, Docker image build and push to ACR, and deployment to Azure Container Apps with environment protection and secrets management.
Health Check Verification in Deployment
Design and implement health checks for LLM services — liveness, readiness, and startup probes. Configure Azure Container Apps to use them, and verify deployments automatically before shifting traffic.
Interview: LLMOps Scenario Questions
The most common LLMOps scenario questions asked in senior AI engineering interviews. Walk through real deployment, monitoring, and incident response scenarios with model answers.
Key LLM Metrics: TTFT, Cost/Request, Error Rate
The exact metrics every production LLM service must track. Learn what TTFT, cost-per-request, token efficiency, and error rate mean, how to measure them, and what good numbers look like.
Multi-Stage Docker Builds for AI Apps
Understand multi-stage Docker builds and how they dramatically reduce AI API image sizes — from 2.1 GB down to 480 MB — while keeping your runtime image clean, secure, and free of compilers.
Rollback Strategy for LLM Deployments
LLM deployments can fail in ways that are invisible until users complain. Learn concrete rollback strategies for bad prompts, model upgrades, and embedding schema changes — with exact Azure CLI commands.
Scale to Zero with Azure Container Apps
Configure Azure Container Apps to automatically scale your LLM service based on HTTP traffic, KEDA rules, and custom metrics — including scaling to zero replicas when idle.
Structured Logging with structlog
Replace print() and unstructured logs with structlog for AI services. Learn how to add context, trace IDs, and machine-readable logs that make debugging LLM pipelines far easier.
Testing LLM Services in CI: Mocks and Fixtures
Solve the core challenge of testing LLM services in CI — non-determinism, cost, and latency — using mock clients, VCR cassettes, fixture-based replay, and contract tests with pytest.
Chain of Thought Prompting
Elicit step-by-step reasoning with 'let's think step by step' — zero-shot CoT, few-shot CoT, why it works, and when to skip it.
Few-Shot Prompting
Provide examples in the prompt — input/output pairs, how many to use, choosing diverse examples, and chain of thought in few-shot.
What Is Prompt Engineering?
Definition, why it matters, prompting vs fine-tuning vs RAG, the anatomy of a prompt, and how temperature and top_p affect outputs.
Zero-Shot Prompting
Asking the model to perform a task with no examples — when it works, when it fails, and before/after comparisons.
Skill 2 — Backend Engineering: Build the FastAPI Core (Async, Pydantic v2, OpenAPI)
Build the PharmaBot FastAPI backend from scratch — async endpoints, Pydantic v2 request/response schemas, Server-Sent Events streaming, and automatic OpenAPI documentation.
Skill 7 — Practical LLM Integration: Streaming (SSE), Caching & Fallbacks
Wire Azure OpenAI into your FastAPI backend with production-grade patterns: Server-Sent Events streaming, prompt caching, retry on failure, and cost-aware model routing.
PharmaBot AI — Course Orientation: Architecture Blueprint & Skills Map
Understand the full system you're going to build, how the 10 skills map to real components, and how to get the most from this course.
Skill 10 — Production Delivery: CI/CD Pipeline, Logging & Azure Monitor
Ship PharmaBot with confidence: GitHub Actions CI/CD that builds, tests, and deploys automatically; structured logging with structlog; and Azure Monitor dashboards.
Skill 3 — Prompt Engineering: Safety Prompts, Disclaimers & Structured Output
Write production-grade prompts for a healthcare AI — safety-first system prompts, medical disclaimers, structured JSON output, and testing your prompts before shipping.
Skill 4 — RAG: Chunk, Embed & Index the Drug Knowledge Base
Build the retrieval-augmented generation pipeline: load 1,200 FDA drug records, chunk them intelligently, embed with Azure OpenAI, and index into Azure AI Search.
Skill 8 — Security & Privacy: Rate Limiting, Injection Detection & GDPR
Build healthcare-grade security: Redis token bucket rate limiting, prompt injection detection, PII-free session design, input sanitization, and GDPR compliance patterns.
Skill 1 — Fast Prototyping: Design the Full PharmaBot System in 30 Minutes
Learn the fast prototyping mindset: design the full system architecture on paper before writing a single line of code. Component breakdown, data flow, and every design decision explained.
Bonus — Team Collaboration: API-First Design, OpenAPI Contracts & PR Workflow
Ship PharmaBot as a team: write the OpenAPI contract before writing code, keep a structured PR workflow, and use contract-driven development to eliminate integration surprises.
Skill 5 — Vector Search: Azure AI Search HNSW + pgvector Hybrid Retrieval
Implement hybrid vector search combining Azure AI Search semantic embeddings with BM25 keyword fallback, plus pgvector as a local development alternative.
Document Chunking Strategies
Master chunking: fixed-size, sentence, paragraph, recursive, and document-aware strategies. Learn how chunk size, overlap, and boundaries drive retrieval quality.
Naive RAG: The Basic Pipeline
Build the foundational RAG pipeline: chunk documents, embed, store, retrieve top-k, and generate. Understand its real limitations before optimizing.
What Is RAG?
Retrieval-Augmented Generation: fetch relevant docs, inject into LLM context, reduce hallucination, keep knowledge current, and cite sources.
Detecting Unsafe Outputs
Build multi-layer output safety detection using classifier-based approaches, rule-based filters, LLM-as-judge, and the OpenAI Moderation and Azure Content Safety APIs — with working Python examples.
Types of Hallucinations
A taxonomy of LLM hallucination types — factual, entity, logical, and instruction hallucinations — with real before/after examples and detection strategies for each.
Types of Jailbreak Attacks
A technical survey of jailbreak attack categories — direct injection, role-play attacks, encoding tricks, many-shot jailbreaking, and prompt leaking — with examples and detection strategies.
Hallucination Mitigation Techniques
A practical engineering guide to reducing LLM hallucinations — prompt engineering, self-consistency, retrieval-augmented generation, NLI-based post-processing, and calibrated confidence scoring.
Prompt Injection Attacks
Deep dive into prompt injection — direct and indirect attacks, tool result injection, why it's fundamentally hard to fix, and practical mitigations including input validation and privilege separation.
How RAG Reduces Hallucinations
Understand how Retrieval-Augmented Generation grounds LLM answers in real documents, enforces citations, handles missing knowledge gracefully, and how to evaluate faithfulness.
Why LLMs Hallucinate
Understand the root causes of LLM hallucinations — from token prediction mechanics to sycophancy and temperature effects — so you can build systems that account for them.
Scenario: Users Are Jailbreaking Your LLM
Users are posting prompt injection attacks and getting unsafe outputs. Build a multi-layer defense: input classifier, system prompt hardening, and output safety filter.
Scenario: Knowledge Base Is Stale
Your RAG system answers based on outdated documents and new policies are not reflected. Build an event-driven ingestion pipeline with document versioning and chunk deletion.
Scenario: P95 Latency Is 12 Seconds
P50 is 3 seconds but P95 is 12 seconds — tail latency is destroying the experience for users on complex queries. Fix cold starts, context bloat, retry storms, and stream early.
Scenario: PII Found in Application Logs
A security audit reveals patient names and drug prescriptions in structured logs. Detect and anonymize PII before logging using Presidio, then redesign your log schema.
Scenario: Your RAG System Is Hallucinating
Users report factually wrong answers despite having a knowledge base. Learn to diagnose root causes, log retrieved context, and apply fixes like reranking and citation enforcement.
Scenario: RAG Pipeline Is Too Slow
End-to-end latency is 8-12 seconds and users are complaining. Break down where time is spent and apply semantic caching, async retrieval, and streaming to slash latency.
Scenario: LLM API Costs Are Too High
Your Azure OpenAI bill hit $8,000 per month and engineering is asked to cut it by 60%. Analyze token usage, apply semantic caching, model routing, and prompt compression.
Scenario: Retrieval Returns Irrelevant Results
Semantic search consistently returns wrong documents. Learn to diagnose embedding problems, dimension mismatches, and apply hybrid BM25+vector search with metadata filters.
Scenario: Scale to 1 Million Daily Users
Design a RAG chatbot for 1 million daily users. Work through back-of-envelope math, architecture decisions, cache layers, auto-scaling, and what to build vs. buy.
The Attention Mechanism Explained
How attention uses queries, keys, and values (Q, K, V); the scaled dot-product attention formula; why it captures long-range dependencies; and a Python from-scratch implementation.
Encoder vs Decoder Architecture
Encoder: bidirectional for classification/embedding (BERT). Decoder: autoregressive for generation (GPT). Encoder-decoder: translation, summarization (T5). Masked vs unmasked attention.
Layer Normalization and Residual Connections
Pre-LN vs Post-LN transformer blocks; residual connections for gradient flow; RMSNorm in modern LLMs like LLaMA; code showing a complete Pre-LN transformer block.
Multi-Head Attention
Why multiple heads let the model learn different relationship types; splitting Q/K/V into h heads; concat and project; head dimension = d_model/h; Python implementation.
Positional Encoding
Why transformers need position info; sinusoidal encoding with sin/cos; learned vs fixed; RoPE (rotary positional encoding); ALiBi; code examples.
Self-Attention vs Cross-Attention
Self-attention: queries, keys, and values all come from the same sequence. Cross-attention: queries come from the decoder, keys and values from the encoder. How each is used in encoder-decoder models, with code examples.
LLM Evaluation Production Playbook: Quality, Safety, Cost, and Latency
Implement robust LLM evaluation in production using golden datasets, automated regression checks, online signals, and release gates.
Multimodal AI Apps with FastAPI: Text, Image, and Audio Workflows
Build multimodal AI applications with FastAPI using text, image, and audio pipelines, including OCR, speech-to-text, retrieval, and production deployment patterns.
NLP Foundation Roadmap: Transformers, Hugging Face, and Research Portfolio
A practical NLP roadmap from tokenization to transformers and BERT, with Hugging Face workflows, paper-reading skills, and beginner research portfolio strategy.
Research Project: Norwegian + Urdu Multilingual AI Assistant
Build a multilingual AI assistant for Norwegian and Urdu using Hugging Face Transformers: sentiment analysis, translation, text classification, and a multilingual chatbot — from baseline to research-quality evaluation.
RAG Systems Complete Guide (2026): From Prototype to Production
Build production-grade Retrieval-Augmented Generation systems: chunking, embeddings, hybrid search, reranking, evaluation, observability, and cost/latency optimization.
MCP vs RAG vs AI Agents: What They Actually Are and When to Use Each
Three terms everyone is using, often interchangeably. They solve completely different problems. Here's the mental model that makes them click — with real architecture diagrams and a production stack example.
Azure OpenAI — GPT-4o, Embeddings & Production Deployment
Complete Azure OpenAI guide — deploying models, chat completions, streaming, function calling, embeddings for RAG, content filtering, token management, .NET and Python SDK examples, and cost control.
Databricks — Delta Lake, PySpark & ML Workflows
Production Databricks guide — Delta Lake architecture, PySpark at scale, structured streaming, Unity Catalog, MLflow integration, Feature Store, and Model Serving. With Python examples throughout.
Hugging Face Transformers — From Model Hub to Production
Complete Hugging Face guide — Model Hub, pipelines, tokenizers, fine-tuning with Trainer API, PEFT/LoRA for efficient fine-tuning, Inference API, and deploying models to production with Inference Endpoints.
MLflow — Experiment Tracking, Model Registry & Deployment
Complete MLflow guide for ML engineers — tracking experiments, comparing runs, registering models, managing lifecycle stages, serving models as REST APIs, and integrating with Azure ML and Databricks.
Power BI — DAX, Data Modeling & Production Reporting
Production Power BI guide — semantic layer design, DAX from basics to time intelligence, DirectQuery vs Import mode, row-level security, Power BI Embedded, REST API integration, and deployment pipelines.
Snowflake — Data Warehousing, Snowpark & Data Sharing
Production Snowflake guide — virtual warehouses, storage architecture, SQL analytics, Snowpark Python, dynamic tables, data sharing, Marketplace, and cost management. With Python and SQL examples throughout.
AI-Powered Call Quality Scoring for Contact Centers
Automatically score call quality using LLMs — build a scoring rubric, send transcripts to Claude or GPT-4, extract structured scores, store in DynamoDB, and surface insights in an analytics dashboard.
Real-Time AI Transcription with DeepGram on AWS
Integrate DeepGram speech-to-text into a serverless AWS pipeline — real-time WebSocket streaming, batch transcription of S3 recordings, speaker diarization, custom vocabulary, and storing transcripts in DynamoDB.
AI Agents & Tool Calling: Build Autonomous AI Systems
Build real AI agents — understand the agentic loop, implement tool/function calling with OpenAI, create multi-step workflows, use Semantic Kernel, and ship reliable agents to production.
Build an AI Chatbot with OpenAI & .NET
Build a production-ready AI chatbot from scratch — streaming responses, conversation history, system prompts, a React frontend, rate limiting, and cost controls. Full .NET + OpenAI SDK implementation.
Advanced
MedScribe-AI: Every Phase of a Healthcare AI System — Architecture, Failures, and Fixes
A complete engineering walkthrough of a real AI-powered clinical documentation system — agent workflows, hallucination detection, state machines, RAG, and the specific failure modes we encountered and designed around.
Interview: AI Agents, Orchestration & Frameworks (LangChain, LangGraph, CrewAI, AutoGen, Semantic Kernel, MCP)
Senior interview Q&A on agent architecture, orchestration systems, framework tradeoffs, MCP servers, and production patterns for multi-agent workflows.
Interview: LLM Providers & Model Selection (OpenAI, Azure, Claude, Gemini)
Senior interview Q&A on OpenAI and Azure OpenAI, Claude and Gemini basics, model selection, cost, latency, compliance, and when to use hosted vs local models.
Interview: GenAI Use Cases — Pharmacy Assistant, Copilots, Smart Search & Workflow Automation
System design interview Q&A for pharmacy AI assistants, internal copilots, enterprise smart search, and AI workflow automation — architecture, RAG, agents, and production guardrails.
Interview: Design and Debug LangChain Agents
5 interview scenarios for LangChain agents: building a research agent, handling failures, multi-agent coordination, comparing agent types, and production clinical agent design.
Interview: Design a Multi-Step LangChain Pipeline
Walk through 5 chain design interview questions with complete LCEL solutions. Sequential chains, routing, parallel execution, error handling, and production patterns.
Interview: Choose the Right Memory for a Use Case
5 interview scenarios requiring memory selection: clinical chatbot, research assistant, customer support, multi-user platform, and production-scale deployment.
Interview: LangChain in Production
5 production interview scenarios: observability strategy, cost explosion at 10x traffic, latency optimization, multi-tenant isolation, and designing a clinical AI platform.
Building Agents with LLMs
How to build reliable LLM agents: the ReAct pattern, tool loops, memory systems, multi-agent orchestration, and production agent architecture patterns.
GPT Architecture: Inside the Decoder-Only Transformer
Deep dive into GPT's decoder-only architecture: token embeddings, causal attention, FFN layers, residual stream, and how autoregressive generation works end-to-end.
LLM Benchmarks: What They Measure and What They Don't
Deep dive into LLM benchmarks: MMLU, HumanEval, GSM8K, HellaSwag, MATH, and more. How to interpret benchmark scores, their limitations, and how to build your own evaluations.
Extending LLM Context Windows
How to extend LLMs beyond their trained context length. RoPE scaling, YaRN, LongLoRA, sliding window attention, and the engineering tradeoffs of long contexts.
LLM Cost Breakdown and Optimization
How LLM costs are structured, how to estimate them, and practical strategies for reducing API costs without sacrificing quality in production systems.
DPO: Direct Preference Optimization
How DPO achieves alignment without reinforcement learning. Covers the mathematical derivation from RLHF, the DPO loss, dataset construction, and when DPO outperforms PPO.
Emergent Capabilities in Large Language Models
Understanding emergence in LLMs: which capabilities appear suddenly at scale, why emergence happens, and how to think about unpredictable capability jumps in production AI systems.
Function Calling Internals: How Tool Use Works
How LLM function calling works under the hood: JSON schema injection, token patterns, multi-tool orchestration, error recovery, and building reliable tool-using agents.
LLM Hallucination: Causes and Mitigations
Why LLMs hallucinate, the mechanisms behind confabulation, and systematic approaches to reduce hallucination in production systems.
LLM Inference and Serving
How to serve LLMs at scale: KV cache management, continuous batching, vLLM, PagedAttention, speculative decoding, and production deployment patterns.
Interview: LLMs Deep Dive (Part 1)
10 senior-level interview questions on LLM internals: pretraining, architecture, RLHF, quantization, and production serving.
Interview: LLMs Deep Dive (Part 2)
10 more senior-level interview questions: emergent capabilities, fine-tuning decisions, context extension, alignment tradeoffs, and production LLM system design.
Multimodal LLMs: Vision, Audio, and Beyond
How multimodal LLMs process images, audio, and video alongside text. Vision encoders, cross-modal attention, what is publicly known about GPT-4V, and building multimodal applications.
Open Source LLMs: LLaMA, Mistral, and the Ecosystem
The open source LLM landscape: LLaMA-3, Mistral, Phi, Falcon, and Gemma. How to choose, download, run, and fine-tune open source models for production use.
LLM Quantization: Deep Dive
How quantization reduces LLM size and speeds inference. GPTQ, AWQ, GGUF, bitsandbytes NF4, and the math behind weight quantization without accuracy collapse.
Integrating RAG with LLMs
How retrieval-augmented generation works end-to-end: embedding documents, querying vector stores, assembling context, and building production-grade RAG pipelines.
RLHF: Reinforcement Learning from Human Feedback
How RLHF aligns LLMs with human preferences. Covers reward model training, PPO training loop, reference model KL penalty, and why RLHF is complex but powerful.
LLM Safety and Alignment
How LLMs are aligned to be safe, helpful, and honest. Constitutional AI, red-teaming, RLHF safety, jailbreak mechanics, and building safe AI systems.
Supervised Fine-Tuning (SFT) for LLMs
Turn a pretrained base model into an instruction-following assistant using SFT. Covers data formats, loss masking, LoRA adapters, SFTTrainer, and quality signals.
LLM Training Infrastructure
How large language models are trained at scale: distributed training strategies, GPU communication, mixed precision, gradient checkpointing, and fault tolerance.
LLM Training Objectives: From Next-Token to Alignment
The full training objective stack for large language models: next-token prediction loss, cross-entropy mechanics, data weighting, and how pretraining creates the base for alignment.
Bayesian Hyperparameter Optimization
Bayesian optimization for hyperparameter tuning: surrogate models, acquisition functions, how it differs from grid and random search, and practical usage with Optuna and scikit-optimize.
Interview: Bias-Variance Real Scenario
Interview walk-through: diagnose bias-variance problems in a clinical readmission model — with a step-by-step approach covering diagnosis, root cause, targeted fixes, and tradeoff discussion.
Interview: Confusion Matrix Deep Dive
Interview walk-through: analyze a confusion matrix for a drug safety classifier, interpret error patterns, select the right threshold, and explain the clinical implications of each error type.
Interview: ML Debugging Scenario
Interview walk-through: diagnose a production model whose AUC suddenly dropped from 0.87 to 0.61 after performing well, covering systematic debugging, root cause identification, and remediation.
Debugging ML Models in Production
Production ML debugging: monitoring prediction distributions, detecting silent failures, tracking performance over time, handling model degradation, and setting up alerts for data and concept drift.
Interview: Choosing the Right Evaluation Metric
Interview walk-through: how to choose the right evaluation metric for 5 clinical and AI scenarios — covering class imbalance, cost asymmetry, threshold selection, and metric pitfalls.
Interview: Feature Engineering Scenario
Interview walk-through: engineer features from raw EHR data for a 30-day readmission model — covering extraction, transformation, interactions, handling missing values, and validating feature quality.
Interview: Hyperparameter Tuning Scenario
Interview walk-through: choose and execute a hyperparameter tuning strategy for a gradient boosting model on a clinical dataset — covering budget, search method, validation procedure, and overfitting the search.
Interview: Overfitting Walk-Through Scenario
Interview walk-through: diagnose and fix overfitting in a clinical drug classifier — with a systematic approach covering detection, root cause analysis, and five targeted fixes.
Interview: Regression vs Classification Scenarios
Interview walk-through: identify whether a problem is regression or classification from the task description — with 6 real scenarios covering clinical AI, LLM systems, and healthcare applications.
Interview: Regularization Scenario
Interview walk-through: diagnose and fix overfitting using regularization in a clinical model — covering L1 vs L2 choice, strength tuning, Elastic Net, and explaining results to a non-technical audience.
Interview: ROC-AUC and Threshold Deep Dive
Interview walk-through: explain ROC-AUC to a clinical stakeholder, choose between ROC and PR curves, tune a threshold for a sepsis model, and diagnose a model with excellent AUC but poor real-world recall.
Interview: When to Use Supervised vs Unsupervised?
Interview walk-through: choose the right learning paradigm for real scenarios — drug classification, patient clustering, anomaly detection, and LLM alignment — with clear decision logic and common traps.
Time and Space Complexity for AI Engineers
Big-O complexity for AI engineering interviews: understand O(1) through O(n²), analyze Python data structures, and apply complexity reasoning to embedding search, RAG pipelines, and LLM cost estimation.
Dictionary and Hashing Interview Problems
Common dictionary and hashing interview problems for AI engineers: frequency maps, grouping anagrams, LRU cache, top-K frequent elements, and two-sum variants.
List and Array Interview Problems
Common list and array interview problems for AI engineers: two-sum, sliding window max, merge sorted arrays, find duplicates, rotate array, and flatten nested lists.
Recursion Interview Problems
Recursion problems common in AI engineering interviews: tree traversal, memoized Fibonacci, power sets, merge sort, JSON traversal, and recursive RAG tree summarization.
String Manipulation Problems
Common string interview problems for AI engineers: reverse words, check palindromes, find anagrams, parse structured text, clean LLM output, and extract entities.
Interview: NumPy Problem Walk-Through
5 NumPy interview problems with full solutions: vectorized cosine similarity, z-score normalization, top-k retrieval, confusion matrix, and an embedding similarity pipeline.
Advanced RAG Patterns
Beyond basic RAG: RAPTOR hierarchical indexing, Self-RAG with retrieval decisions, iterative retrieval, adaptive context assembly, and reasoning over retrieved content.
RAG Evaluation: Metrics and Frameworks
Measure RAG system quality with RAGAS, TruLens, and custom metrics. Evaluate retrieval precision, answer faithfulness, context relevance, and end-to-end correctness.
Graph RAG: Knowledge Graph-Enhanced Retrieval
Enhance RAG with knowledge graphs. GraphRAG by Microsoft, entity extraction, relationship indexing, and combining vector search with graph traversal.
RAG Interview Questions Part 1
10 deep-dive RAG interview questions with complete answers: vector search fundamentals, chunking strategies, hybrid search, embedding models, and retrieval evaluation.
RAG Interview Questions Part 2
10 advanced RAG interview questions with complete answers: production architecture, Graph RAG, multimodal RAG, security, cost optimization, and system design.
Multimodal RAG: Images and Documents
Extend RAG to handle images, charts, and mixed-media documents. Caption-based indexing, CLIP embeddings for image search, and multimodal context assembly.
RAG in Production: Architecture and Operations
Deploy RAG systems at scale: async pipelines, observability, error handling, A/B testing, deployment patterns, and operational best practices for clinical AI.
RAG Security: Prompt Injection and Data Protection
Secure RAG systems against prompt injection, data exfiltration, PII leakage, and adversarial document attacks. Defense-in-depth for clinical AI.
History of Language Models
A comprehensive journey from n-gram models to GPT-4, Claude, and Gemini — tracing the key architectural breakthroughs that define modern LLMs.
Pre-training Data: What LLMs Learn From
A deep dive into Common Crawl, Books, GitHub, and Wikipedia — data mixing ratios, deduplication, quality filtering, and the data poisoning threat.
Tokenization Deep Dive
BPE, WordPiece, SentencePiece — how tokenizers work, why vocabulary size matters, and the surprising impact of tokenization on model quality across languages.
Skill 6 — AI Agents: Build the Triage, Drug Info & Interaction Agents
Build a three-agent LangChain pipeline: a Triage Agent that classifies queries and routes them to specialist Drug Info or Interaction Checker agents.
Skill 9 — Azure Cloud: Container Apps, Azure OpenAI & AI Search in Production
Deploy PharmaBot to Azure Container Apps, configure Azure OpenAI and AI Search for production, manage secrets with Azure Key Vault, and set up autoscaling.
Capstone: Ship PharmaBot AI to Azure Production
The final milestone: wire all 10 components together, run the full test suite, deploy PharmaBot to Azure Container Apps, verify the health check, and reflect on what you built.
AI Agents and Tool Calling Workflows: Production Patterns
Design reliable AI agents with tool calling, planning loops, memory boundaries, retries, and human-in-the-loop safeguards.
Building a Production RAG Pipeline: From Documents to Answers
A complete guide to building a Retrieval-Augmented Generation pipeline that actually works in production — document ingestion, vector storage, retrieval, and LLM integration.