Chat Models in LangChain
ChatOpenAI, ChatAnthropic, AzureChatOpenAI, model parameters, invoke() vs stream() vs batch(), and HumanMessage/AIMessage/SystemMessage in depth.
Chat models are the core engine of every LangChain application. A chat model accepts a list of messages as input and returns an AIMessage as output. LangChain wraps every major provider behind the same BaseChatModel interface, so you can swap between OpenAI, Anthropic, Azure, and others without changing your chain logic.
The Message Types
Before looking at models, understand the message types. A chat model works with a conversation, not a single string. LangChain represents that conversation as a list of typed messages.
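As a rough mental model, each typed message corresponds to a role-tagged entry in the request sent to the provider. The sketch below uses a plain dataclass as a stand-in (the `Msg` class and `to_payload` helper are hypothetical, not LangChain's classes) to show how a conversation list could map onto the `system`/`user`/`assistant` roles used by OpenAI-style chat APIs.

```python
from dataclasses import dataclass

# Hypothetical stand-in for LangChain's message classes; for illustration only.
@dataclass(frozen=True)
class Msg:
    kind: str      # "system", "human", or "ai"
    content: str

# OpenAI-style chat APIs use these role names in the request body.
ROLE_BY_KIND = {"system": "system", "human": "user", "ai": "assistant"}

def to_payload(conversation: list[Msg]) -> list[dict[str, str]]:
    """Convert typed messages into role/content dicts, preserving order."""
    return [{"role": ROLE_BY_KIND[m.kind], "content": m.content} for m in conversation]

conversation = [
    Msg("system", "You are a senior Python engineer. Be concise and precise."),
    Msg("human", "What is the GIL in Python?"),
]
payload = to_payload(conversation)
print(payload)
```

The order of the list matters: the model reads the conversation top to bottom, so the system entry normally comes first.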
from langchain_core.messages import (
    SystemMessage,
    HumanMessage,
    AIMessage,
    ToolMessage,
    FunctionMessage,
)
# SystemMessage — sets the assistant's persona or instructions
system = SystemMessage(content="You are a senior Python engineer. Be concise and precise.")
# HumanMessage — the user's input
human = HumanMessage(content="What is the GIL in Python?")
# AIMessage — the model's response (you receive this, or inject it for few-shot)
ai = AIMessage(content="The GIL is a mutex that protects CPython's memory manager...")
# A conversation is simply a list of messages
conversation = [system, human]
ChatOpenAI
ChatOpenAI wraps OpenAI's chat models and is one of the most commonly used chat models in LangChain applications.
from langchain_openai import ChatOpenAI
# Basic initialization
llm = ChatOpenAI(
    model="gpt-4o",      # or gpt-4o-mini, gpt-4-turbo
    temperature=0,       # 0 = most deterministic; higher values = more varied output
    max_tokens=1024,     # cap on the number of tokens in the response
    timeout=30,          # seconds before the request raises an error
    max_retries=3,       # automatic retries on transient failures
)
# Invoke with a list of messages
from langchain_core.messages import HumanMessage, SystemMessage
response = llm.invoke([
    SystemMessage(content="You are a Python expert."),
    HumanMessage(content="Explain list comprehensions in two sentences."),
])
print(type(response))           # AIMessage
print(response.content)         # The text response
print(response.usage_metadata)  # {'input_tokens': 28, 'output_tokens': 42, ...}
Model Selection Guide
| Model | Best For | Cost |
|---|---|---|
| gpt-4o-mini | Fast, cheap tasks; classification; extraction | Lowest |
| gpt-4o | Balanced quality and speed | Medium |
| gpt-4-turbo | Complex reasoning, large context | Higher |
| o1-mini | Math, coding, logical reasoning | Medium |
| o1 | Hardest reasoning tasks | Highest |
ChatAnthropic
from langchain_anthropic import ChatAnthropic
from langchain_core.messages import HumanMessage, SystemMessage
llm = ChatAnthropic(
    model="claude-sonnet-4-6",  # or claude-opus-4, claude-haiku-3-5; confirm exact IDs in Anthropic's model list
    temperature=0,
    max_tokens=2048,
    # ANTHROPIC_API_KEY is read from the environment automatically
)
response = llm.invoke([
    SystemMessage(content="You are a concise technical writer."),
    HumanMessage(content="What is a context window?"),
])
print(response.content)
Anthropic Model Selection
| Model | Best For |
|---|---|
| claude-haiku-3-5 | Speed-critical tasks, simple extraction |
| claude-sonnet-4-6 | Balanced quality and performance |
| claude-opus-4 | Highest quality reasoning and writing |
AzureChatOpenAI
Enterprise teams often deploy OpenAI models through Azure OpenAI Service for data residency and compliance.
import os
from langchain_core.messages import HumanMessage
from langchain_openai import AzureChatOpenAI
llm = AzureChatOpenAI(
    azure_deployment="gpt-4o",  # your deployment name in Azure
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],  # e.g. https://myinstance.openai.azure.com/
    api_version="2024-08-01-preview",
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    temperature=0,
    max_tokens=1024,
)
response = llm.invoke([HumanMessage(content="Summarize the benefits of Azure OpenAI.")])
print(response.content)
The rest of your chain code is identical whether you use ChatOpenAI or AzureChatOpenAI. That is the practical benefit of the shared interface.
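One way to picture the shared interface is as a structural type: any object with a compatible `invoke()` method can be handed to the same calling code. The sketch below uses `typing.Protocol` and two fake model classes (`SupportsInvoke`, `FakeOpenAIModel`, and `FakeAzureModel` are hypothetical names, not part of LangChain) to illustrate the idea without any network calls.

```python
from typing import Protocol

class SupportsInvoke(Protocol):
    """Anything with an invoke() method that takes messages and returns text."""
    def invoke(self, messages: list[str]) -> str: ...

# Hypothetical stand-ins for two providers; they only echo their input.
class FakeOpenAIModel:
    def invoke(self, messages: list[str]) -> str:
        return f"[openai-like] {messages[-1]}"

class FakeAzureModel:
    def invoke(self, messages: list[str]) -> str:
        return f"[azure-like] {messages[-1]}"

def summarize(llm: SupportsInvoke, text: str) -> str:
    """Calling code depends only on invoke(), not on a concrete provider class."""
    return llm.invoke([f"Summarize: {text}"])

for model in (FakeOpenAIModel(), FakeAzureModel()):
    print(summarize(model, "LangChain chat models"))
```

Swapping providers then means changing the object you construct, not the function that uses it.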
invoke() — Synchronous Single Call
invoke() is the simplest way to call a model. It blocks until the model returns the full response.
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
# Pass a list of messages
response = llm.invoke([HumanMessage(content="What is 12 * 34?")])
print(response.content)  # e.g. "408" (the exact wording can vary)
# Or pass a string (LangChain wraps it in a HumanMessage automatically)
response = llm.invoke("What is 12 * 34?")
print(response.content)  # e.g. "408"
stream() — Token-by-Token Streaming
Stream tokens as they arrive to give users a responsive experience.
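The consuming pattern is the same whatever produces the chunks: show each piece as soon as it arrives, and keep the pieces if you need the full text afterwards. Here is a minimal sketch using a fake generator in place of a model (`fake_stream` and `consume` are hypothetical helpers, and the chunks are plain strings rather than LangChain chunk objects).

```python
from typing import Iterator

def fake_stream(text: str, chunk_size: int = 4) -> Iterator[str]:
    """Hypothetical stand-in for a streaming model: yields text in small pieces."""
    for i in range(0, len(text), chunk_size):
        yield text[i:i + chunk_size]

def consume(stream: Iterator[str]) -> str:
    """Print each chunk immediately and return the reassembled text."""
    parts = []
    for chunk in stream:
        print(chunk, end="", flush=True)  # display the piece right away
        parts.append(chunk)               # keep it for later use
    print()  # newline once the stream is exhausted
    return "".join(parts)

full_text = consume(fake_stream("Indentation guides the eye"))
```

With real LangChain chunks, `AIMessageChunk` objects can be combined with `+` to build up the complete message in a similar way.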
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, SystemMessage
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.7)
messages = [
    SystemMessage(content="You are a creative writer."),
    HumanMessage(content="Write a haiku about Python programming."),
]
# stream() returns an iterator of AIMessageChunk objects
for chunk in llm.stream(messages):
    print(chunk.content, end="", flush=True)
print()  # newline after streaming completes
# Each chunk has partial content and metadata
for chunk in llm.stream(messages):
    if chunk.content:
        print(f"Chunk: '{chunk.content}'")
    if chunk.usage_metadata:
        print(f"Usage: {chunk.usage_metadata}")
batch() — Parallel Multiple Calls
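batch() sends multiple requests concurrently and returns the responses in the same order as the inputs. By default LangChain runs the calls in a thread pool, so for network-bound requests it is usually much faster than calling invoke() in a loop.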
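To see where the speedup comes from, here is a stdlib-only sketch of the thread-pool idea. It is not LangChain's implementation: `fake_invoke` is a hypothetical stand-in that sleeps to simulate network latency, and `batch` here is a simplified helper with an optional concurrency cap.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Optional

def fake_invoke(question: str) -> str:
    """Hypothetical stand-in for a model call; the sleep simulates network latency."""
    time.sleep(0.1)
    return f"answer to: {question}"

def batch(questions: list[str], max_concurrency: Optional[int] = None) -> list[str]:
    """Run fake_invoke concurrently; pool.map returns results in input order."""
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        return list(pool.map(fake_invoke, questions))

questions = ["What is Python?", "What is Rust?", "What is Go?", "What is TypeScript?"]

start = time.perf_counter()
sequential = [fake_invoke(q) for q in questions]
sequential_s = time.perf_counter() - start

start = time.perf_counter()
concurrent = batch(questions)
concurrent_s = time.perf_counter() - start

print(f"sequential: {sequential_s:.2f}s, concurrent: {concurrent_s:.2f}s")
```

Because the calls spend most of their time waiting, the concurrent run takes roughly as long as one call rather than the sum of all of them. Lowering `max_concurrency` trades some of that speed for a gentler request rate, which can help when a provider enforces rate limits.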
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
questions = [
    [HumanMessage(content="What is Python?")],
    [HumanMessage(content="What is Rust?")],
    [HumanMessage(content="What is Go?")],
    [HumanMessage(content="What is TypeScript?")],
]
# All four run concurrently
responses = llm.batch(questions)
for q, r in zip(questions, responses):
    print(f"Q: {q[0].content}")
    print(f"A: {r.content[:100]}...\n")
# Control concurrency with max_concurrency
responses = llm.batch(questions, config={"max_concurrency": 2})
Async Methods
All three methods have async counterparts for use in async frameworks like FastAPI.
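In an async application, concurrency comes from the event loop rather than threads. The following sketch shows the general pattern of fanning out several awaitable calls while capping how many run at once. `fake_ainvoke` and `bounded_gather` are hypothetical helpers written for this illustration; they do not call LangChain.

```python
import asyncio

async def fake_ainvoke(prompt: str) -> str:
    """Hypothetical stand-in for an async model call."""
    await asyncio.sleep(0.05)  # simulate waiting on the network
    return prompt.upper()

async def bounded_gather(prompts: list[str], limit: int) -> list[str]:
    """Run all prompts concurrently, with at most `limit` in flight at a time."""
    semaphore = asyncio.Semaphore(limit)

    async def run_one(prompt: str) -> str:
        async with semaphore:
            return await fake_ainvoke(prompt)

    # gather() returns results in the same order as its arguments
    return await asyncio.gather(*(run_one(p) for p in prompts))

results = asyncio.run(bounded_gather(["hello", "hi", "hey"], limit=2))
print(results)
```

Inside a framework such as FastAPI you would `await` the coroutine from your handler instead of calling `asyncio.run()`, since the framework already runs the event loop.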
import asyncio
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage
llm = ChatOpenAI(model="gpt-4o-mini")
async def main():
    # Async invoke
    response = await llm.ainvoke([HumanMessage(content="What is asyncio?")])
    print(response.content)
    # Async stream
    async for chunk in llm.astream([HumanMessage(content="Write a poem about async/await")]):
        print(chunk.content, end="", flush=True)
    print()
    # Async batch
    responses = await llm.abatch([
        [HumanMessage(content="Hello")],
        [HumanMessage(content="Hi")],
    ])
    for r in responses:
        print(r.content)
asyncio.run(main())
Model Parameters in Depth
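It can help to see what the two main sampling parameters do numerically. The sketch below applies temperature scaling and nucleus (top-p) filtering to a toy set of logits. It follows the standard textbook definitions and is not any provider's actual sampler; the function names and the logit values are made up for illustration.

```python
import math

def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Divide logits by the temperature, then normalize (requires temperature > 0)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)                            # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs: list[float], top_p: float) -> list[float]:
    """Keep the most probable tokens until their mass reaches top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / mass
    return filtered

logits = [2.0, 1.0, 0.1]                          # toy scores for three candidate tokens
cold = softmax_with_temperature(logits, 0.2)      # low temperature: sharply peaked
warm = softmax_with_temperature(logits, 1.0)      # temperature 1: unscaled softmax
nucleus = top_p_filter(warm, 0.9)                 # drops the least likely token here

print([round(p, 3) for p in cold])
print([round(p, 3) for p in warm])
print([round(p, 3) for p in nucleus])
```

Lower temperatures concentrate probability on the top token, while top-p removes the unlikely tail entirely. A temperature of exactly 0 cannot be plugged into this formula; it is conventionally treated as always choosing the most probable token.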
from langchain_openai import ChatOpenAI
# temperature: controls randomness
#   0 = effectively greedy: the most probable token is almost always chosen
#   higher values flatten the distribution and give more varied output
deterministic = ChatOpenAI(model="gpt-4o-mini", temperature=0)
creative = ChatOpenAI(model="gpt-4o-mini", temperature=0.9)
# max_tokens: hard cap on the number of tokens in the response
short = ChatOpenAI(model="gpt-4o-mini", max_tokens=50)
# top_p: nucleus sampling (alternative to temperature)
# OpenAI recommends adjusting temperature or top_p, not both
nucleus = ChatOpenAI(model="gpt-4o-mini", top_p=0.95)
# presence_penalty: penalize tokens that have already appeared at least once
# Range: -2.0 to 2.0. Positive values encourage new topics.
diverse = ChatOpenAI(model="gpt-4o-mini", presence_penalty=0.5)
# frequency_penalty: penalize tokens in proportion to how often they have appeared
# Range: -2.0 to 2.0. Positive values reduce repetition.
nonrepetitive = ChatOpenAI(model="gpt-4o-mini", frequency_penalty=0.5)
# stop: stop generation when any of these sequences appears
stopped = ChatOpenAI(model="gpt-4o-mini", stop=["###", "\n\n\n"])
# logprobs: return log probabilities of output tokens
with_probs = ChatOpenAI(model="gpt-4o-mini", logprobs=True)
Binding Parameters Per Call
You can also pass parameters at invocation time. They override the constructor defaults for that call only; the model object itself is unchanged.
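Conceptually, a per-call override is a dictionary merge in which call-time values win over the defaults. The sketch below shows only that precedence rule; `resolve_params` is a hypothetical helper, not a LangChain function.

```python
def resolve_params(defaults: dict, **overrides) -> dict:
    """Return the effective parameters: call-time overrides win over defaults."""
    return {**defaults, **overrides}

defaults = {"model": "gpt-4o-mini", "temperature": 0}

effective = resolve_params(defaults, temperature=0.9, max_tokens=200)
print(effective)
print(defaults)  # the defaults are left untouched for later calls
```

If the same override is needed on every call, LangChain runnables also provide a `bind()` method that attaches keyword arguments once and returns a new runnable carrying them.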
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
# Override temperature just for this call
response = llm.invoke(
    [HumanMessage(content="Write a creative story opening.")],
    temperature=0.9,  # overrides the 0 set at init
    max_tokens=200,
)
print(response.content)
Few-Shot Prompting with Message History
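Inject example exchanges into the message list to teach the model a specific output format. Each example is a HumanMessage followed by the AIMessage you want the model to imitate.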
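When the examples live in data rather than in code, a small helper can assemble the message list. The sketch below builds `(role, content)` tuples, a message shorthand that LangChain chat models generally accept, and checks that every example reply is valid JSON. `build_few_shot` is a hypothetical helper written for this illustration.

```python
import json

def build_few_shot(system: str, examples: list[tuple[str, str]], query: str) -> list[tuple[str, str]]:
    """Assemble system prompt, example exchanges, and the real query, in that order."""
    messages = [("system", system)]
    for user_text, ideal_reply in examples:
        json.loads(ideal_reply)  # fail early if an example reply is not valid JSON
        messages.append(("human", user_text))
        messages.append(("ai", ideal_reply))
    messages.append(("human", query))
    return messages

examples = [
    ("I'm using Python 3.12", '{"language": "Python", "version": "3.12"}'),
    ("Running Node.js 20 LTS", '{"language": "Node.js", "version": "20"}'),
]
messages = build_few_shot(
    "Extract the programming language and version from user input. Reply with JSON only.",
    examples,
    "My stack is Ruby 3.3 with Rails 7",
)
for role, content in messages:
    print(f"{role}: {content}")
```

Validating the examples up front avoids teaching the model a malformed format by accident.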
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, AIMessage, SystemMessage
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
# Few-shot: show the model the exact format you want
messages = [
    SystemMessage(content="Extract the programming language and version from user input. Reply with JSON only."),
    # Example 1
    HumanMessage(content="I'm using Python 3.12"),
    AIMessage(content='{"language": "Python", "version": "3.12"}'),
    # Example 2
    HumanMessage(content="Running Node.js 20 LTS"),
    AIMessage(content='{"language": "Node.js", "version": "20"}'),
    # Actual query
    HumanMessage(content="My stack is Ruby 3.3 with Rails 7"),
]
response = llm.invoke(messages)
print(response.content)
# {"language": "Ruby", "version": "3.3"}
Accessing Response Metadata
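One common use of the token counts is tracking usage across several calls. The sketch below sums dictionaries shaped like the `usage_metadata` shown in this section and turns the totals into a cost estimate. The sample counts and the per-million-token prices are made-up placeholders, and both helper functions are hypothetical.

```python
from collections import Counter

def total_usage(usages: list[dict]) -> dict:
    """Sum the integer token counts across several usage dictionaries."""
    totals = Counter()
    for usage in usages:
        # Skip any nested detail fields; only add plain integer counts.
        totals.update({k: v for k, v in usage.items() if isinstance(v, int)})
    return dict(totals)

def estimate_cost(totals: dict, input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimate cost from token totals and prices per million tokens."""
    return (
        totals.get("input_tokens", 0) / 1_000_000 * input_price_per_m
        + totals.get("output_tokens", 0) / 1_000_000 * output_price_per_m
    )

# Sample values for illustration, not real measurements.
usages = [
    {"input_tokens": 20, "output_tokens": 40, "total_tokens": 60},
    {"input_tokens": 10, "output_tokens": 5, "total_tokens": 15},
]
totals = total_usage(usages)
print(totals)
print(estimate_cost(totals, input_price_per_m=1.0, output_price_per_m=2.0))  # placeholder prices
```

Actual prices differ by model and change over time, so they belong in configuration rather than in code.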
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
response = llm.invoke([HumanMessage(content="What is 2 + 2?")])
# Token usage
print(response.usage_metadata)
# {'input_tokens': 14, 'output_tokens': 1, 'total_tokens': 15}
# Response metadata from the provider
print(response.response_metadata)
# {'model_name': 'gpt-4o-mini', 'finish_reason': 'stop', ...}
# The raw ID of this generation
print(response.id) # chatcmpl-abc123...
# Content as plain string
print(response.content)  # "4"
Comparing Providers Side by Side
from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic
from langchain_core.messages import HumanMessage
question = [HumanMessage(content="Explain tail call optimization in three sentences.")]
openai_response = ChatOpenAI(model="gpt-4o-mini", temperature=0).invoke(question)
anthropic_response = ChatAnthropic(model="claude-haiku-3-5", temperature=0).invoke(question)
print("=== OpenAI ===")
print(openai_response.content)
print("\n=== Anthropic ===")
print(anthropic_response.content)
Creating a Model Factory
A small factory function lets you choose the provider and model from external configuration instead of hard-coding them.
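As a variation on the same idea, the provider lookup can be driven by a registry and a JSON config string. This sketch uses a fake model class so it runs without any API keys; `FakeModel`, `REGISTRY`, and `model_from_config` are hypothetical names, and in real code the registry values would be the actual chat model classes.

```python
import json
from dataclasses import dataclass
from functools import partial

@dataclass
class FakeModel:
    """Hypothetical stand-in for a chat model; it only records its settings."""
    provider: str
    model: str
    temperature: float = 0.0

# Map each provider name to a constructor for that provider.
REGISTRY = {
    "openai": partial(FakeModel, "openai"),
    "anthropic": partial(FakeModel, "anthropic"),
}

def model_from_config(raw_json: str) -> FakeModel:
    """Build a model from a JSON string such as the contents of a config file."""
    config = dict(json.loads(raw_json))
    provider = config.pop("provider")
    if provider not in REGISTRY:
        raise ValueError(f"Unknown provider: {provider}")
    return REGISTRY[provider](**config)

llm = model_from_config('{"provider": "anthropic", "model": "example-model", "temperature": 0.3}')
print(llm)
```

Adding a provider then means adding one registry entry, and an unknown provider name still fails loudly instead of falling back silently.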
from typing import Literal
from langchain_anthropic import ChatAnthropic
from langchain_core.language_models import BaseChatModel
from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI
def get_chat_model(
    provider: Literal["openai", "anthropic"],
    model: str,
    temperature: float = 0,
) -> BaseChatModel:
    if provider == "openai":
        return ChatOpenAI(model=model, temperature=temperature)
    elif provider == "anthropic":
        return ChatAnthropic(model=model, temperature=temperature)
    else:
        raise ValueError(f"Unknown provider: {provider}")
# Use the factory
llm = get_chat_model("anthropic", "claude-sonnet-4-6", temperature=0.3)
response = llm.invoke([HumanMessage(content="Hello!")])
print(response.content)
Summary
Chat models in LangChain follow a uniform interface regardless of provider:
- invoke(messages): synchronous single response
- stream(messages): token-by-token streaming
- batch(list_of_messages): concurrent parallel requests
- Async variants: ainvoke, astream, abatch
The message types — SystemMessage, HumanMessage, AIMessage — represent the conversation structure. Use SystemMessage for instructions, HumanMessage for user input, and AIMessage for model responses (especially in few-shot examples).
In the next lesson you will learn how to parameterize your prompts with PromptTemplate and ChatPromptTemplate, separating your prompt logic from your chain logic.