Chat Models in LangChain

Chat models are the core of most LangChain applications. A chat model accepts a list of messages as input and returns an AIMessage as output. LangChain wraps each major provider behind the same BaseChatModel interface, so you can switch between OpenAI, Anthropic, Azure OpenAI, and others without changing your chain logic.

The Message Types

Before looking at models, understand the message types. A chat model works with a conversation, not a single string. LangChain represents that conversation as a list of typed messages.

Python

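from langchain_core.messages import (
    SystemMessage,
    HumanMessage,
    AIMessage,
    ToolMessage,      # result of a tool call, sent back to the model
    FunctionMessage,  # legacy function-calling message; ToolMessage is preferred
)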

# SystemMessage — sets the assistant's persona or instructions
system = SystemMessage(content="You are a senior Python engineer. Be concise and precise.")

# HumanMessage — the user's input
human = HumanMessage(content="What is the GIL in Python?")

# AIMessage — the model's response (you receive this, or inject it for few-shot)
ai = AIMessage(content="The GIL is a mutex that protects CPython's memory manager...")

# A conversation is simply a list of messages
conversation = [system, human]
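
Because a conversation is just a list, a multi-turn exchange grows by appending each reply and the next user input. The sketch below illustrates that idea with plain (role, content) tuples standing in for the LangChain message classes so that it runs without any dependencies; the `add_turn` helper and the reply texts are hypothetical.

```python
# Plain (role, content) tuples stand in for SystemMessage / HumanMessage /
# AIMessage so this sketch runs without LangChain installed.
Message = tuple[str, str]

def add_turn(history: list[Message], user_text: str, reply_text: str) -> list[Message]:
    """Return a new history with one user turn and the model's reply appended."""
    return history + [("human", user_text), ("ai", reply_text)]

history: list[Message] = [("system", "You are a senior Python engineer.")]
history = add_turn(history, "What is the GIL in Python?", "A mutex in CPython...")
history = add_turn(history, "Does it affect asyncio?", "Not directly...")

roles = [role for role, _ in history]
print(roles)  # ['system', 'human', 'ai', 'human', 'ai']
```

With real message objects the pattern is the same: append the AIMessage you received, then the next HumanMessage, and pass the whole list to the model again.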

ChatOpenAI

The most commonly used model in LangChain applications.

Python

from langchain_openai import ChatOpenAI

# Basic initialization
llm = ChatOpenAI(
    model="gpt-4o",           # or gpt-4o-mini, gpt-4-turbo
    temperature=0,            # 0 = most deterministic; higher = more varied
    max_tokens=1024,          # limit response length
    timeout=30,               # seconds before raising an error
    max_retries=3,            # automatic retry on transient failures
)

# Invoke with a list of messages
from langchain_core.messages import HumanMessage, SystemMessage

response = llm.invoke([
    SystemMessage(content="You are a Python expert."),
    HumanMessage(content="Explain list comprehensions in two sentences."),
])

print(type(response).__name__)  # AIMessage
print(response.content)   # The text response
print(response.usage_metadata)  # {'input_tokens': 28, 'output_tokens': 42, ...}

Model Selection Guide

| Model | Best For | Cost |
|---|---|---|
| gpt-4o-mini | Fast, cheap tasks; classification; extraction | Lowest |
| gpt-4o | Balanced quality and speed | Medium |
| gpt-4-turbo | Complex reasoning, large context | Higher |
| o1-mini | Math, coding, logical reasoning | Medium |
| o1 | Hardest reasoning tasks | Highest |

ChatAnthropic

Python

from langchain_anthropic import ChatAnthropic
from langchain_core.messages import HumanMessage, SystemMessage

llm = ChatAnthropic(
    model="claude-sonnet-4-6",   # or claude-opus-4, claude-haiku-3-5
    temperature=0,
    max_tokens=2048,
    # ANTHROPIC_API_KEY is read from environment automatically
)

response = llm.invoke([
    SystemMessage(content="You are a concise technical writer."),
    HumanMessage(content="What is a context window?"),
])

print(response.content)

Anthropic Model Selection

| Model | Best For |
|---|---|
| claude-haiku-3-5 | Speed-critical tasks, simple extraction |
| claude-sonnet-4-6 | Balanced quality and performance |
| claude-opus-4 | Highest quality reasoning and writing |

AzureChatOpenAI

Enterprise teams often deploy OpenAI models through Azure OpenAI Service for data residency and compliance.

Python

from langchain_openai import AzureChatOpenAI
from langchain_core.messages import HumanMessage
import os

llm = AzureChatOpenAI(
    azure_deployment="gpt-4o",           # your deployment name in Azure
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],  # e.g. https://myinstance.openai.azure.com/
    api_version="2024-08-01-preview",
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    temperature=0,
    max_tokens=1024,
)

response = llm.invoke([HumanMessage(content="Summarize the benefits of Azure OpenAI.")])
print(response.content)

The rest of your chain code is identical regardless of whether you use ChatOpenAI or AzureChatOpenAI. This is the power of the shared interface.

invoke() — Synchronous Single Call

invoke() is the simplest way to call a model. It blocks until the model returns the full response.

Python

from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Pass a list of messages
response = llm.invoke([HumanMessage(content="What is 12 * 34?")])
print(response.content)  # e.g. "408" (exact wording can vary)

# Or pass a string (LangChain wraps it in HumanMessage automatically)
response = llm.invoke("What is 12 * 34?")
print(response.content)  # e.g. "408"

stream() — Token-by-Token Streaming

Stream tokens as they arrive to give users a responsive experience.

Python

from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, SystemMessage

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.7)

messages = [
    SystemMessage(content="You are a creative writer."),
    HumanMessage(content="Write a haiku about Python programming."),
]

# Stream returns an iterator of AIMessageChunk objects
for chunk in llm.stream(messages):
    print(chunk.content, end="", flush=True)
print()  # newline after streaming completes

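# Each chunk carries partial content; usage_metadata, when the provider
# reports it, is usually present only on the final chunk. With ChatOpenAI
# this may require constructing the model with stream_usage=True.
for chunk in llm.stream(messages):
    if chunk.content:
        print(f"Chunk: '{chunk.content}'")
    if chunk.usage_metadata:
        print(f"Usage: {chunk.usage_metadata}")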
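
In practice you often want both the live output and the complete text once streaming ends. Here is a minimal sketch of that accumulation pattern. `FakeChunk` and `fake_stream` are hypothetical stand-ins for AIMessageChunk and llm.stream() so that the example runs offline; only the `.content` attribute is modeled.

```python
from dataclasses import dataclass
from typing import Iterator

@dataclass
class FakeChunk:
    """Hypothetical stand-in for AIMessageChunk; only .content is modeled."""
    content: str

def fake_stream() -> Iterator[FakeChunk]:
    """Yield a fixed reply piece by piece, as a streaming model would."""
    for piece in ["Indent", "ation ", "matters", "", "."]:
        yield FakeChunk(content=piece)

def collect(chunks: Iterator[FakeChunk]) -> str:
    """Print each non-empty piece as it arrives and return the full text."""
    parts: list[str] = []
    for chunk in chunks:
        if chunk.content:  # some chunks may carry no text
            print(chunk.content, end="", flush=True)
            parts.append(chunk.content)
    print()
    return "".join(parts)

full_text = collect(fake_stream())
```

The same loop should work when `fake_stream()` is replaced by `llm.stream(messages)`, since the chunks there also expose `.content`.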

batch() — Parallel Multiple Calls

batch() sends multiple requests concurrently (by default through a thread pool), which is usually much faster than calling invoke() in a loop.

Python

from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

questions = [
    [HumanMessage(content="What is Python?")],
    [HumanMessage(content="What is Rust?")],
    [HumanMessage(content="What is Go?")],
    [HumanMessage(content="What is TypeScript?")],
]

# All four run concurrently
responses = llm.batch(questions)

for q, r in zip(questions, responses):
    print(f"Q: {q[0].content}")
    print(f"A: {r.content[:100]}...\n")

# Control concurrency with max_concurrency
responses = llm.batch(questions, config={"max_concurrency": 2})
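
To build intuition for what a concurrency cap does, the sketch below runs a stubbed call through a bounded thread pool. It is a conceptual illustration, not LangChain's actual implementation: `fake_invoke` and `bounded_batch` are hypothetical names, and the sleep merely imitates network latency.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def fake_invoke(question: str) -> str:
    """Hypothetical stand-in for llm.invoke(); sleeps to mimic network latency."""
    time.sleep(0.05)
    return f"answer to: {question}"

def bounded_batch(questions: list[str], max_concurrency: int) -> list[str]:
    """Run fake_invoke over all questions with at most max_concurrency in flight."""
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        # map() yields results in input order, whichever call finishes first
        return list(pool.map(fake_invoke, questions))

questions = ["What is Python?", "What is Rust?", "What is Go?", "What is TypeScript?"]
answers = bounded_batch(questions, max_concurrency=2)

for q, a in zip(questions, answers):
    print(f"{q} -> {a}")
```

A lower cap trades throughput for a smaller chance of hitting provider rate limits, which is the usual reason to set max_concurrency.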

Async Methods

All three methods have async counterparts for use in async frameworks like FastAPI.

Python

import asyncio
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage

llm = ChatOpenAI(model="gpt-4o-mini")

async def main():
    # Async invoke
    response = await llm.ainvoke([HumanMessage(content="What is asyncio?")])
    print(response.content)

    # Async stream
    async for chunk in llm.astream([HumanMessage(content="Write a poem about async/await")]):
        print(chunk.content, end="", flush=True)
    print()

    # Async batch
    responses = await llm.abatch([
        [HumanMessage(content="Hello")],
        [HumanMessage(content="Hi")],
    ])
    for r in responses:
        print(r.content)

asyncio.run(main())
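
When you fan out many async calls yourself, a semaphore is one common way to cap how many are in flight. The following sketch assumes a stubbed coroutine, `fake_ainvoke`, in place of llm.ainvoke(); both it and `gather_bounded` are hypothetical helpers for illustration.

```python
import asyncio

async def fake_ainvoke(prompt: str) -> str:
    """Hypothetical stand-in for llm.ainvoke(); sleeps instead of calling an API."""
    await asyncio.sleep(0.01)
    return prompt.upper()

async def gather_bounded(prompts: list[str], limit: int) -> list[str]:
    """Run all prompts concurrently, allowing at most `limit` in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def one(prompt: str) -> str:
        async with semaphore:
            return await fake_ainvoke(prompt)

    # gather() returns results in the same order as its arguments
    return await asyncio.gather(*(one(p) for p in prompts))

results = asyncio.run(gather_bounded(["hello", "hi", "hey"], limit=2))
print(results)  # ['HELLO', 'HI', 'HEY']
```

Inside a framework such as FastAPI you would await `gather_bounded(...)` directly from the request handler instead of calling asyncio.run().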

Model Parameters in Depth

Python

from langchain_openai import ChatOpenAI

# temperature: controls randomness
# 0 = always picks the most probable token
# 1 = samples freely from the distribution
deterministic = ChatOpenAI(model="gpt-4o-mini", temperature=0)
creative = ChatOpenAI(model="gpt-4o-mini", temperature=0.9)

# max_tokens: hard cap on response length
short = ChatOpenAI(model="gpt-4o-mini", max_tokens=50)

# top_p: nucleus sampling (alternative to temperature)
# Adjust either temperature or top_p, generally not both
nucleus = ChatOpenAI(model="gpt-4o-mini", top_p=0.95)

# presence_penalty: penalize tokens that have already appeared
# Range: -2.0 to 2.0. Positive values encourage new topics.
diverse = ChatOpenAI(model="gpt-4o-mini", presence_penalty=0.5)

# frequency_penalty: penalize tokens proportional to their frequency
# Range: -2.0 to 2.0. Positive values reduce repetition.
nonrepetitive = ChatOpenAI(model="gpt-4o-mini", frequency_penalty=0.5)

# stop: stop generation when these tokens appear
stopped = ChatOpenAI(model="gpt-4o-mini", stop=["###", "\n\n\n"])

# logprobs: return log probabilities of output tokens
with_probs = ChatOpenAI(model="gpt-4o-mini", logprobs=True)
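
To see why temperature changes how varied the output is, it helps to look at the textbook temperature-scaled softmax. The sketch below uses made-up scores for three candidate tokens. It illustrates the general technique only; how any particular provider implements sampling internally is not something this article specifies.

```python
import math

def apply_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature == 0:
        # Limit case: all probability mass on the highest-scoring token
        best = logits.index(max(logits))
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
for t in (0, 0.5, 1.0):
    print(t, [round(p, 3) for p in apply_temperature(logits, t)])
```

Lower temperatures concentrate probability on the top-scoring token, and higher temperatures spread it across the alternatives.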

Binding Parameters Per Call

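You can also pass parameters at invocation time; they override the defaults set at initialization for that call only. To reuse the same overrides across several calls, llm.bind(temperature=0.9) returns a runnable with them pre-applied. The example below overrides them on a single call: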

Python

from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Override temperature just for this call
response = llm.invoke(
    [HumanMessage(content="Write a creative story opening.")],
    temperature=0.9,       # overrides the 0 set at init
    max_tokens=200,
)
print(response.content)
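
Conceptually, a per-call override behaves like merging two dictionaries in which the call-time values win. The sketch below shows that mental model in plain Python. `effective_params` is a hypothetical helper, and the exact merge logic inside a given integration is an assumption here.

```python
def effective_params(defaults: dict, overrides: dict) -> dict:
    """Return call-time parameters: overrides win, untouched defaults remain."""
    return {**defaults, **overrides}

init_params = {"model": "gpt-4o-mini", "temperature": 0, "max_tokens": None}
call_params = effective_params(init_params, {"temperature": 0.9, "max_tokens": 200})

print(call_params)
print(init_params["temperature"])  # 0 (the defaults themselves are not modified)
```

The key property is that the model object keeps its original configuration, so the next call without overrides falls back to temperature 0.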

Few-Shot Prompting with Message History

Inject example exchanges to teach the model a specific format:

Python

from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, AIMessage, SystemMessage

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Few-shot: show the model the exact format you want
messages = [
    SystemMessage(content="Extract the programming language and version from user input. Reply with JSON only."),
    # Example 1
    HumanMessage(content="I'm using Python 3.12"),
    AIMessage(content='{"language": "Python", "version": "3.12"}'),
    # Example 2
    HumanMessage(content="Running Node.js 20 LTS"),
    AIMessage(content='{"language": "Node.js", "version": "20"}'),
    # Actual query
    HumanMessage(content="My stack is Ruby 3.3 with Rails 7"),
]

response = llm.invoke(messages)
print(response.content)
# {"language": "Ruby", "version": "3.3"}
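
Since the model is asked to reply with JSON only, the natural next step is to parse and validate the reply before using it. Here is a minimal sketch using the standard library. `parse_extraction` is a hypothetical helper, and the hard-coded reply string mirrors the expected output shown above.

```python
import json

def parse_extraction(reply: str) -> dict[str, str]:
    """Parse the model's JSON reply and check that it has the expected keys."""
    data = json.loads(reply)  # raises json.JSONDecodeError on non-JSON text
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = {"language", "version"} - data.keys()
    if missing:
        raise ValueError(f"reply is missing keys: {sorted(missing)}")
    return {"language": str(data["language"]), "version": str(data["version"])}

reply = '{"language": "Ruby", "version": "3.3"}'
parsed = parse_extraction(reply)
print(parsed["language"], parsed["version"])  # Ruby 3.3
```

Few-shot examples make the format likely but not guaranteed, so catching the parse error and retrying or falling back is a reasonable precaution.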

Accessing Response Metadata

Python

from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
response = llm.invoke([HumanMessage(content="What is 2 + 2?")])

# Token usage
print(response.usage_metadata)
# {'input_tokens': 14, 'output_tokens': 1, 'total_tokens': 15}

# Response metadata from the provider
print(response.response_metadata)
# {'model_name': 'gpt-4o-mini', 'finish_reason': 'stop', ...}

# The raw ID of this generation
print(response.id)  # chatcmpl-abc123...

# Content as plain string
print(response.content)  # "4"
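
Token counts are mainly useful for tracking spend. The sketch below sums usage across calls and estimates a cost from dictionaries shaped like the usage_metadata output above. The per-token prices are hypothetical placeholders, not real provider rates, and both helper functions are illustrative.

```python
# Hypothetical prices in USD per one million tokens; substitute your provider's rates.
PRICE_PER_MILLION = {"input": 0.50, "output": 1.50}

def estimate_cost(usage: dict[str, int]) -> float:
    """Estimate the cost of one call from a usage_metadata-style dict."""
    input_cost = usage["input_tokens"] * PRICE_PER_MILLION["input"] / 1_000_000
    output_cost = usage["output_tokens"] * PRICE_PER_MILLION["output"] / 1_000_000
    return input_cost + output_cost

def total_usage(usages: list[dict[str, int]]) -> dict[str, int]:
    """Sum token counts across several calls."""
    keys = ("input_tokens", "output_tokens", "total_tokens")
    return {k: sum(u[k] for u in usages) for k in keys}

calls = [
    {"input_tokens": 14, "output_tokens": 1, "total_tokens": 15},
    {"input_tokens": 28, "output_tokens": 42, "total_tokens": 70},
]
totals = total_usage(calls)
print(totals)
print(f"Estimated cost: ${estimate_cost(totals):.7f}")
```

In a real application you would pass `response.usage_metadata` from each call into `total_usage` instead of the literal dictionaries.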

Comparing Providers Side by Side

Python

from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic
from langchain_core.messages import HumanMessage

question = [HumanMessage(content="Explain tail call optimization in three sentences.")]

openai_response = ChatOpenAI(model="gpt-4o-mini", temperature=0).invoke(question)
anthropic_response = ChatAnthropic(model="claude-haiku-3-5", temperature=0).invoke(question)

print("=== OpenAI ===")
print(openai_response.content)

print("\n=== Anthropic ===")
print(anthropic_response.content)

Creating a Model Factory

A small factory function lets you choose the provider and model from external configuration:

Python

from langchain_core.language_models import BaseChatModel
from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic
from typing import Literal

def get_chat_model(
    provider: Literal["openai", "anthropic"],
    model: str,
    temperature: float = 0,
) -> BaseChatModel:
    if provider == "openai":
        return ChatOpenAI(model=model, temperature=temperature)
    elif provider == "anthropic":
        return ChatAnthropic(model=model, temperature=temperature)
    else:
        raise ValueError(f"Unknown provider: {provider}")

# Use in a chain
llm = get_chat_model("anthropic", "claude-sonnet-4-6", temperature=0.3)
response = llm.invoke([HumanMessage(content="Hello!")])
print(response.content)
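
One way to feed such a factory from configuration is to store the provider and model as a single "provider:model" string. The sketch below parses and validates that string. The format, the `parse_model_spec` helper, and the `config` dictionary are assumptions made for illustration.

```python
def parse_model_spec(spec: str) -> tuple[str, str]:
    """Split 'provider:model' into its two parts and validate the provider."""
    provider, sep, model = spec.partition(":")
    if not sep or not model:
        raise ValueError(f"expected 'provider:model', got {spec!r}")
    if provider not in ("openai", "anthropic"):
        raise ValueError(f"Unknown provider: {provider}")
    return provider, model

# Hypothetical settings, e.g. loaded from a JSON or YAML file
config = {"chat_model": "anthropic:claude-sonnet-4-6", "temperature": 0.3}

provider, model = parse_model_spec(config["chat_model"])
print(provider, model)  # anthropic claude-sonnet-4-6
```

The parsed values can then be passed straight to the factory, for example `get_chat_model(provider, model, temperature=config["temperature"])`.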

Summary

Chat models in LangChain follow a uniform interface regardless of provider:

invoke(messages) — synchronous single response
stream(messages) — token-by-token streaming
batch(list_of_messages) — concurrent parallel requests
Async variants: ainvoke, astream, abatch

The message types — SystemMessage, HumanMessage, AIMessage — represent the conversation structure. Use SystemMessage for instructions, HumanMessage for user input, and AIMessage for model responses (especially in few-shot examples).

In the next lesson you will learn how to parameterize your prompts with PromptTemplate and ChatPromptTemplate, separating your prompt logic from your chain logic.
