Chat Models in LangChain

ChatOpenAI, ChatAnthropic, AzureChatOpenAI, model parameters, invoke() vs stream() vs batch(), and HumanMessage/AIMessage/SystemMessage in depth.

Asma Hafeez Khan | May 15, 2026 | 7 min read
Tags: LangChain, Python, ChatOpenAI, ChatAnthropic, Chat Models, Messages

Chat models sit at the core of most LangChain applications. A chat model accepts a list of messages as input and returns an AIMessage as output. LangChain wraps every major provider behind the same BaseChatModel interface, so you can swap between OpenAI, Anthropic, Azure, and others without changing your chain logic.

The Message Types

Before looking at models, understand the message types. A chat model works with a conversation, not a single string. LangChain represents that conversation as a list of typed messages.

Python
from langchain_core.messages import (
    SystemMessage,
    HumanMessage,
    AIMessage,
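    ToolMessage,      # result of a tool call, sent back to the model
    FunctionMessage,  # legacy counterpart of ToolMessage for the older function-calling API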
)

# SystemMessage: sets the assistant's persona or instructions
system = SystemMessage(content="You are a senior Python engineer. Be concise and precise.")

# HumanMessage: the user's input
human = HumanMessage(content="What is the GIL in Python?")

# AIMessage: the model's response (you receive this, or inject it for few-shot examples)
ai = AIMessage(content="The GIL is a mutex that protects CPython's memory manager...")

# A conversation is simply a list of messages
conversation = [system, human]
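
To make the idea of a typed conversation concrete, here is a minimal sketch of how such a list might be turned into the role-tagged dictionaries that OpenAI-style chat APIs accept. The `Msg` class and `to_role_dicts` helper are hypothetical stand-ins written for this illustration, not LangChain classes; the only detail borrowed from LangChain is that its messages carry a `type` of "system", "human", or "ai".

```python
from dataclasses import dataclass


@dataclass
class Msg:
    """Hypothetical stand-in for a LangChain message: a type tag plus text."""
    type: str      # "system", "human", or "ai"
    content: str


# Message type -> role name used by OpenAI-style chat APIs
ROLE_BY_TYPE = {"system": "system", "human": "user", "ai": "assistant"}


def to_role_dicts(messages: list[Msg]) -> list[dict[str, str]]:
    """Convert typed messages into role/content dictionaries, preserving order."""
    return [{"role": ROLE_BY_TYPE[m.type], "content": m.content} for m in messages]


conversation = [
    Msg("system", "You are a senior Python engineer. Be concise and precise."),
    Msg("human", "What is the GIL in Python?"),
]
payload = to_role_dicts(conversation)
print(payload)
```

The order of the list is the order of the conversation, which is why few-shot examples later in this lesson are simply extra human/AI pairs placed before the real question.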

ChatOpenAI

ChatOpenAI, from the langchain-openai package, wraps OpenAI's chat models and is the integration used in most of the examples in this lesson.

Python
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, SystemMessage

# Basic initialization (OPENAI_API_KEY is read from the environment)
llm = ChatOpenAI(
    model="gpt-4o",           # or gpt-4o-mini, gpt-4-turbo
    temperature=0,            # 0 = most repeatable output; higher values add variety
    max_tokens=1024,          # cap on the number of generated tokens
    timeout=30,               # seconds before raising an error
    max_retries=3,            # automatic retry on transient failures
)

# Invoke with a list of messages

response = llm.invoke([
    SystemMessage(content="You are a Python expert."),
    HumanMessage(content="Explain list comprehensions in two sentences."),
])

print(type(response))     # AIMessage
print(response.content)   # The text response
print(response.usage_metadata)  # {'input_tokens': 28, 'output_tokens': 42, ...}
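
The `max_retries` setting above hides a retry loop. The sketch below shows one common way such a loop can work, using exponential backoff. It is an illustration under assumptions, not the code LangChain or the OpenAI SDK runs: `TransientError`, `call_with_retries`, and `flaky` are hypothetical names, and the delays are shortened so the example finishes quickly.

```python
import time


class TransientError(Exception):
    """Hypothetical stand-in for a rate-limit or timeout error."""


def call_with_retries(fn, max_retries=3, base_delay=0.01):
    """Call fn(); on TransientError wait and retry, up to max_retries extra attempts."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except TransientError:
            if attempt == max_retries:
                raise  # out of retries: surface the error to the caller
            time.sleep(base_delay * 2 ** attempt)  # 0.01s, 0.02s, 0.04s, ...


attempts = []


def flaky():
    """Fails twice, then succeeds, to exercise the retry path."""
    attempts.append(1)
    if len(attempts) < 3:
        raise TransientError("temporary failure")
    return "ok"


result = call_with_retries(flaky)
print(result, "after", len(attempts), "attempts")
```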

Model Selection Guide

| Model | Best For | Cost |
|---|---|---|
| gpt-4o-mini | Fast, cheap tasks; classification; extraction | Lowest |
| gpt-4o | Balanced quality and speed | Medium |
| gpt-4-turbo | Complex reasoning, large context | Higher |
| o1-mini | Math, coding, logical reasoning | Medium |
| o1 | Hardest reasoning tasks | Highest |

ChatAnthropic

Python
from langchain_anthropic import ChatAnthropic
from langchain_core.messages import HumanMessage, SystemMessage

llm = ChatAnthropic(
    model="claude-sonnet-4-6",   # or claude-opus-4, claude-3-5-haiku-latest (confirm current IDs in Anthropic's model list)
    temperature=0,
    max_tokens=2048,
    # ANTHROPIC_API_KEY is read from environment automatically
)

response = llm.invoke([
    SystemMessage(content="You are a concise technical writer."),
    HumanMessage(content="What is a context window?"),
])

print(response.content)

Anthropic Model Selection

| Model | Best For |
|---|---|
| claude-3-5-haiku-latest | Speed-critical tasks, simple extraction |
| claude-sonnet-4-6 | Balanced quality and performance |
| claude-opus-4 | Highest quality reasoning and writing |

AzureChatOpenAI

Enterprise teams often deploy OpenAI models through Azure OpenAI Service for data residency and compliance.

Python
import os

from langchain_core.messages import HumanMessage
from langchain_openai import AzureChatOpenAI

llm = AzureChatOpenAI(
    azure_deployment="gpt-4o",           # your deployment name in Azure
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],  # e.g. https://myinstance.openai.azure.com/
    api_version="2024-08-01-preview",
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    temperature=0,
    max_tokens=1024,
)

response = llm.invoke([HumanMessage(content="Summarize the benefits of Azure OpenAI.")])
print(response.content)
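
It can help to see how the three Azure-specific settings fit together. The sketch below assembles a request URL from the endpoint, deployment name, and API version, assuming the commonly documented Azure OpenAI REST layout (`/openai/deployments/<deployment>/chat/completions?api-version=<version>`). The `azure_chat_url` helper is hypothetical and only illustrates the relationship; AzureChatOpenAI builds its requests internally.

```python
from urllib.parse import urlencode


def azure_chat_url(endpoint: str, deployment: str, api_version: str) -> str:
    """Combine endpoint, deployment name, and API version into one request URL."""
    base = endpoint.rstrip("/")  # tolerate a trailing slash in the configured endpoint
    query = urlencode({"api-version": api_version})
    return f"{base}/openai/deployments/{deployment}/chat/completions?{query}"


url = azure_chat_url(
    "https://myinstance.openai.azure.com/",
    "gpt-4o",
    "2024-08-01-preview",
)
print(url)
```

Note that the URL carries the deployment name you chose in Azure rather than the underlying model name, which is why `azure_deployment` replaces `model` in the constructor.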

The rest of your chain code is identical regardless of whether you use ChatOpenAI or AzureChatOpenAI. This is the power of the shared interface.
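
One way to picture the shared interface is as a structural type: any object with a suitable `invoke` method will do. The sketch below uses `typing.Protocol` with two fake models. `SupportsInvoke`, `FakeOpenAI`, `FakeAzure`, and `summarize` are hypothetical names for illustration; LangChain itself uses the BaseChatModel base class rather than this protocol.

```python
from typing import Protocol


class SupportsInvoke(Protocol):
    """Anything with an invoke(messages) method satisfies this interface."""

    def invoke(self, messages: list[str]) -> str: ...


class FakeOpenAI:
    def invoke(self, messages: list[str]) -> str:
        return f"[openai] {messages[-1]}"


class FakeAzure:
    def invoke(self, messages: list[str]) -> str:
        return f"[azure] {messages[-1]}"


def summarize(llm: SupportsInvoke, text: str) -> str:
    """Chain logic that depends only on invoke(), not on the provider."""
    return llm.invoke(["Summarize briefly.", text])


for model in (FakeOpenAI(), FakeAzure()):
    print(summarize(model, "Azure OpenAI offers regional deployments."))
```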

invoke() — Synchronous Single Call

invoke() is the simplest way to call a model. It blocks until the model returns the full response.

Python
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Pass a list of messages
response = llm.invoke([HumanMessage(content="What is 12 * 34?")])
print(response.content)  # "408"

# Or pass a string (LangChain wraps it in HumanMessage automatically)
response = llm.invoke("What is 12 * 34?")
print(response.content)  # "408"

stream() — Token-by-Token Streaming

Stream tokens as they arrive to give users a responsive experience.

Python
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, SystemMessage

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.7)

messages = [
    SystemMessage(content="You are a creative writer."),
    HumanMessage(content="Write a haiku about Python programming."),
]

# Stream returns an iterator of AIMessageChunk objects
for chunk in llm.stream(messages):
    print(chunk.content, end="", flush=True)
print()  # newline after streaming completes

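# Each chunk has partial content; usage metadata, when the provider sends it,
# usually arrives on the final chunk (ChatOpenAI may need stream_usage=True)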
for chunk in llm.stream(messages):
    if chunk.content:
        print(f"Chunk: '{chunk.content}'")
    if chunk.usage_metadata:
        print(f"Usage: {chunk.usage_metadata}")
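
In practice you often want to show tokens as they arrive and also keep the full text for logging or storage. The sketch below does both with a fake stream of plain strings. `fake_stream` and `collect` are hypothetical helpers for this example; with real LangChain chunks you would read `chunk.content`, and AIMessageChunk objects can also be combined with `+`.

```python
from typing import Iterator


def fake_stream(text: str, size: int = 4) -> Iterator[str]:
    """Yield the text in small pieces, imitating token-by-token arrival."""
    for i in range(0, len(text), size):
        yield text[i:i + size]


def collect(stream: Iterator[str]) -> str:
    """Print each piece as it arrives and return the assembled text."""
    parts = []
    for piece in stream:
        print(piece, end="", flush=True)
        parts.append(piece)
    print()  # newline once the stream is exhausted
    return "".join(parts)


full_text = collect(fake_stream("Indent with care, whitespace matters"))
```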

batch() — Parallel Multiple Calls

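batch() sends multiple requests concurrently; by default it runs the individual calls in a thread pool. Because model calls are network-bound, this is usually much faster than calling invoke() in a loop, and results come back in the same order as the inputs.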

Python
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

questions = [
    [HumanMessage(content="What is Python?")],
    [HumanMessage(content="What is Rust?")],
    [HumanMessage(content="What is Go?")],
    [HumanMessage(content="What is TypeScript?")],
]

# All four run concurrently
responses = llm.batch(questions)

for q, r in zip(questions, responses):
    print(f"Q: {q[0].content}")
    print(f"A: {r.content[:100]}...\n")

# Control concurrency with max_concurrency
responses = llm.batch(questions, config={"max_concurrency": 2})

Async Methods

All three methods have async counterparts for use in async frameworks like FastAPI.

Python
import asyncio
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage

llm = ChatOpenAI(model="gpt-4o-mini")

async def main():
    # Async invoke
    response = await llm.ainvoke([HumanMessage(content="What is asyncio?")])
    print(response.content)

    # Async stream
    async for chunk in llm.astream([HumanMessage(content="Write a poem about async/await")]):
        print(chunk.content, end="", flush=True)
    print()

    # Async batch
    responses = await llm.abatch([
        [HumanMessage(content="Hello")],
        [HumanMessage(content="Hi")],
    ])
    for r in responses:
        print(r.content)

asyncio.run(main())
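
Inside an async application you may want to fan out several calls yourself while capping how many run at once. The sketch below shows a generic asyncio pattern for that, using a semaphore around a fake coroutine. `fake_ainvoke` and `bounded_gather` are hypothetical names; with a real model you would await `llm.ainvoke(...)` in their place.

```python
import asyncio


async def fake_ainvoke(prompt: str) -> str:
    """Pretend async model call: the sleep stands in for network latency."""
    await asyncio.sleep(0.01)
    return prompt.upper()


async def bounded_gather(prompts: list[str], limit: int) -> list[str]:
    """Run all prompts concurrently, with at most `limit` in flight at a time."""
    semaphore = asyncio.Semaphore(limit)

    async def one(prompt: str) -> str:
        async with semaphore:
            return await fake_ainvoke(prompt)

    # gather() returns results in the order the awaitables were passed in
    return await asyncio.gather(*(one(p) for p in prompts))


results = asyncio.run(bounded_gather(["hello", "hi", "hey"], limit=2))
print(results)
```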

Model Parameters in Depth

Python
from langchain_openai import ChatOpenAI

# temperature: controls randomness
# 0 = always picks the most probable token
# 1 = samples freely from the distribution
deterministic = ChatOpenAI(model="gpt-4o-mini", temperature=0)
creative = ChatOpenAI(model="gpt-4o-mini", temperature=0.9)

# max_tokens: hard cap on response length
short = ChatOpenAI(model="gpt-4o-mini", max_tokens=50)

# top_p: nucleus sampling (alternative to temperature)
# OpenAI recommends adjusting temperature or top_p, not both
nucleus = ChatOpenAI(model="gpt-4o-mini", top_p=0.95)

# presence_penalty: penalize tokens that have already appeared
# Range: -2.0 to 2.0. Positive values encourage new topics.
diverse = ChatOpenAI(model="gpt-4o-mini", presence_penalty=0.5)

# frequency_penalty: penalize tokens proportional to their frequency
# Range: -2.0 to 2.0. Positive values reduce repetition.
nonrepetitive = ChatOpenAI(model="gpt-4o-mini", frequency_penalty=0.5)

# stop: stop generation when these tokens appear
stopped = ChatOpenAI(model="gpt-4o-mini", stop=["###", "\n\n\n"])

# logprobs: return log probabilities of output tokens
with_probs = ChatOpenAI(model="gpt-4o-mini", logprobs=True)
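
The sampling parameters are easier to reason about with numbers in hand. The sketch below works through temperature scaling, nucleus (top_p) filtering, and the two penalties on a toy three-token vocabulary. It is a simplified model for intuition, not any provider's actual sampler: the logits are made up, and the penalty formula follows the one described in OpenAI's API documentation, as an assumption.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn logits into probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]  # temperature must be > 0 here
    peak = max(scaled)                          # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def nucleus(probs: list[float], top_p: float) -> dict[int, float]:
    """Keep the most probable tokens until their mass reaches top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


def apply_penalties(logits, counts, presence_penalty, frequency_penalty):
    """Lower the logit of tokens already generated (counts[i] = times token i appeared)."""
    return [
        x - presence_penalty * (c > 0) - frequency_penalty * c
        for x, c in zip(logits, counts)
    ]


logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
sharp = softmax_with_temperature(logits, 0.2)
flat = softmax_with_temperature(logits, 1.0)
print("temperature 0.2:", [round(p, 3) for p in sharp])
print("temperature 1.0:", [round(p, 3) for p in flat])
print("top_p 0.85 keeps tokens:", sorted(nucleus(flat, 0.85)))
print("penalized logits:", apply_penalties([1.0, 1.0], [0, 3], 0.5, 0.5))
```

A temperature of exactly 0 cannot be plugged into this formula; providers treat it as always choosing the most probable token, which is the limit the low-temperature case approaches.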

Binding Parameters Per Call

You can also pass parameters at invocation time. They apply to that call only and override the defaults set in the constructor:

Python
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Override temperature just for this call
response = llm.invoke(
    [HumanMessage(content="Write a creative story opening.")],
    temperature=0.9,       # overrides the 0 set at init
    max_tokens=200,
)
print(response.content)
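
Conceptually, a per-call override is a dictionary merge in which call-time values win over constructor defaults. The sketch below shows that merge in isolation. `DEFAULTS` and `build_request` are hypothetical names, and the real request assembly inside an integration involves more fields than this.

```python
# Constructor-time defaults (None means "leave it to the provider")
DEFAULTS = {"model": "gpt-4o-mini", "temperature": 0, "max_tokens": None}


def build_request(defaults: dict, **overrides) -> dict:
    """Merge per-call overrides over the defaults without mutating the defaults."""
    params = {**defaults, **overrides}  # later keys win
    return {key: value for key, value in params.items() if value is not None}


creative_call = build_request(DEFAULTS, temperature=0.9, max_tokens=200)
plain_call = build_request(DEFAULTS)

print(creative_call)
print(plain_call)
```

Because the defaults are never mutated, the next call without overrides goes back to temperature 0.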

Few-Shot Prompting with Message History

Inject example exchanges to teach the model a specific format:

Python
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, AIMessage, SystemMessage

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Few-shot: show the model the exact format you want
messages = [
    SystemMessage(content="Extract the programming language and version from user input. Reply with JSON only."),
    # Example 1
    HumanMessage(content="I'm using Python 3.12"),
    AIMessage(content='{"language": "Python", "version": "3.12"}'),
    # Example 2
    HumanMessage(content="Running Node.js 20 LTS"),
    AIMessage(content='{"language": "Node.js", "version": "20"}'),
    # Actual query
    HumanMessage(content="My stack is Ruby 3.3 with Rails 7"),
]

response = llm.invoke(messages)
print(response.content)
# {"language": "Ruby", "version": "3.3"}
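
Even with few-shot examples, the model returns a string, so it is worth validating the JSON before using it. Here is a minimal sketch using only the standard library. `parse_extraction` and `REQUIRED_KEYS` are hypothetical names, and the input string is the expected reply from the example above rather than a live model response.

```python
import json

REQUIRED_KEYS = {"language", "version"}


def parse_extraction(content: str) -> dict:
    """Parse the model's reply and check that it has the fields we asked for."""
    try:
        data = json.loads(content)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model did not return valid JSON: {content!r}") from exc
    if not isinstance(data, dict):
        raise ValueError(f"expected a JSON object, got {type(data).__name__}")
    missing = REQUIRED_KEYS - set(data)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data


parsed = parse_extraction('{"language": "Ruby", "version": "3.3"}')
print(parsed["language"], parsed["version"])
```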

Accessing Response Metadata

Python
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
response = llm.invoke([HumanMessage(content="What is 2 + 2?")])

# Token usage
print(response.usage_metadata)
# {'input_tokens': 14, 'output_tokens': 1, 'total_tokens': 15}

# Response metadata from the provider
print(response.response_metadata)
# {'model_name': 'gpt-4o-mini', 'finish_reason': 'stop', ...}

# The raw ID of this generation
print(response.id)  # chatcmpl-abc123...

# Content as plain string
print(response.content)  # "4"
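
A common use of `usage_metadata` is tracking spend. The sketch below estimates the cost of a call from the token counts shown above. The prices are hypothetical placeholders, not real rates, and `estimate_cost` is a helper invented for this example; substitute your provider's current pricing.

```python
# Hypothetical prices in USD per one million tokens; replace with real rates.
PRICE_PER_MILLION = {"input": 0.15, "output": 0.60}


def estimate_cost(usage: dict) -> float:
    """Estimate the USD cost of one call from its usage_metadata-style dict."""
    input_cost = usage["input_tokens"] * PRICE_PER_MILLION["input"] / 1_000_000
    output_cost = usage["output_tokens"] * PRICE_PER_MILLION["output"] / 1_000_000
    return input_cost + output_cost


usage = {"input_tokens": 14, "output_tokens": 1, "total_tokens": 15}
cost = estimate_cost(usage)
print(f"Estimated cost: ${cost:.8f}")
```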

Comparing Providers Side by Side

Python
from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic
from langchain_core.messages import HumanMessage

question = [HumanMessage(content="Explain tail call optimization in three sentences.")]

openai_response = ChatOpenAI(model="gpt-4o-mini", temperature=0).invoke(question)
anthropic_response = ChatAnthropic(model="claude-3-5-haiku-latest", temperature=0).invoke(question)

print("=== OpenAI ===")
print(openai_response.content)

print("\n=== Anthropic ===")
print(anthropic_response.content)

Creating a Model Factory

A small factory function lets you choose the provider and model from external configuration instead of hard-coding them:

Python
from typing import Literal

from langchain_anthropic import ChatAnthropic
from langchain_core.language_models import BaseChatModel
from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI

def get_chat_model(
    provider: Literal["openai", "anthropic"],
    model: str,
    temperature: float = 0,
) -> BaseChatModel:
    if provider == "openai":
        return ChatOpenAI(model=model, temperature=temperature)
    elif provider == "anthropic":
        return ChatAnthropic(model=model, temperature=temperature)
    else:
        raise ValueError(f"Unknown provider: {provider}")

# Use in a chain
llm = get_chat_model("anthropic", "claude-sonnet-4-6", temperature=0.3)
response = llm.invoke([HumanMessage(content="Hello!")])
print(response.content)
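
The external configuration has to come from somewhere. One option, sketched below, is to read it from environment variables and validate it before calling the factory. The variable names `CHAT_PROVIDER`, `CHAT_MODEL`, and `CHAT_TEMPERATURE` and the `load_model_settings` helper are assumptions made for this example, not a LangChain convention.

```python
import os

SUPPORTED_PROVIDERS = {"openai", "anthropic"}


def load_model_settings(env=None) -> dict:
    """Read provider, model, and temperature from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    provider = env.get("CHAT_PROVIDER", "openai").lower()
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"Unknown provider: {provider}")
    model = env.get("CHAT_MODEL")
    if not model:
        raise ValueError("CHAT_MODEL must be set")
    temperature = float(env.get("CHAT_TEMPERATURE", "0"))
    return {"provider": provider, "model": model, "temperature": temperature}


# A plain dict stands in for the real environment so the example is repeatable.
settings = load_model_settings({
    "CHAT_PROVIDER": "anthropic",
    "CHAT_MODEL": "claude-sonnet-4-6",
    "CHAT_TEMPERATURE": "0.3",
})
print(settings)
```

The returned dictionary has the same keys as the factory's parameters, so `get_chat_model(**settings)` would build the configured model.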

Summary

Chat models in LangChain follow a uniform interface regardless of provider:

  • invoke(messages) — synchronous single response
  • stream(messages) — token-by-token streaming
  • batch(list_of_messages) — concurrent parallel requests
  • Async variants: ainvoke, astream, abatch

The message types — SystemMessage, HumanMessage, AIMessage — represent the conversation structure. Use SystemMessage for instructions, HumanMessage for user input, and AIMessage for model responses (especially in few-shot examples).

In the next lesson you will learn how to parameterize your prompts with PromptTemplate and ChatPromptTemplate, separating your prompt logic from your chain logic.
