
Outlines

Outlines is a Python library for constrained language generation. It provides a unified interface to various language models and allows for structured generation using techniques like regex matching, type constraints, JSON schemas, and context-free grammars.

Outlines supports multiple backends, including:

  • Hugging Face Transformers
  • llama.cpp
  • vLLM
  • MLX

This integration allows you to use Outlines models with LangChain, providing both LLM and chat model interfaces.

Installation and Setup

To use Outlines with LangChain, you'll need to install the Outlines library:

pip install outlines

Depending on the backend you choose, you may need to install additional dependencies:

  • For Transformers: pip install transformers torch datasets
  • For llama.cpp: pip install llama-cpp-python
  • For vLLM: pip install vllm
  • For MLX: pip install mlx

LLM

To use Outlines as an LLM in LangChain, you can use the Outlines class:

from langchain_community.llms import Outlines

Chat Models

To use Outlines as a chat model in LangChain, you can use the ChatOutlines class:

from langchain_community.chat_models import ChatOutlines

Model Configuration

Both the Outlines and ChatOutlines classes share the same configuration options:

model = Outlines(
    model="meta-llama/Llama-2-7b-chat-hf",  # Model identifier
    backend="transformers",  # Backend to use (transformers, llamacpp, transformers_vision, vllm, or mlxlm)
    max_tokens=256,  # Maximum number of tokens to generate
    stop=["\n"],  # Optional list of stop strings
    streaming=True,  # Whether to stream the output
    # Additional parameters for structured generation:
    regex=None,
    type_constraints=None,
    json_schema=None,
    grammar=None,
    # Additional model parameters:
    model_kwargs={"temperature": 0.7},
)

Model Identifier

The model parameter can be:

  • A Hugging Face model name (e.g., "meta-llama/Llama-2-7b-chat-hf")
  • A local path to a model
  • For GGUF models, the format is "repo_id/file_name" (e.g., "TheBloke/Llama-2-7B-Chat-GGUF/llama-2-7b-chat.Q4_K_M.gguf")
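For example, a GGUF checkpoint hosted on the Hugging Face Hub can be paired with the llamacpp backend. A minimal sketch, using the repo and file name from the example above:

from langchain_community.llms import Outlines

# GGUF model addressed as "repo_id/file_name", run via llama.cpp
llm = Outlines(
    model="TheBloke/Llama-2-7B-Chat-GGUF/llama-2-7b-chat.Q4_K_M.gguf",
    backend="llamacpp",
    max_tokens=64,
)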

Backend Options

The backend parameter specifies which backend to use:

  • "transformers": For Hugging Face Transformers models (default)
  • "llamacpp": For GGUF models using llama.cpp
  • "transformers_vision": For vision-language models (e.g., LLaVA)
  • "vllm": For models using the vLLM library
  • "mlxlm": For models using the MLX framework

Structured Generation

Outlines provides several methods for structured generation:

  1. Regex Matching:

    model = Outlines(
        model="meta-llama/Llama-2-7b-chat-hf",
        regex=r"((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)",
    )

    This ensures the generated text matches the specified regex pattern (in this case, a valid IPv4 address); a usage sketch follows this list.

  2. Type Constraints:

    model = Outlines(
        model="meta-llama/Llama-2-7b-chat-hf",
        type_constraints=int,
    )

    This restricts the output to a valid instance of the given Python type; supported types are int, float, bool, datetime.date, datetime.time, and datetime.datetime.

  3. JSON Schema:

    from pydantic import BaseModel

    class Person(BaseModel):
        name: str
        age: int

    model = Outlines(
        model="meta-llama/Llama-2-7b-chat-hf",
        json_schema=Person,
    )

    This ensures the generated output adheres to the specified JSON schema or Pydantic model.

  4. Context-Free Grammar:

    model = Outlines(
        model="meta-llama/Llama-2-7b-chat-hf",
        grammar="""
        ?start: expression
        ?expression: term (("+" | "-") term)*
        ?term: factor (("*" | "/") factor)*
        ?factor: NUMBER | "-" factor | "(" expression ")"
        %import common.NUMBER
        """,
    )

    This generates text that adheres to the specified context-free grammar in EBNF format.
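As a usage sketch for the options above, invoking the regex-constrained model from item 1 returns a string matching the IPv4 pattern (the prompt is illustrative):

from langchain_community.llms import Outlines

ip_llm = Outlines(
    model="meta-llama/Llama-2-7b-chat-hf",
    regex=r"((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)",
)

# Constrained decoding keeps every sampled token consistent with the pattern,
# so the result is always a well-formed IPv4 address.
result = ip_llm.invoke("What is the IP address of Google's DNS server?")
print(result)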

Usage Examples

LLM Example

from langchain_community.llms import Outlines

llm = Outlines(model="meta-llama/Llama-2-7b-chat-hf", max_tokens=100)
result = llm.invoke("Tell me a short story about a robot.")
print(result)

Chat Model Example

from langchain_community.chat_models import ChatOutlines
from langchain_core.messages import HumanMessage, SystemMessage

chat = ChatOutlines(model="meta-llama/Llama-2-7b-chat-hf", max_tokens=100)
messages = [
    SystemMessage(content="You are a helpful AI assistant."),
    HumanMessage(content="What's the capital of France?"),
]
result = chat.invoke(messages)
print(result.content)

Streaming Example

from langchain_community.chat_models import ChatOutlines
from langchain_core.messages import HumanMessage

chat = ChatOutlines(model="meta-llama/Llama-2-7b-chat-hf", streaming=True)
for chunk in chat.stream("Tell me a joke about programming."):
    print(chunk.content, end="", flush=True)
print()

Structured Output Example

from langchain_community.llms import Outlines
from pydantic import BaseModel

class MovieReview(BaseModel):
    title: str
    rating: int
    summary: str

llm = Outlines(
    model="meta-llama/Llama-2-7b-chat-hf",
    json_schema=MovieReview,
)
result = llm.invoke("Write a short review for the movie 'Inception'.")
print(result)
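Because generation is constrained to the schema, the result can be parsed straight back into the Pydantic model. Continuing the example above (assuming Pydantic v2's model_validate_json):

review = MovieReview.model_validate_json(result)
print(review.title, review.rating)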

Additional Features

Tokenizer Access

You can access the underlying tokenizer for the model:

tokenizer = llm.tokenizer  # Underlying tokenizer (type depends on the backend)
encoded = tokenizer.encode("Hello, world!")  # Text -> token IDs
decoded = tokenizer.decode(encoded)  # Token IDs -> text
