Quickstart

This quickstart walks you from an empty environment to a working, highlighted visualization of extracted entities. It uses Google Gemini, the default model backend.

Prerequisites

Python 3.10 or newer.
A Gemini API key for cloud models. (Local models via Ollama don't need a key; see Model backends.)

1. Install

pip install langextract

2. Set your API key

LangExtract reads your key from the LANGEXTRACT_API_KEY environment variable.

export LANGEXTRACT_API_KEY="your-api-key-here"

You can get a key from Google AI Studio. See Supply API keys for the other ways to provide credentials, including a .env file and provider-specific variables.
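Before going further, it can help to confirm that the variable is actually visible to your Python process. The check below is a minimal sketch that uses only the standard library. The helper name require_api_key is hypothetical and is not part of LangExtract.

```python
import os


def require_api_key(var_name: str = "LANGEXTRACT_API_KEY") -> str:
    """Return the key stored in var_name, or raise if it is missing or blank."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; export it before calling lx.extract."
        )
    return key
```

Calling require_api_key() at the top of a script fails fast with a clear message instead of surfacing a credential error partway through an extraction.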

3. Define the task and one example

Two things drive an extraction: a prompt that says what to pull out, and at least one example that shows the model the shape you want. Examples are required: lx.extract raises a ValueError if you omit them.

import langextract as lx
import textwrap

# Describe what to extract.
prompt = textwrap.dedent("""\
    Extract characters, emotions, and relationships in order of appearance.
    Use exact text for extractions. Do not paraphrase or overlap entities.
    Provide meaningful attributes for each entity to add context.""")

# Show one high-quality example.
examples = [
    lx.data.ExampleData(
        text="ROMEO. But soft! What light through yonder window breaks? It is the east, and Juliet is the sun.",
        extractions=[
            lx.data.Extraction(
                extraction_class="character",
                extraction_text="ROMEO",
                attributes={"emotional_state": "wonder"},
            ),
            lx.data.Extraction(
                extraction_class="emotion",
                extraction_text="But soft!",
                attributes={"feeling": "gentle awe"},
            ),
            lx.data.Extraction(
                extraction_class="relationship",
                extraction_text="Juliet is the sun",
                attributes={"type": "metaphor"},
            ),
        ],
    )
]

Make each example verbatim

Each extraction_text should be copied exactly from the example's text, in order of appearance, with no overlaps. LangExtract checks this and emits prompt-alignment warnings by default when examples don't follow the pattern. Resolving those warnings meaningfully improves results.
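The three rules above (verbatim, in order, no overlaps) can also be checked by hand before you run anything. The following is a rough sketch of such a pre-check, not LangExtract's own alignment logic. The function check_example is a hypothetical helper, and it works on plain strings so that it runs without the library installed.

```python
def check_example(text: str, extraction_texts: list[str]) -> list[str]:
    """Return a list of problems; an empty list means the example looks aligned."""
    problems = []
    cursor = 0  # end of the previous match; later matches must start at or after it
    for snippet in extraction_texts:
        if snippet not in text:
            problems.append(f"not verbatim: {snippet!r}")
            continue
        pos = text.find(snippet, cursor)
        if pos == -1:
            # The snippet exists in the text, but only before the cursor.
            problems.append(f"out of order or overlapping: {snippet!r}")
            continue
        cursor = pos + len(snippet)
    return problems


example_text = (
    "ROMEO. But soft! What light through yonder window breaks? "
    "It is the east, and Juliet is the sun."
)
print(check_example(example_text, ["ROMEO", "But soft!", "Juliet is the sun"]))  # prints []
```

To apply it to the examples defined earlier, you could pass example.text and [e.extraction_text for e in example.extractions] for each item in examples.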

4. Run the extraction

input_text = "Lady Juliet gazed longingly at the stars, her heart aching for Romeo"

result = lx.extract(
    text_or_documents=input_text,
    prompt_description=prompt,
    examples=examples,
    model_id="gemini-3.5-flash",
)

result is an AnnotatedDocument. Its .extractions is a list of Extraction objects, each with the matched text, its character span in the source, and any attributes the model assigned. See Work with results for the full flow.
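One way to inspect the result is to print one line per extraction. The sketch below assumes each extraction exposes extraction_class, extraction_text, attributes, and a char_interval with start_pos and end_pos. The span field names are an assumption not confirmed on this page, so verify them against Work with results. The summarize helper is hypothetical, and the stand-in object is a made-up placeholder so the sketch runs without an API call. It is not real model output.

```python
from types import SimpleNamespace


def summarize(document) -> list[str]:
    """Format one line per extraction: class, text, span (if any), attributes."""
    lines = []
    for ext in document.extractions or []:
        # char_interval / start_pos / end_pos are assumed field names.
        interval = getattr(ext, "char_interval", None)
        if interval is not None:
            span = f"[{interval.start_pos}:{interval.end_pos}]"
        else:
            span = "[unaligned]"
        lines.append(
            f"{ext.extraction_class}: {ext.extraction_text!r} {span} {ext.attributes or {}}"
        )
    return lines


# Placeholder stand-in for a real result; replace with the value returned by lx.extract.
fake_result = SimpleNamespace(
    extractions=[
        SimpleNamespace(
            extraction_class="character",
            extraction_text="Lady Juliet",
            char_interval=SimpleNamespace(start_pos=0, end_pos=11),
            attributes={"emotional_state": "longing"},
        )
    ]
)
for line in summarize(fake_result):
    print(line)
```

With a real run, summarize(result) would list whatever the model returned. Extractions without a span are marked [unaligned] rather than raising an error.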

5. Save and visualize

Save the result to JSONL, then generate a self-contained interactive HTML file.

# Save to JSONL.
lx.io.save_annotated_documents(
    [result],
    output_name="extraction_results.jsonl",
    output_dir=".",
)

# Build the visualization from the file.
html_content = lx.visualize("extraction_results.jsonl")

with open("visualization.html", "w", encoding="utf-8") as f:
    # In a notebook, visualize() returns an HTML object with a .data attribute;
    # in a plain script it returns the HTML string directly.
    if hasattr(html_content, "data"):
        f.write(html_content.data)
    else:
        f.write(html_content)

Open visualization.html in a browser to step through every extraction highlighted in its original context.
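The saved JSONL file is also easy to load back for your own processing, since JSONL is one JSON object per line. The reader below is a minimal standard-library sketch. The helper name read_jsonl is hypothetical, and the layout of each record is not described on this page, so inspect the keys rather than relying on specific field names.

```python
import json
from pathlib import Path


def read_jsonl(path: str) -> list[dict]:
    """Parse a JSONL file: one JSON object per non-empty line."""
    records = []
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records
```

For example, records = read_jsonl("extraction_results.jsonl") followed by print(sorted(records[0].keys())) shows which top-level fields were written for the first document.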

What's next

Process a long document from a URL, with multiple passes and parallel workers. See the long-document workflow.
Use a different model: OpenAI or a local model via Ollama.
Tune your prompt and examples: Write prompts & examples.
