Model backends
LangExtract does not hardcode a single model. The model_id you pass to
lx.extract selects a provider (the integration that talks to a model), and
different providers run on different infrastructure. This page explains how that
selection works conceptually; for step-by-step setup, see the how-to guides
linked below.
model_id selects a provider by pattern
A provider registers the model-id patterns it handles, and LangExtract routes
each model_id to the provider whose pattern it matches:
- Gemini (the default). Model ids beginning with gemini route to Google Gemini. The default model_id is gemini-3.5-flash, so an extraction with no model_id runs on Gemini.
- OpenAI. Model ids such as gpt-4o route to OpenAI. The integration is an optional extra you install separately.
- Ollama (local). Model ids such as gemma2:2b route to Ollama, which runs models on your own machine with no API key.
Because routing is pattern-based rather than a fixed list, a provider can claim a family of models without LangExtract enumerating each one.
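To make the idea concrete, here is a minimal, self-contained sketch of pattern-based routing. It does not use LangExtract's code: the REGISTRY list, the resolve_provider function, and the three regular expressions are hypothetical, chosen only to match the example model ids above. The real patterns are listed in the API reference.

```python
import re

# Hypothetical registry of (pattern, provider name) pairs.
# These patterns are illustrative guesses, not LangExtract's actual ones.
REGISTRY = [
    (re.compile(r"^gemini"), "gemini"),
    (re.compile(r"^gpt-"), "openai"),
    (re.compile(r"^gemma"), "ollama"),
]


def resolve_provider(model_id: str) -> str:
    """Return the name of the first provider whose pattern matches model_id."""
    for pattern, provider in REGISTRY:
        if pattern.match(model_id):
            return provider
    raise ValueError(f"No provider registered for model_id {model_id!r}")


if __name__ == "__main__":
    for model_id in ("gemini-3.5-flash", "gpt-4o", "gemma2:2b"):
        print(model_id, "->", resolve_provider(model_id))
```

In this sketch, a provider claims a whole model family with one prefix pattern, so a new model id in that family routes correctly without any change to the registry.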
When a model id doesn't match
If a model_id matches no registered pattern, LangExtract can't infer the
provider and raises an error. To use that model anyway, name the provider
explicitly through a configuration object. For the exact patterns, the precedence rules, and the
environment variables each provider reads, see the
API reference §4.
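The fallback described above can be sketched as follows. This is an illustration of the control flow only, not LangExtract's API: the ModelChoice dataclass, the PATTERNS table, and select_provider are hypothetical names, and the assumption that an explicit provider takes priority over pattern matching is ours. See the API reference for the actual precedence rules.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical pattern table; not LangExtract's actual patterns.
PATTERNS = {
    "gemini": re.compile(r"^gemini"),
    "openai": re.compile(r"^gpt-"),
}


@dataclass
class ModelChoice:
    """Hypothetical configuration object: a model id plus an optional provider."""
    model_id: str
    provider: Optional[str] = None  # None means "infer from model_id"


def select_provider(choice: ModelChoice) -> str:
    # Assumption: an explicitly named provider bypasses pattern matching.
    if choice.provider is not None:
        if choice.provider not in PATTERNS:
            raise ValueError(f"Unknown provider {choice.provider!r}")
        return choice.provider
    # Otherwise infer the provider from the model id.
    for name, pattern in PATTERNS.items():
        if pattern.match(choice.model_id):
            return name
    raise ValueError(
        f"Cannot infer a provider for {choice.model_id!r}; name one explicitly"
    )
```

With these definitions, select_provider(ModelChoice("my-finetune-v1")) raises ValueError, while select_provider(ModelChoice("my-finetune-v1", provider="openai")) returns "openai".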
Extending with plugins
Providers are pluggable: a third-party package can register new patterns through an entry point, adding support for more models without changing LangExtract itself. For details, see the API reference §4.
See also
- Use OpenAI models: run an OpenAI model.
- Use local models with Ollama: run a model on your own machine.
- Supply API keys: where each provider looks for credentials.
- API reference §4: exact patterns and environment variables.