Configuring LLM(s)#
This guide explains how to configure DocumentLLM instances to process documents using various LLM providers.
ContextGem uses LiteLLM under the hood, providing uniform access to a wide range of models. For more information on supported LLMs, see Supported LLMs.
Basic Configuration#
The minimum configuration for a cloud-based LLM requires the model parameter and an api_key:
from contextgem import DocumentLLM
llm = DocumentLLM(
    model="openai/gpt-4o-mini",  # Format: <provider>/<model_name>
    api_key="<your-api-key>",
)
For local models, you usually need to specify the api_base instead of an API key:
from contextgem import DocumentLLM
local_llm = DocumentLLM(
    model="ollama/llama3.1:8b",
    api_base="http://localhost:11434",  # Default Ollama endpoint
)
Configuration Parameters#
The DocumentLLM class accepts the following parameters:
Parameter | Description |
---|---|
model | (Required) Model identifier in the format <provider>/<model_name>. |
api_key | API key for authentication. Required for most cloud providers but not for local models. |
api_base | Base URL of the API endpoint. Required for local models and some cloud providers (e.g. Azure OpenAI). |
deployment_id | Deployment ID for the model. Primarily used with Azure OpenAI. |
api_version | API version. Primarily used with Azure OpenAI. |
role | Role type for the LLM, e.g. "extractor_text" (the default) or "reasoner_text". |
system_message | System message sent to the LLM. By default, ContextGem sets a system message that primes the LLM for extraction tasks (it can be found in the source code). Overriding it is typically only necessary for advanced use cases. |
temperature | Sampling temperature (0.0 to 1.0) controlling response creativity. Lower values produce more predictable outputs, higher values generate more varied responses. |
max_tokens | Maximum tokens in the generated response (applicable to most models). |
max_completion_tokens | Maximum tokens for output completions in OpenAI o1/o3/o4 models. |
reasoning_effort | Reasoning effort for o1/o3/o4 models. Values: "low", "medium", "high". |
top_p | Nucleus sampling value (0.0 to 1.0) controlling output focus/randomness. Lower values make output more deterministic, higher values produce more diverse outputs. |
timeout | Timeout in seconds for LLM API calls. |
num_retries_failed_request | Number of retries when an LLM request fails. |
max_retries_failed_request | LLM provider-specific retry count for failed requests. |
max_retries_invalid_data | Number of retries when an LLM request succeeds but returns invalid data (JSON parsing and validation fails). |
pricing_details | LLMPricing object holding the model's input/output token prices, used for cost tracking (see Setting Up Cost Tracking below). |
is_fallback | Indicates whether the LLM is a fallback model. Fallback LLMs are optionally assigned to the primary LLM instance and are used when the primary LLM fails. |
fallback_llm | A fallback DocumentLLM assigned to the primary LLM and used when the primary LLM fails (see Configuring a Fallback LLM below). |
output_language | Language for output text. Values: "en" (default) or "adapt" (adapt to the document's language). |
seed | Seed for random number generation to help produce more consistent outputs across multiple runs. When set to a specific integer value, the LLM will attempt to use this seed for sampling operations. However, deterministic output is still not guaranteed even with the same seed, as other factors may influence the model's response. |
async_limiter | Relevant when concurrency is enabled for extraction tasks. Controls the frequency of async LLM API requests for concurrent tasks. Defaults to allowing 3 acquisitions per 10-second period to prevent rate limit issues. See the aiolimiter documentation for AsyncLimiter configuration details, and Optimizing for Speed for an example of how to easily set up concurrency for extraction. |
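Several of these parameters can be combined on a single DocumentLLM instance. Below is a minimal sketch, assuming the parameter names listed in the table above; the specific values (temperature, token limit, timeout, retry count, seed, limiter rate) are arbitrary illustrations, not recommended defaults:
from aiolimiter import AsyncLimiter

from contextgem import DocumentLLM

llm = DocumentLLM(
    model="openai/gpt-4o-mini",
    api_key="<your-openai-api-key>",
    temperature=0.2,  # lower temperature -> more predictable outputs
    top_p=0.9,
    max_tokens=2048,  # cap on response length
    timeout=60,  # seconds
    num_retries_failed_request=2,  # retries for failed API calls
    seed=42,  # best-effort reproducibility across runs
    async_limiter=AsyncLimiter(3, 10),  # max 3 requests per 10 seconds for concurrent tasks
)
Parameters that are not set explicitly keep their library defaults.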
Advanced Configuration Examples#
Configuring a Fallback LLM#
You can set up a fallback LLM that will be used if the primary LLM fails:
from contextgem import DocumentLLM
# Primary LLM
primary_llm = DocumentLLM(
    model="openai/gpt-4o-mini",
    api_key="<your-openai-api-key>",
    role="extractor_text",  # default role
)
# Fallback LLM
fallback_llm = DocumentLLM(
    model="anthropic/claude-3-5-haiku",
    api_key="<your-anthropic-api-key>",
    role="extractor_text",  # Must match the primary LLM's role
    is_fallback=True,
)
# Assign fallback LLM to primary
primary_llm.fallback_llm = fallback_llm
# Then use the primary LLM as usual
# document = primary_llm.extract_all(document)
Setting Up Cost Tracking#
You can configure pricing details to track costs:
from contextgem import DocumentLLM, LLMPricing
llm = DocumentLLM(
    model="openai/gpt-4o-mini",
    api_key="<your-openai-api-key>",
    pricing_details=LLMPricing(
        input_per_1m_tokens=0.150,  # Cost per 1M input tokens
        output_per_1m_tokens=0.600,  # Cost per 1M output tokens
    ),
)
# Perform some extraction tasks
# Later, you can check the cost
cost_info = llm.get_cost()
Using Model-Specific Parameters#
For OpenAI's reasoning models (o1/o3/o4), you can set reasoning-specific parameters:
from contextgem import DocumentLLM
llm = DocumentLLM(
    model="openai/o3-mini",
    api_key="<your-openai-api-key>",
    max_completion_tokens=8000,  # Specific to o1/o3/o4 models
    reasoning_effort="medium",  # Optional: "low", "medium", "high"
)
LLM Groups#
For complex document processing, you can organize multiple LLMs with different roles into a group:
from contextgem import DocumentLLM, DocumentLLMGroup
# Create LLMs with different roles
text_extractor = DocumentLLM(
    model="openai/gpt-4o-mini",
    api_key="<your-openai-api-key>",
    role="extractor_text",
    output_language="adapt",
)
text_reasoner = DocumentLLM(
    model="openai/o3-mini",
    api_key="<your-openai-api-key>",
    role="reasoner_text",
    max_completion_tokens=16000,
    reasoning_effort="high",
    output_language="adapt",
)
# Create a group
llm_group = DocumentLLMGroup(
    llms=[text_extractor, text_reasoner],
    output_language="adapt",  # All LLMs in the group must share the same output language setting
)
# Then use the group as usual
# document = llm_group.extract_all(document)
See a practical example of using an LLM group in Using a Multi-LLM Pipeline to Extract Data from Several Documents.
Accessing Usage and Cost Statistics#
You can track input/output token usage and costs:
from contextgem import DocumentLLM
llm = DocumentLLM(
    model="anthropic/claude-3-5-haiku",
    api_key="<your-anthropic-api-key>",
)
# Perform some extraction tasks
# Get usage statistics
usage_info = llm.get_usage()
# Get cost statistics
cost_info = llm.get_cost()
# Reset usage and cost statistics
llm.reset_usage_and_cost()
# The same methods are available for LLM groups, with optional filtering by LLM role
# usage_info = llm_group.get_usage(llm_role="extractor_text")
# cost_info = llm_group.get_cost(llm_role="extractor_text")
# llm_group.reset_usage_and_cost(llm_role="extractor_text")
The usage statistics include not only token counts but also detailed information about each individual call made to the LLM. You can access the call history, including prompts, responses, and timestamps:
from contextgem import DocumentLLM
llm = DocumentLLM(
    model="openai/gpt-4.1",
    api_key="<your-openai-api-key>",
)
# Perform some extraction tasks
usage_info = llm.get_usage()
# Access the first usage container in the list (for the primary LLM)
llm_usage = usage_info[0]
# Get detailed call information
for call in llm_usage.usage.calls:
    print(f"Prompt: {call.prompt}")
    print(f"Response: {call.response}")  # original, unprocessed response
    print(f"Sent at: {call.timestamp_sent}")
    print(f"Received at: {call.timestamp_received}")