LLMs#
Module for handling processing logic using LLMs.
This module provides classes and utilities for interacting with LLMs in document processing workflows. It includes functionality for managing LLM configurations, handling API calls, processing text and image inputs, tracking token usage and costs, and managing rate limits for LLM requests.
The module supports various LLM providers through the litellm library, enabling both text-only and multimodal (vision) capabilities. It implements efficient asynchronous processing patterns and provides detailed usage statistics for monitoring and cost management.
- class contextgem.public.llms.DocumentLLMGroup(**data)[source]#
Bases: _GenericLLMProcessor
Represents a group of DocumentLLMs with unique roles for processing document content.
This class manages multiple LLMs assigned to specific roles for text and vision processing. It ensures role compliance and facilitates extraction of aspects and concepts from documents.
- Variables:
llms – A list of DocumentLLM instances, each with a unique role (e.g., extractor_text, reasoner_text, extractor_vision, reasoner_vision). At least 2 instances with distinct roles are required.
output_language – Language for produced output text (justifications, explanations). Values: “en” (always English) or “adapt” (matches document/image language). All LLMs in the group must share the same output_language setting.
- Note:
Refer to the DocumentLLM class for more information on constructing LLMs for the group.
- Example:
- LLM group definition#
from contextgem import DocumentLLM, DocumentLLMGroup

# Create a text extractor LLM with a fallback
text_extractor = DocumentLLM(
    model="openai/gpt-4o-mini",
    api_key="your-openai-api-key",  # Replace with your actual API key
    role="extractor_text",
)

# Create a fallback LLM for the text extractor
text_extractor_fallback = DocumentLLM(
    model="anthropic/claude-3-5-haiku",
    api_key="your-anthropic-api-key",  # Replace with your actual API key
    role="extractor_text",  # Must have the same role as the primary LLM
    is_fallback=True,
)

# Assign the fallback LLM to the primary text extractor
text_extractor.fallback_llm = text_extractor_fallback

# Create a text reasoner LLM
text_reasoner = DocumentLLM(
    model="openai/o3-mini",
    api_key="your-openai-api-key",  # Replace with your actual API key
    role="reasoner_text",  # For more complex tasks that require reasoning
)

# Create a vision extractor LLM
vision_extractor = DocumentLLM(
    model="openai/gpt-4o-mini",
    api_key="your-openai-api-key",  # Replace with your actual API key
    role="extractor_vision",  # For handling images
)

# Create a vision reasoner LLM
vision_reasoner = DocumentLLM(
    model="openai/gpt-4o",
    api_key="your-openai-api-key",
    role="reasoner_vision",  # For more complex vision tasks that require reasoning
)

# Create a DocumentLLMGroup with all four LLMs
llm_group = DocumentLLMGroup(
    llms=[text_extractor, text_reasoner, vision_extractor, vision_reasoner],
    output_language="en",  # All LLMs must have the same output language ("en" is default)
)

# This group will have 5 LLMs: four main ones, with different roles,
# and one fallback LLM for a specific LLM. Each LLM can have a fallback LLM.

# Get usage statistics for the whole group or for a specific role
group_usage = llm_group.get_usage()
text_extractor_usage = llm_group.get_usage(llm_role="extractor_text")

# Get cost statistics for the whole group or for a specific role
all_costs = llm_group.get_cost()
text_extractor_cost = llm_group.get_cost(llm_role="extractor_text")

# Reset usage and cost statistics for the whole group or for a specific role
llm_group.reset_usage_and_cost()
llm_group.reset_usage_and_cost(llm_role="extractor_text")
Create a new model by parsing and validating input data from keyword arguments.
Raises ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- llms: list[DocumentLLM]#
- output_language: LanguageRequirement#
- property is_group: bool#
Whether the LLM is a single instance or a group.
- property list_roles: list[Literal['extractor_text', 'reasoner_text', 'extractor_vision', 'reasoner_vision']]#
Returns a list of all roles assigned to the LLMs in this group.
- Returns:
A list of LLM role identifiers
- Return type:
list[LLMRoleAny]
- group_update_output_language(output_language)[source]#
Updates the output language for all LLMs in the group.
- Parameters:
output_language (LanguageRequirement) – The new output language to set for all LLMs
- Return type:
None
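A minimal usage sketch (not from the library docs), assuming the llm_group instance from the example above:
# Switch all LLMs in the group to adapt to the document/image language
llm_group.group_update_output_language("adapt")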
- _eq_deserialized_llm_config(other)[source]#
Custom config equality method to compare this DocumentLLMGroup with a deserialized instance.
Uses the _eq_deserialized_llm_config method of the DocumentLLM class to compare each LLM in the group, including fallbacks, if any.
- Parameters:
other (DocumentLLMGroup) – Another DocumentLLMGroup instance to compare with
- Returns:
True if the instances are equal, False otherwise
- Return type:
bool
- get_usage(llm_role=None)[source]#
Retrieves the usage information of the LLMs in the group, filtered by the specified LLM role if provided.
- Parameters:
llm_role (Optional[str]) – Optional; A string representing the role of the LLM to filter the usage data. If None, returns usage for all LLMs in the group.
- Returns:
A list of usage statistics containers for the specified LLMs and their fallbacks.
- Return type:
list[_LLMUsageOutputContainer]
- Raises:
ValueError – If no LLM with the specified role exists in the group.
- get_cost(llm_role=None)[source]#
Retrieves the accumulated cost information of the LLMs in the group, filtered by the specified LLM role if provided.
- Parameters:
llm_role (Optional[str]) – Optional; A string representing the role of the LLM to filter the cost data. If None, returns cost for all LLMs in the group.
- Returns:
A list of cost statistics containers for the specified LLMs and their fallbacks.
- Return type:
list[_LLMCostOutputContainer]
- Raises:
ValueError – If no LLM with the specified role exists in the group.
- reset_usage_and_cost(llm_role=None)[source]#
Resets the usage and cost statistics for LLMs in the group.
This method clears accumulated usage and cost data, which is useful when processing multiple documents sequentially and tracking metrics for each document separately.
- Parameters:
llm_role (Optional[str]) – Optional; A string representing the role of the LLM to reset statistics for. If None, resets statistics for all LLMs in the group.
- Raises:
ValueError – If no LLM with the specified role exists in the group.
- Return type:
None
- Returns:
None
- extract_all(document, overwrite_existing=False, max_items_per_call=0, use_concurrency=False, max_paragraphs_to_analyze_per_call=0, max_images_to_analyze_per_call=0)#
Extracts all aspects and concepts from a document and its aspects.
This method performs comprehensive extraction by processing the document for aspects and concepts, then extracting concepts from each aspect. The operation can be configured for concurrent processing and customized extraction parameters.
This is the synchronous version of extract_all_async().
- Parameters:
document (Document) – The document to analyze.
overwrite_existing (bool, optional) – Whether to overwrite already processed aspects and concepts with newly extracted information. Defaults to False.
max_items_per_call (int, optional) – Maximum number of items with the same extraction params to process in each LLM call. Defaults to 0 (all items in one call). If concurrency is enabled, defaults to 1. For complex tasks, you should not set a high value, in order to avoid prompt overloading.
use_concurrency (bool, optional) – If True, enables concurrent processing of multiple items. Concurrency can considerably reduce processing time, but may cause rate limit errors with LLM providers. Use this option when API rate limits allow for multiple concurrent requests. Defaults to False.
max_paragraphs_to_analyze_per_call (int, optional) – Maximum paragraphs to include in a single LLM prompt. Defaults to 0 (all paragraphs).
max_images_to_analyze_per_call (int, optional) – Maximum images to include in a single LLM prompt. Defaults to 0 (all images). Relevant only for document-level concepts.
- Returns:
The document with extracted aspects and concepts.
- Return type:
Document
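A minimal sketch of a synchronous call, assuming an already configured llm_group (or a single DocumentLLM) and a hypothetical document; in practice the document would carry predefined aspects/concepts:
from contextgem import Document

# Hypothetical document text
doc = Document(raw_text="Non-disclosure agreement text...")

# Extract everything in one call; concurrency is optional and subject to rate limits
processed_doc = llm_group.extract_all(doc, use_concurrency=True)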
- async extract_all_async(document, overwrite_existing=False, max_items_per_call=0, use_concurrency=False, max_paragraphs_to_analyze_per_call=0, max_images_to_analyze_per_call=0)#
Asynchronously extracts all aspects and concepts from a document and its aspects.
This method performs comprehensive extraction by processing the document for aspects and concepts, then extracting concepts from each aspect. The operation can be configured for concurrent processing and customized extraction parameters.
- Parameters:
document (Document) – The document to analyze.
overwrite_existing (bool, optional) – Whether to overwrite already processed aspects and concepts with newly extracted information. Defaults to False.
max_items_per_call (int, optional) – Maximum number of items with the same extraction params to process in each LLM call. Defaults to 0 (all items in one call). If concurrency is enabled, defaults to 1. For complex tasks, you should not set a high value, in order to avoid prompt overloading.
use_concurrency (bool, optional) – If True, enables concurrent processing of multiple items. Concurrency can considerably reduce processing time, but may cause rate limit errors with LLM providers. Use this option when API rate limits allow for multiple concurrent requests. Defaults to False.
max_paragraphs_to_analyze_per_call (int, optional) – Maximum paragraphs to include in a single LLM prompt. Defaults to 0 (all paragraphs).
max_images_to_analyze_per_call (int, optional) – Maximum images to include in a single LLM prompt. Defaults to 0 (all images). Relevant only for document-level concepts.
- Returns:
The document with extracted aspects and concepts.
- Return type:
Document
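A minimal async sketch, assuming the same hypothetical llm_group and document as in the synchronous example above:
import asyncio

from contextgem import Document

doc = Document(raw_text="Non-disclosure agreement text...")

async def process() -> Document:
    # Assumes `llm_group` is a configured DocumentLLMGroup or DocumentLLM
    return await llm_group.extract_all_async(doc, use_concurrency=True)

processed_doc = asyncio.run(process())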
- extract_aspects_from_document(document, from_aspects=None, overwrite_existing=False, max_items_per_call=0, use_concurrency=False, max_paragraphs_to_analyze_per_call=0)#
Extracts aspects from the provided document using predefined LLMs.
If an aspect instance has extracted_items populated, the reference_paragraphs field will be automatically populated from these items.
This is the synchronous version of extract_aspects_from_document_async().
- Parameters:
document (Document) – The document from which aspects are to be extracted.
from_aspects (Optional[list[Aspect]]) – Existing aspects to use as a base for extraction. If None, uses all document’s aspects.
overwrite_existing (bool) – Whether to overwrite already processed aspects with newly extracted information. Defaults to False.
max_items_per_call (int) – Maximum items with the same extraction params to process per LLM call. Defaults to 0 (all items in a single call). For complex tasks, you should not set a high value, to avoid prompt overloading. If concurrency is enabled, defaults to 1 (each item processed separately).
use_concurrency (bool) – If True, enables concurrent processing of multiple items. Concurrency can considerably reduce processing time, but may cause rate limit errors with LLM providers. Use this option when API rate limits allow for multiple concurrent requests. Defaults to False.
max_paragraphs_to_analyze_per_call (int) – Maximum paragraphs to analyze in a single LLM prompt. Defaults to 0 (all paragraphs).
- Returns:
List of processed Aspect objects with extracted items.
- Return type:
list[Aspect]
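A minimal sketch, assuming a configured llm_group and a hypothetical document with one aspect attached:
from contextgem import Aspect, Document

doc = Document(raw_text="Employment agreement text...")
doc.aspects = [
    Aspect(
        name="Termination",
        description="Clauses describing how the agreement can be terminated",
    )
]

# Extract all of the document's aspects (from_aspects=None)
aspects = llm_group.extract_aspects_from_document(doc)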
- async extract_aspects_from_document_async(document, from_aspects=None, overwrite_existing=False, max_items_per_call=0, use_concurrency=False, max_paragraphs_to_analyze_per_call=0)#
Extracts aspects from the provided document using predefined LLMs asynchronously.
If an aspect instance has extracted_items populated, the reference_paragraphs field will be automatically populated from these items.
- Parameters:
document (Document) – The document from which aspects are to be extracted.
from_aspects (Optional[list[Aspect]]) – Existing aspects to use as a base for extraction. If None, uses all document’s aspects.
overwrite_existing (bool) – Whether to overwrite already processed aspects with newly extracted information. Defaults to False.
max_items_per_call (int) – Maximum number of items with the same extraction params to process per LLM call. Defaults to 0 (all items in one call). If concurrency is enabled, defaults to 1. For complex tasks, you should not set a high value, in order to avoid prompt overloading.
use_concurrency (bool) – If True, enables concurrent processing of multiple items. Concurrency can considerably reduce processing time, but may cause rate limit errors with LLM providers. Use this option when API rate limits allow for multiple concurrent requests. Defaults to False.
max_paragraphs_to_analyze_per_call (int) – Maximum paragraphs to analyze in a single LLM prompt. Defaults to 0 (all paragraphs).
- Returns:
List of processed Aspect objects with extracted items.
- Return type:
list[Aspect]
- extract_concepts_from_aspect(aspect, document, from_concepts=None, overwrite_existing=False, max_items_per_call=0, use_concurrency=False, max_paragraphs_to_analyze_per_call=0)#
Extracts concepts associated with a given aspect in a document.
This method processes an aspect to extract related concepts using LLMs. If the aspect has not been previously processed, a ValueError is raised.
This is the synchronous version of extract_concepts_from_aspect_async().
- Parameters:
aspect (Aspect) – The aspect from which to extract concepts.
document (Document) – The document that contains the aspect.
from_concepts (Optional[list[_Concept]]) – List of existing concepts to process. Defaults to None.
overwrite_existing (bool) – Whether to overwrite already processed concepts with newly extracted information. Defaults to False.
max_items_per_call (int) – Maximum number of items with the same extraction params to process in each LLM call. Defaults to 0 (all items in one call). If concurrency is enabled, defaults to 1. For complex tasks, you should not set a high value, in order to avoid prompt overloading.
use_concurrency (bool) – If True, enables concurrent processing of multiple items. Concurrency can considerably reduce processing time, but may cause rate limit errors with LLM providers. Use this option when API rate limits allow for multiple concurrent requests. Defaults to False.
max_paragraphs_to_analyze_per_call (int) – Maximum paragraphs to include in a single LLM prompt. Defaults to 0 (all paragraphs).
- Returns:
List of processed concept objects.
- Return type:
list[_Concept]
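A minimal sketch, continuing the hypothetical aspect extraction example above:
# Assumes `doc` was already processed with extract_aspects_from_document();
# passing an unprocessed aspect raises a ValueError
termination_aspect = doc.aspects[0]
concepts = llm_group.extract_concepts_from_aspect(termination_aspect, doc)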
- async extract_concepts_from_aspect_async(aspect, document, from_concepts=None, overwrite_existing=False, max_items_per_call=0, use_concurrency=False, max_paragraphs_to_analyze_per_call=0)#
Asynchronously extracts concepts from a specified aspect using LLMs.
This method processes an aspect to extract related concepts using LLMs. If the aspect has not been previously processed, a ValueError is raised.
- Parameters:
aspect (Aspect) – The aspect from which to extract concepts.
document (Document) – The document that contains the aspect.
from_concepts (Optional[list[_Concept]]) – List of existing concepts to process. Defaults to None.
overwrite_existing (bool) – Whether to overwrite already processed concepts with newly extracted information. Defaults to False.
max_items_per_call (int) – Maximum number of items with the same extraction params to process in each LLM call. Defaults to 0 (all items in one call). If concurrency is enabled, defaults to 1. For complex tasks, you should not set a high value, in order to avoid prompt overloading.
use_concurrency (bool) – If True, enables concurrent processing of multiple items. Concurrency can considerably reduce processing time, but may cause rate limit errors with LLM providers. Use this option when API rate limits allow for multiple concurrent requests. Defaults to False.
max_paragraphs_to_analyze_per_call (int) – Maximum paragraphs to include in a single LLM prompt. Defaults to 0 (all paragraphs).
- Returns:
List of processed concept objects.
- Return type:
list[_Concept]
- extract_concepts_from_document(document, from_concepts=None, overwrite_existing=False, max_items_per_call=0, use_concurrency=False, max_paragraphs_to_analyze_per_call=0, max_images_to_analyze_per_call=0)#
Extracts concepts from the provided document using predefined LLMs.
This is the synchronous version of extract_concepts_from_document_async().
- Parameters:
document (Document) – The document from which concepts are to be extracted.
from_concepts (Optional[list[_Concept]]) – Existing concepts to use as a base for extraction. If None, uses all document’s concepts.
overwrite_existing (bool) – Whether to overwrite already processed concepts with newly extracted information. Defaults to False.
max_items_per_call (int) – Maximum items with the same extraction params to process per LLM call. Defaults to 0 (all items in a single call). For complex tasks, you should not set a high value, to avoid prompt overloading. If concurrency is enabled, defaults to 1 (each item processed separately).
use_concurrency (bool) – If True, enables concurrent processing of multiple items. Concurrency can considerably reduce processing time, but may cause rate limit errors with LLM providers. Use this option when API rate limits allow for multiple concurrent requests. Defaults to False.
max_paragraphs_to_analyze_per_call (int) – Maximum paragraphs to analyze in a single LLM prompt. Defaults to 0 (all paragraphs).
max_images_to_analyze_per_call (int, optional) – Maximum images to include in a single LLM prompt. Defaults to 0 (all images).
- Returns:
List of processed Concept objects with extracted items.
- Return type:
list[_Concept]
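A minimal sketch, assuming a configured llm_group and a hypothetical document with one document-level concept attached:
from contextgem import Document, StringConcept

doc = Document(raw_text="Employment agreement text...")
doc.concepts = [
    StringConcept(
        name="Parties",
        description="Names of the parties to the agreement",
    )
]

# Extract all document-level concepts (from_concepts=None)
concepts = llm_group.extract_concepts_from_document(doc)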
- async extract_concepts_from_document_async(document, from_concepts=None, overwrite_existing=False, max_items_per_call=0, use_concurrency=False, max_paragraphs_to_analyze_per_call=0, max_images_to_analyze_per_call=0)#
Extracts concepts from the provided document using predefined LLMs asynchronously.
This method processes a document to extract concepts using configured LLMs.
- Parameters:
document (Document) – The document from which concepts are to be extracted.
from_concepts (Optional[list[_Concept]]) – Existing concepts to use as a base for extraction. If None, uses all document’s concepts.
overwrite_existing (bool) – Whether to overwrite already processed concepts with newly extracted information. Defaults to False.
max_items_per_call (int) – Maximum number of items with the same extraction params to process per LLM call. Defaults to 0 (all items in one call). If concurrency is enabled, defaults to 1. For complex tasks, you should not set a high value, in order to avoid prompt overloading.
use_concurrency (bool) – If True, enables concurrent processing of multiple items. Concurrency can considerably reduce processing time, but may cause rate limit errors with LLM providers. Use this option when API rate limits allow for multiple concurrent requests. Defaults to False.
max_paragraphs_to_analyze_per_call (int) – Maximum paragraphs to analyze in a single LLM prompt. Defaults to 0 (all paragraphs).
max_images_to_analyze_per_call (int, optional) – Maximum images to include in a single LLM prompt. Defaults to 0 (all images).
- Returns:
List of processed Concept objects with extracted items.
- Return type:
list[_Concept]
- classmethod from_dict(obj_dict)#
Reconstructs an instance of the class from a dictionary representation.
This method deserializes a dictionary containing the object’s attributes and values into a new instance of the class. It handles complex nested structures like aspects, concepts, and extracted items, properly reconstructing each component.
- classmethod from_disk(file_path)#
Loads an instance of the class from a JSON file stored on disk.
This method reads the JSON content from the specified file path and deserializes it into an instance of the class using the from_json method.
- Parameters:
file_path (str) – Path to the JSON file to load (must end with ‘.json’).
- Returns:
An instance of the class populated with the data from the file.
- Return type:
Self
- Raises:
ValueError – If the file path doesn’t end with ‘.json’.
OSError – If there’s an error reading the file.
RuntimeError – If deserialization fails.
- classmethod from_json(json_string)#
Creates an instance of the class from a JSON string representation.
This method deserializes the provided JSON string into a dictionary and uses the from_dict method to construct the class instance. It validates that the class name in the serialized data matches the current class.
- to_dict()#
Transforms the current object into a dictionary representation.
Converts the object to a dictionary that includes:
- All public attributes
- Special handling for specific public and private attributes
When an LLM or LLM group is serialized, its API credentials and usage/cost stats are removed.
- to_disk(file_path)#
Saves the serialized instance to a JSON file at the specified path.
This method converts the instance to a dictionary representation using to_dict(), then writes it to disk as a formatted JSON file with UTF-8 encoding.
- Parameters:
file_path (str) – Path where the JSON file should be saved (must end with ‘.json’).
- Return type:
None
- Returns:
None
- Raises:
ValueError – If the file path doesn’t end with ‘.json’.
IOError – If there’s an error during the file writing process.
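A round-trip sketch for the group, assuming the llm_group instance from the example above and a hypothetical file name:
# Persist the group's configuration and restore it later
llm_group.to_disk("llm_group.json")
restored_group = DocumentLLMGroup.from_disk("llm_group.json")

# API credentials are stripped during serialization, so re-set them after loading
for llm in restored_group.llms:
    llm.api_key = "your-api-key"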
- class contextgem.public.llms.DocumentLLM(**data)[source]#
Bases: _GenericLLMProcessor
Handles processing documents with a specific LLM.
This class serves as an abstraction for interacting with an LLM. It provides functionality for querying the LLM with text or image inputs, and manages prompt preparation and token usage tracking. The class can be configured with different roles based on the document processing task.
- Variables:
model – Model identifier in format {model_provider}/{model_name}. See https://docs.litellm.ai/docs/providers for supported providers.
deployment_id – Deployment ID for the LLM. Primarily used with Azure OpenAI.
api_key – API key for LLM authentication. Not required for local models (e.g., Ollama).
api_base – Base URL of the API endpoint.
api_version – API version. Primarily used with Azure OpenAI.
role – Role type for the LLM (e.g., “extractor_text”, “reasoner_text”, “extractor_vision”, “reasoner_vision”). Defaults to “extractor_text”.
system_message – Preparatory system-level message to set context for LLM responses.
temperature – Sampling temperature (0.0 to 1.0) controlling response creativity. Lower values produce more predictable outputs, higher values generate more varied responses. Defaults to 0.3.
max_tokens – Maximum tokens allowed in the generated response. Defaults to 4096.
max_completion_tokens – Maximum token size for output completions in reasoning (CoT-capable) models. Defaults to 16000.
reasoning_effort – The effort level for the LLM to reason about the input. Can be set to "low", "medium", or "high". Relevant for reasoning (CoT-capable) models. Defaults to None.
top_p – Nucleus sampling value (0.0 to 1.0) controlling output focus/randomness. Lower values make output more deterministic, higher values produce more diverse outputs. Defaults to 0.3.
num_retries_failed_request – Number of retries when LLM request fails. Defaults to 3.
max_retries_failed_request – LLM provider-specific retry count for failed requests. Defaults to 0.
max_retries_invalid_data – Number of retries when LLM returns invalid data. Defaults to 3.
timeout – Timeout in seconds for LLM API calls. Defaults to 120 seconds.
pricing_details – LLMPricing object with pricing details for cost calculation.
is_fallback – Indicates whether the LLM is a fallback model. Defaults to False.
fallback_llm – DocumentLLM to use as fallback if current one fails. Must have the same role as the current LLM.
output_language – Language for produced output text (justifications, explanations). Can be “en” (English) or “adapt” (adapts to document/image language). Defaults to “en”.
async_limiter – Controls frequency of async LLM API requests for concurrent tasks. Defaults to allowing 3 acquisitions per 10-second period to prevent rate limit issues. See mjpieters/aiolimiter for configuration details.
seed – Seed for random number generation to help produce more consistent outputs across multiple runs. When set to a specific integer value, the LLM will attempt to use this seed for sampling operations. However, deterministic output is still not guaranteed even with the same seed, as other factors may influence the model’s response. Defaults to None.
- Parameters:
model (NonEmptyStr)
deployment_id (Optional[NonEmptyStr])
api_key (Optional[NonEmptyStr])
api_base (Optional[NonEmptyStr])
api_version (Optional[NonEmptyStr])
role (LLMRoleAny)
system_message (Optional[NonEmptyStr])
temperature (Optional[float])
max_tokens (Optional[int])
max_completion_tokens (Optional[int])
reasoning_effort (Optional[ReasoningEffort])
top_p (Optional[float])
num_retries_failed_request (Optional[int])
max_retries_failed_request (Optional[int])
max_retries_invalid_data (Optional[int])
timeout (Optional[int])
is_fallback (bool)
fallback_llm (Optional[DocumentLLM])
output_language (LanguageRequirement)
seed (Optional[StrictInt])
Note:
- LLM groups
Refer to the DocumentLLMGroup class for more information on constructing LLM groups, which are a collection of LLMs with unique roles, used for complex document processing tasks.
- LLM role
The role of an LLM is an abstraction to differentiate between tasks of different complexity. For example, if an aspect/concept is assigned llm_role="extractor_text", it means that the aspect/concept is extracted from the document using the LLM with the role set to "extractor_text". This helps to channel different tasks to different LLMs, ensuring that each task is handled by the most appropriate model. Usually, domain expertise is required to determine the most appropriate role for a specific aspect/concept. But for simple use cases, you can skip the role assignment completely, in which case the role will default to "extractor_text".
- Example:
- LLM definition#
from contextgem import DocumentLLM, LLMPricing

# Create a single LLM for text extraction
text_extractor = DocumentLLM(
    model="openai/gpt-4o-mini",
    api_key="your-api-key",  # Replace with your actual API key
    role="extractor_text",  # Role for text extraction
    pricing_details=LLMPricing(  # optional
        input_per_1m_tokens=0.150,
        output_per_1m_tokens=0.600,
    ),
)

# Create a fallback LLM in case the primary model fails
fallback_text_extractor = DocumentLLM(
    model="anthropic/claude-3-7-sonnet",
    api_key="your-anthropic-api-key",  # Replace with your actual API key
    role="extractor_text",  # must be the same as the role of the primary LLM
    is_fallback=True,
    pricing_details=LLMPricing(  # optional
        input_per_1m_tokens=3.00,
        output_per_1m_tokens=15.00,
    ),
)

# Assign the fallback LLM to the primary LLM
text_extractor.fallback_llm = fallback_text_extractor
Create a new model by parsing and validating input data from keyword arguments.
Raises ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- model: NonEmptyStr#
- deployment_id: Optional[NonEmptyStr]#
- api_key: Optional[NonEmptyStr]#
- api_base: Optional[NonEmptyStr]#
- api_version: Optional[NonEmptyStr]#
- role: LLMRoleAny#
- system_message: Optional[NonEmptyStr]#
- temperature: Optional[StrictFloat]#
- max_tokens: Optional[StrictInt]#
- max_completion_tokens: Optional[StrictInt]#
- reasoning_effort: Optional[ReasoningEffort]#
- top_p: Optional[StrictFloat]#
- num_retries_failed_request: Optional[StrictInt]#
- max_retries_failed_request: Optional[StrictInt]#
- max_retries_invalid_data: Optional[StrictInt]#
- timeout: Optional[StrictInt]#
- pricing_details: Optional[LLMPricing]#
- is_fallback: StrictBool#
- fallback_llm: Optional[DocumentLLM]#
- output_language: LanguageRequirement#
- seed: Optional[StrictInt]#
- property async_limiter: AsyncLimiter#
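The limiter can be replaced to match your provider's rate limits. A sketch using the aiolimiter package referenced above, assuming llm is a configured DocumentLLM instance:
from aiolimiter import AsyncLimiter

# Allow up to 10 acquisitions per 5-second window
# (the default is 3 acquisitions per 10 seconds)
llm.async_limiter = AsyncLimiter(10, 5)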
- property is_group: bool#
Whether the LLM is a single instance or a group.
- property list_roles: list[Literal['extractor_text', 'reasoner_text', 'extractor_vision', 'reasoner_vision']]#
Returns a list containing the role of this LLM.
(For a single LLM, this returns a list with just one element - the LLM’s role. For LLM groups, the method implementation returns roles of all LLMs in the group.)
- Returns:
A list containing the role of this LLM.
- Return type:
list[LLMRoleAny]
- chat(prompt, images=None)[source]#
Synchronously sends a prompt to the LLM and gets a response. For models supporting vision, attach images to the prompt if needed.
This method allows direct interaction with the LLM by submitting your own prompt.
- Parameters:
prompt (str) – The input prompt to send to the LLM.
images (Optional[list[Image]]) – Optional list of Image instances to attach for vision-capable models. Defaults to None.
- Returns:
The LLM’s response
- Return type:
str
- Raises:
ValueError – If the prompt is empty or not a string
ValueError – If images parameter is not a list of Image instances
ValueError – If images are provided but the model doesn’t support vision
RuntimeError – If the LLM call fails and no fallback is available
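A minimal sketch, assuming a configured llm instance; the image lines are an assumption about the library's Image helpers and a hypothetical file path:
# Direct text prompt, bypassing the extraction pipeline
response = llm.chat("Summarize the key obligations in two sentences.")

# For vision-capable models, images can be attached (hypothetical file path):
# from contextgem import Image, image_to_base64
# img = Image(mime_type="image/png", base64_data=image_to_base64("scan.png"))
# response = llm.chat("Describe this page.", images=[img])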
- async chat_async(prompt, images=None)[source]#
Asynchronously sends a prompt to the LLM and gets a response. For models supporting vision, attach images to the prompt if needed.
This method allows direct interaction with the LLM by submitting your own prompt.
- Parameters:
prompt (str) – The input prompt to send to the LLM.
images (Optional[list[Image]]) – Optional list of Image instances to attach for vision-capable models. Defaults to None.
- Returns:
The LLM’s response
- Return type:
str
- Raises:
ValueError – If the prompt is empty or not a string
ValueError – If images parameter is not a list of Image instances
ValueError – If images are provided but the model doesn’t support vision
RuntimeError – If the LLM call fails and no fallback is available
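An async sketch, assuming a configured llm instance:
import asyncio

async def ask() -> str:
    # Assumes `llm` is a configured DocumentLLM instance
    return await llm.chat_async("What is the governing law of this agreement?")

answer = asyncio.run(ask())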
- _update_default_prompt(prompt_path, prompt_type)[source]#
For advanced users only!
Update the default Jinja2 prompt template for the LLM.
This method allows you to replace the built-in prompt templates with custom ones for specific extraction types. The framework uses these templates to guide the LLM in extracting structured information from documents.
The custom prompt must be a valid Jinja2 template and include all the necessary variables that are present in the default prompt. Otherwise, the extraction may fail. Default prompts are located under contextgem/internal/prompts/.
IMPORTANT NOTES:
The default prompts are complex and specifically designed for various steps of LLM extraction with the framework. Such prompts include the necessary instructions, template variables, nested structures and loops, etc.
Only use custom prompts if you MUST have a deeper customization and adaptation of the default prompts to your specific use case. Otherwise, the default prompts should be sufficient for most use cases.
Use at your own risk!
- _eq_deserialized_llm_config(other)[source]#
Custom config equality method to compare this DocumentLLM with a deserialized instance.
Compares the __dict__ of both instances and performs specific checks for certain attributes that require special handling.
Note that, by default, the reconstructed deserialized DocumentLLM will be only partially equal (==) to the original one, as the API credentials are redacted, and the attached prompt templates, async limiter, and async lock are not serialized and point to different objects in memory post-initialization. Also, usage and cost stats are reset by default pre-serialization.
- Parameters:
other (DocumentLLM) – Another DocumentLLM instance to compare with
- Returns:
True if the instances are equal, False otherwise
- Return type:
bool
- get_usage()[source]#
Retrieves the usage information of the LLM and its fallback LLM if configured.
This method collects token usage statistics for the current LLM instance and its fallback LLM (if configured), providing insights into API consumption.
- Returns:
A list of usage statistics containers for the LLM and its fallback.
- Return type:
list[_LLMUsageOutputContainer]
- get_cost()[source]#
Retrieves the accumulated cost information of the LLM and its fallback LLM if configured.
This method collects cost statistics for the current LLM instance and its fallback LLM (if configured), providing insights into API usage expenses.
- Returns:
A list of cost statistics containers for the LLM and its fallback.
- Return type:
list[_LLMCostOutputContainer]
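A minimal sketch of inspecting and clearing the statistics, assuming a configured llm instance:
# Inspect token usage and accumulated cost for the LLM and its fallback (if any)
for usage_container in llm.get_usage():
    print(usage_container)
for cost_container in llm.get_cost():
    print(cost_container)

# Clear the counters, e.g. between documents
llm.reset_usage_and_cost()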
- extract_all(document, overwrite_existing=False, max_items_per_call=0, use_concurrency=False, max_paragraphs_to_analyze_per_call=0, max_images_to_analyze_per_call=0)#
Extracts all aspects and concepts from a document and its aspects.
This method performs comprehensive extraction by processing the document for aspects and concepts, then extracting concepts from each aspect. The operation can be configured for concurrent processing and customized extraction parameters.
This is the synchronous version of extract_all_async().
- Parameters:
document (Document) – The document to analyze.
overwrite_existing (bool, optional) – Whether to overwrite already processed aspects and concepts with newly extracted information. Defaults to False.
max_items_per_call (int, optional) – Maximum number of items with the same extraction params to process in each LLM call. Defaults to 0 (all items in one call). If concurrency is enabled, defaults to 1. For complex tasks, you should not set a high value, in order to avoid prompt overloading.
use_concurrency (bool, optional) – If True, enables concurrent processing of multiple items. Concurrency can considerably reduce processing time, but may cause rate limit errors with LLM providers. Use this option when API rate limits allow for multiple concurrent requests. Defaults to False.
max_paragraphs_to_analyze_per_call (int, optional) – Maximum paragraphs to include in a single LLM prompt. Defaults to 0 (all paragraphs).
max_images_to_analyze_per_call (int, optional) – Maximum images to include in a single LLM prompt. Defaults to 0 (all images). Relevant only for document-level concepts.
- Returns:
The document with extracted aspects and concepts.
- Return type:
Document
- async extract_all_async(document, overwrite_existing=False, max_items_per_call=0, use_concurrency=False, max_paragraphs_to_analyze_per_call=0, max_images_to_analyze_per_call=0)#
Asynchronously extracts all aspects and concepts from a document and its aspects.
This method performs comprehensive extraction by processing the document for aspects and concepts, then extracting concepts from each aspect. The operation can be configured for concurrent processing and customized extraction parameters.
- Parameters:
document (Document) – The document to analyze.
overwrite_existing (bool, optional) – Whether to overwrite already processed aspects and concepts with newly extracted information. Defaults to False.
max_items_per_call (int, optional) – Maximum number of items with the same extraction params to process in each LLM call. Defaults to 0 (all items in one call). If concurrency is enabled, defaults to 1. For complex tasks, you should not set a high value, in order to avoid prompt overloading.
use_concurrency (bool, optional) – If True, enables concurrent processing of multiple items. Concurrency can considerably reduce processing time, but may cause rate limit errors with LLM providers. Use this option when API rate limits allow for multiple concurrent requests. Defaults to False.
max_paragraphs_to_analyze_per_call (int, optional) – Maximum paragraphs to include in a single LLM prompt. Defaults to 0 (all paragraphs).
max_images_to_analyze_per_call (int, optional) – Maximum images to include in a single LLM prompt. Defaults to 0 (all images). Relevant only for document-level concepts.
- Returns:
The document with extracted aspects and concepts.
- Return type:
Document
- extract_aspects_from_document(document, from_aspects=None, overwrite_existing=False, max_items_per_call=0, use_concurrency=False, max_paragraphs_to_analyze_per_call=0)#
Extracts aspects from the provided document using predefined LLMs.
If an aspect instance has extracted_items populated, the reference_paragraphs field will be automatically populated from these items.
This is the synchronous version of extract_aspects_from_document_async().
- Parameters:
document (Document) – The document from which aspects are to be extracted.
from_aspects (Optional[list[Aspect]]) – Existing aspects to use as a base for extraction. If None, uses all document’s aspects.
overwrite_existing (bool) – Whether to overwrite already processed aspects with newly extracted information. Defaults to False.
max_items_per_call (int) – Maximum items with the same extraction params to process per LLM call. Defaults to 0 (all items in a single call). For complex tasks, you should not set a high value, to avoid prompt overloading. If concurrency is enabled, defaults to 1 (each item processed separately).
use_concurrency (bool) – If True, enables concurrent processing of multiple items. Concurrency can considerably reduce processing time, but may cause rate limit errors with LLM providers. Use this option when API rate limits allow for multiple concurrent requests. Defaults to False.
max_paragraphs_to_analyze_per_call (int) – Maximum paragraphs to analyze in a single LLM prompt. Defaults to 0 (all paragraphs).
- Returns:
List of processed Aspect objects with extracted items.
- Return type:
list[Aspect]
- async extract_aspects_from_document_async(document, from_aspects=None, overwrite_existing=False, max_items_per_call=0, use_concurrency=False, max_paragraphs_to_analyze_per_call=0)#
Extracts aspects from the provided document using predefined LLMs asynchronously.
If an aspect instance has extracted_items populated, the reference_paragraphs field will be automatically populated from these items.
- Parameters:
document (Document) – The document from which aspects are to be extracted.
from_aspects (Optional[list[Aspect]]) – Existing aspects to use as a base for extraction. If None, uses all document’s aspects.
overwrite_existing (bool) – Whether to overwrite already processed aspects with newly extracted information. Defaults to False.
max_items_per_call (int) – Maximum number of items with the same extraction params to process per LLM call. Defaults to 0 (all items in one call). If concurrency is enabled, defaults to 1. For complex tasks, you should not set a high value, in order to avoid prompt overloading.
use_concurrency (bool) – If True, enables concurrent processing of multiple items. Concurrency can considerably reduce processing time, but may cause rate limit errors with LLM providers. Use this option when API rate limits allow for multiple concurrent requests. Defaults to False.
max_paragraphs_to_analyze_per_call (int) – Maximum paragraphs to analyze in a single LLM prompt. Defaults to 0 (all paragraphs).
- Returns:
List of processed Aspect objects with extracted items.
- Return type:
list[Aspect]
- extract_concepts_from_aspect(aspect, document, from_concepts=None, overwrite_existing=False, max_items_per_call=0, use_concurrency=False, max_paragraphs_to_analyze_per_call=0)#
Extracts concepts associated with a given aspect in a document.
This method processes an aspect to extract related concepts using LLMs. If the aspect has not been previously processed, a ValueError is raised.
This is the synchronous version of extract_concepts_from_aspect_async().
- Parameters:
aspect (Aspect) – The aspect from which to extract concepts.
document (Document) – The document that contains the aspect.
from_concepts (Optional[list[_Concept]]) – List of existing concepts to process. Defaults to None.
overwrite_existing (bool) – Whether to overwrite already processed concepts with newly extracted information. Defaults to False.
max_items_per_call (int) – Maximum number of items with the same extraction params to process in each LLM call. Defaults to 0 (all items in one call). If concurrency is enabled, defaults to 1. For complex tasks, you should not set a high value, in order to avoid prompt overloading.
use_concurrency (bool) – If True, enables concurrent processing of multiple items. Concurrency can considerably reduce processing time, but may cause rate limit errors with LLM providers. Use this option when API rate limits allow for multiple concurrent requests. Defaults to False.
max_paragraphs_to_analyze_per_call (int) – Maximum paragraphs to include in a single LLM prompt. Defaults to 0 (all paragraphs).
- Returns:
List of processed concept objects.
- Return type:
list[_Concept]
- async extract_concepts_from_aspect_async(aspect, document, from_concepts=None, overwrite_existing=False, max_items_per_call=0, use_concurrency=False, max_paragraphs_to_analyze_per_call=0)#
Asynchronously extracts concepts from a specified aspect using LLMs.
This method processes an aspect to extract related concepts using LLMs. If the aspect has not been previously processed, a ValueError is raised.
- Parameters:
aspect (Aspect) – The aspect from which to extract concepts.
document (Document) – The document that contains the aspect.
from_concepts (Optional[list[_Concept]]) – List of existing concepts to process. Defaults to None.
overwrite_existing (bool) – Whether to overwrite already processed concepts with newly extracted information. Defaults to False.
max_items_per_call (int) – Maximum number of items with the same extraction params to process in each LLM call. Defaults to 0 (all items in one call). If concurrency is enabled, defaults to 1. For complex tasks, you should not set a high value, in order to avoid prompt overloading.
use_concurrency (bool) – If True, enables concurrent processing of multiple items. Concurrency can considerably reduce processing time, but may cause rate limit errors with LLM providers. Use this option when API rate limits allow for multiple concurrent requests. Defaults to False.
max_paragraphs_to_analyze_per_call (int) – Maximum paragraphs to include in a single LLM prompt. Defaults to 0 (all paragraphs).
- Returns:
List of processed concept objects.
- Return type:
list[_Concept]
- extract_concepts_from_document(document, from_concepts=None, overwrite_existing=False, max_items_per_call=0, use_concurrency=False, max_paragraphs_to_analyze_per_call=0, max_images_to_analyze_per_call=0)#
Extracts concepts from the provided document using predefined LLMs.
This is the synchronous version of extract_concepts_from_document_async().
- Parameters:
document (Document) – The document from which concepts are to be extracted.
from_concepts (Optional[list[_Concept]]) – Existing concepts to use as a base for extraction. If None, uses all document’s concepts.
overwrite_existing (bool) – Whether to overwrite already processed concepts with newly extracted information. Defaults to False.
max_items_per_call (int) – Maximum items with the same extraction params to process per LLM call. Defaults to 0 (all items in a single call). For complex tasks, you should not set a high value, to avoid prompt overloading. If concurrency is enabled, defaults to 1 (each item processed separately).
use_concurrency (bool) – If True, enables concurrent processing of multiple items. Concurrency can considerably reduce processing time, but may cause rate limit errors with LLM providers. Use this option when API rate limits allow for multiple concurrent requests. Defaults to False.
max_paragraphs_to_analyze_per_call (int) – Maximum paragraphs to analyze in a single LLM prompt. Defaults to 0 (all paragraphs).
max_images_to_analyze_per_call (int, optional) – Maximum images to include in a single LLM prompt. Defaults to 0 (all images).
- Returns:
List of processed Concept objects with extracted items.
- Return type:
list[_Concept]
- async extract_concepts_from_document_async(document, from_concepts=None, overwrite_existing=False, max_items_per_call=0, use_concurrency=False, max_paragraphs_to_analyze_per_call=0, max_images_to_analyze_per_call=0)#
Extracts concepts from the provided document using predefined LLMs asynchronously.
This method processes a document to extract concepts using configured LLMs.
- Parameters:
document (Document) – The document from which concepts are to be extracted.
from_concepts (Optional[list[_Concept]]) – Existing concepts to use as a base for extraction. If None, uses all document’s concepts.
overwrite_existing (bool) – Whether to overwrite already processed concepts with newly extracted information. Defaults to False.
max_items_per_call (int) – Maximum number of items with the same extraction params to process per LLM call. Defaults to 0 (all items in one call). If concurrency is enabled, defaults to 1. For complex tasks, you should not set a high value, in order to avoid prompt overloading.
use_concurrency (bool) – If True, enables concurrent processing of multiple items. Concurrency can considerably reduce processing time, but may cause rate limit errors with LLM providers. Use this option when API rate limits allow for multiple concurrent requests. Defaults to False.
max_paragraphs_to_analyze_per_call (int) – Maximum paragraphs to analyze in a single LLM prompt. Defaults to 0 (all paragraphs).
max_images_to_analyze_per_call (int, optional) – Maximum images to include in a single LLM prompt. Defaults to 0 (all images).
- Returns:
List of processed Concept objects with extracted items.
- Return type:
list[_Concept]
- classmethod from_dict(obj_dict)#
Reconstructs an instance of the class from a dictionary representation.
This method deserializes a dictionary containing the object’s attributes and values into a new instance of the class. It handles complex nested structures like aspects, concepts, and extracted items, properly reconstructing each component.
- classmethod from_disk(file_path)#
Loads an instance of the class from a JSON file stored on disk.
This method reads the JSON content from the specified file path and deserializes it into an instance of the class using the from_json method.
- Parameters:
file_path (str) – Path to the JSON file to load (must end with ‘.json’).
- Returns:
An instance of the class populated with the data from the file.
- Return type:
Self
- Raises:
ValueError – If the file path doesn’t end with ‘.json’.
OSError – If there’s an error reading the file.
RuntimeError – If deserialization fails.
- classmethod from_json(json_string)#
Creates an instance of the class from a JSON string representation.
This method deserializes the provided JSON string into a dictionary and uses the from_dict method to construct the class instance. It validates that the class name in the serialized data matches the current class.
- reset_usage_and_cost()[source]#
Resets the usage and cost statistics for the LLM and its fallback LLM (if configured).
This method clears accumulated usage and cost data, which is useful when processing multiple documents sequentially and tracking metrics for each document separately.
- Return type:
None
- Returns:
None
- to_dict()#
Transforms the current object into a dictionary representation.
Converts the object to a dictionary that includes:
- All public attributes
- Special handling for specific public and private attributes
When an LLM or LLM group is serialized, its API credentials and usage/cost stats are removed.
- to_disk(file_path)#
Saves the serialized instance to a JSON file at the specified path.
This method converts the instance to a dictionary representation using to_dict(), then writes it to disk as a formatted JSON file with UTF-8 encoding.
- Parameters:
file_path (str) – Path where the JSON file should be saved (must end with ‘.json’).
- Return type:
None
- Returns:
None
- Raises:
ValueError – If the file path doesn’t end with ‘.json’.
IOError – If there’s an error during the file writing process.
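A round-trip sketch for a single LLM, assuming the text_extractor instance from the example above and a hypothetical file name:
# Persist a single LLM's configuration and restore it later
text_extractor.to_disk("text_extractor.json")
restored_llm = DocumentLLM.from_disk("text_extractor.json")

# API credentials are stripped during serialization, so re-set them after loading
restored_llm.api_key = "your-api-key"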