LLMs#

Module for handling processing logic using LLMs.

This module provides classes and utilities for interacting with LLMs in document processing workflows. It includes functionality for managing LLM configurations, handling API calls, processing text and image inputs, tracking token usage and costs, and managing rate limits for LLM requests.

The module supports various LLM providers through the litellm library, enabling both text-only and multimodal (vision) capabilities. It implements efficient asynchronous processing patterns and provides detailed usage statistics for monitoring and cost management.
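A minimal workflow sketch (the Document and Aspect classes and their raw_text/name/description parameters are assumed from the broader contextgem public API; the API key is a placeholder):

from contextgem import Aspect, Document, DocumentLLM

# Define a document and attach an aspect to extract
doc = Document(raw_text="Provider shall deliver the Software by 1 March 2025. ...")
doc.aspects = [
    Aspect(
        name="Delivery terms",
        description="Clauses describing delivery deadlines and conditions",
    )
]

# Configure an LLM (role defaults to "extractor_text")
llm = DocumentLLM(
    model="openai/gpt-4o-mini",
    api_key="your-openai-api-key",  # Replace with your actual API key
)

# Extract all aspects and concepts, then inspect the results
doc = llm.extract_all(doc)
for aspect in doc.aspects:
    print(aspect.name, aspect.extracted_items)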

class contextgem.public.llms.DocumentLLMGroup(**data)[source]#

Bases: _GenericLLMProcessor

Represents a group of DocumentLLMs with unique roles for processing document content.

This class manages multiple LLMs assigned to specific roles for text and vision processing. It ensures role compliance and facilitates extraction of aspects and concepts from documents.

Variables:
  • llms – A list of DocumentLLM instances, each with a unique role (e.g., extractor_text, reasoner_text, extractor_vision, reasoner_vision). At least 2 instances with distinct roles are required.

  • output_language – Language for produced output text (justifications, explanations). Values: “en” (always English) or “adapt” (matches document/image language). All LLMs in the group must share the same output_language setting.

Note:

Refer to the DocumentLLM class for more information on constructing LLMs for the group.

Example:
LLM group definition#
from contextgem import DocumentLLM, DocumentLLMGroup

# Create a text extractor LLM with a fallback
text_extractor = DocumentLLM(
    model="openai/gpt-4o-mini",
    api_key="your-openai-api-key",  # Replace with your actual API key
    role="extractor_text",
)

# Create a fallback LLM for the text extractor
text_extractor_fallback = DocumentLLM(
    model="anthropic/claude-3-5-haiku",
    api_key="your-anthropic-api-key",  # Replace with your actual API key
    role="extractor_text",  # Must have the same role as the primary LLM
    is_fallback=True,
)

# Assign the fallback LLM to the primary text extractor
text_extractor.fallback_llm = text_extractor_fallback

# Create a text reasoner LLM
text_reasoner = DocumentLLM(
    model="openai/o3-mini",
    api_key="your-openai-api-key",  # Replace with your actual API key
    role="reasoner_text",  # For more complex tasks that require reasoning
)

# Create a vision extractor LLM
vision_extractor = DocumentLLM(
    model="openai/gpt-4o-mini",
    api_key="your-openai-api-key",  # Replace with your actual API key
    role="extractor_vision",  # For handling images
)

# Create a vision reasoner LLM
vision_reasoner = DocumentLLM(
    model="openai/gpt-4o",
    api_key="your-openai-api-key",
    role="reasoner_vision",  # For more complex vision tasks that require reasoning
)

# Create a DocumentLLMGroup with all four LLMs
llm_group = DocumentLLMGroup(
    llms=[text_extractor, text_reasoner, vision_extractor, vision_reasoner],
    output_language="en",  # All LLMs must have the same output language ("en" is default)
)
# This group contains five LLMs in total: the four main ones above, each
# with a distinct role, plus the fallback attached to the text extractor.
# Each LLM in the group can have its own fallback LLM.

# Get usage statistics for the whole group or for a specific role
group_usage = llm_group.get_usage()
text_extractor_usage = llm_group.get_usage(llm_role="extractor_text")

# Get cost statistics for the whole group or for a specific role
all_costs = llm_group.get_cost()
text_extractor_cost = llm_group.get_cost(llm_role="extractor_text")

# Reset usage and cost statistics for the whole group or for a specific role
llm_group.reset_usage_and_cost()
llm_group.reset_usage_and_cost(llm_role="extractor_text")

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

llms: list[DocumentLLM]#
output_language: LanguageRequirement#
property is_group: bool#

Whether the LLM is a single instance or a group.

Always True for DocumentLLMGroup.

property list_roles: list[Literal['extractor_text', 'reasoner_text', 'extractor_vision', 'reasoner_vision']]#

Returns a list of all roles assigned to the LLMs in this group.

Returns:

A list of LLM role identifiers

Return type:

list[LLMRoleAny]

group_update_output_language(output_language)[source]#

Updates the output language for all LLMs in the group.

Parameters:

output_language (LanguageRequirement) – The new output language to set for all LLMs

Return type:

None
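For example, to switch the whole group to adaptive output language (a short sketch reusing the llm_group from the class example above):

# Make all LLMs in the group produce output in the document's language
llm_group.group_update_output_language("adapt")
assert all(llm.output_language == "adapt" for llm in llm_group.llms)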

_eq_deserialized_llm_config(other)[source]#

Custom config equality method to compare this DocumentLLMGroup with a deserialized instance.

Uses the _eq_deserialized_llm_config method of the DocumentLLM class to compare each LLM in the group, including fallbacks, if any.

Parameters:

other (DocumentLLMGroup) – Another DocumentLLMGroup instance to compare with

Returns:

True if the instances are equal, False otherwise

Return type:

bool

get_usage(llm_role=None)[source]#

Retrieves the usage information of the LLMs in the group, filtered by the specified LLM role if provided.

Parameters:

llm_role (Optional[str]) – Optional; A string representing the role of the LLM to filter the usage data. If None, returns usage for all LLMs in the group.

Returns:

A list of usage statistics containers for the specified LLMs and their fallbacks.

Return type:

list[_LLMUsageOutputContainer]

Raises:

ValueError – If no LLM with the specified role exists in the group.

get_cost(llm_role=None)[source]#

Retrieves the accumulated cost information of the LLMs in the group, filtered by the specified LLM role if provided.

Parameters:

llm_role (Optional[str]) – Optional; A string representing the role of the LLM to filter the cost data. If None, returns cost for all LLMs in the group.

Returns:

A list of cost statistics containers for the specified LLMs and their fallbacks.

Return type:

list[_LLMCostOutputContainer]

Raises:

ValueError – If no LLM with the specified role exists in the group.

reset_usage_and_cost(llm_role=None)[source]#

Resets the usage and cost statistics for LLMs in the group.

This method clears accumulated usage and cost data, which is useful when processing multiple documents sequentially and tracking metrics for each document separately.

Parameters:

llm_role (Optional[str]) – Optional; A string representing the role of the LLM to reset statistics for. If None, resets statistics for all LLMs in the group.

Raises:

ValueError – If no LLM with the specified role exists in the group.

Return type:

None

Returns:

None

extract_all(document, overwrite_existing=False, max_items_per_call=0, use_concurrency=False, max_paragraphs_to_analyze_per_call=0, max_images_to_analyze_per_call=0)#

Extracts all aspects and concepts from a document and its aspects.

This method performs comprehensive extraction by processing the document for aspects and concepts, then extracting concepts from each aspect. The operation can be configured for concurrent processing and customized extraction parameters.

This is the synchronous version of extract_all_async().

Parameters:
  • document (Document) – The document to analyze.

  • overwrite_existing (bool, optional) – Whether to overwrite already processed aspects and concepts with newly extracted information. Defaults to False.

  • max_items_per_call (int, optional) – Maximum number of items with the same extraction params to process in each LLM call. Defaults to 0 (all items in one call). If concurrency is enabled, defaults to 1. For complex tasks, you should not set a high value, in order to avoid prompt overloading.

  • use_concurrency (bool, optional) – If True, enables concurrent processing of multiple items. Concurrency can considerably reduce processing time, but may cause rate limit errors with LLM providers. Use this option when API rate limits allow for multiple concurrent requests. Defaults to False.

  • max_paragraphs_to_analyze_per_call (int, optional) – Maximum paragraphs to include in a single LLM prompt. Defaults to 0 (all paragraphs).

  • max_images_to_analyze_per_call (int, optional) – Maximum images to include in a single LLM prompt. Defaults to 0 (all images). Relevant only for document-level concepts.

Returns:

The document with extracted aspects and concepts.

Return type:

Document
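A short usage sketch (assuming a Document instance named doc with aspects and concepts already attached, and the llm_group from the class example):

# Run the full pipeline in one call; enable concurrency only if your
# provider's rate limits allow parallel requests
doc = llm_group.extract_all(
    doc,
    use_concurrency=True,
    max_paragraphs_to_analyze_per_call=50,  # chunk long documents
)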

async extract_all_async(document, overwrite_existing=False, max_items_per_call=0, use_concurrency=False, max_paragraphs_to_analyze_per_call=0, max_images_to_analyze_per_call=0)#

Asynchronously extracts all aspects and concepts from a document and its aspects.

This method performs comprehensive extraction by processing the document for aspects and concepts, then extracting concepts from each aspect. The operation can be configured for concurrent processing and customized extraction parameters.

Parameters:
  • document (Document) – The document to analyze.

  • overwrite_existing (bool, optional) – Whether to overwrite already processed aspects and concepts with newly extracted information. Defaults to False.

  • max_items_per_call (int, optional) – Maximum number of items with the same extraction params to process in each LLM call. Defaults to 0 (all items in one call). If concurrency is enabled, defaults to 1. For complex tasks, you should not set a high value, in order to avoid prompt overloading.

  • use_concurrency (bool, optional) – If True, enables concurrent processing of multiple items. Concurrency can considerably reduce processing time, but may cause rate limit errors with LLM providers. Use this option when API rate limits allow for multiple concurrent requests. Defaults to False.

  • max_paragraphs_to_analyze_per_call (int, optional) – Maximum paragraphs to include in a single LLM prompt. Defaults to 0 (all paragraphs).

  • max_images_to_analyze_per_call (int, optional) – Maximum images to include in a single LLM prompt. Defaults to 0 (all images). Relevant only for document-level concepts.

Returns:

The document with extracted aspects and concepts.

Return type:

Document

extract_aspects_from_document(document, from_aspects=None, overwrite_existing=False, max_items_per_call=0, use_concurrency=False, max_paragraphs_to_analyze_per_call=0)#

Extracts aspects from the provided document using predefined LLMs.

If an aspect instance has extracted_items populated, the reference_paragraphs field will be automatically populated from these items.

This is the synchronous version of extract_aspects_from_document_async().

Parameters:
  • document (Document) – The document from which aspects are to be extracted.

  • from_aspects (Optional[list[Aspect]]) – Existing aspects to use as a base for extraction. If None, uses all document’s aspects.

  • overwrite_existing (bool) – Whether to overwrite already processed aspects with newly extracted information. Defaults to False.

  • max_items_per_call (int) – Maximum items with the same extraction params to process per LLM call. Defaults to 0 (all items in a single call). If concurrency is enabled, defaults to 1 (each item processed separately). For complex tasks, you should not set a high value, in order to avoid prompt overloading.

  • use_concurrency (bool) – If True, enables concurrent processing of multiple items. Concurrency can considerably reduce processing time, but may cause rate limit errors with LLM providers. Use this option when API rate limits allow for multiple concurrent requests. Defaults to False.

  • max_paragraphs_to_analyze_per_call (int) – Maximum paragraphs to analyze in a single LLM prompt. Defaults to 0 (all paragraphs).

Returns:

List of processed Aspect objects with extracted items.

Return type:

list[Aspect]
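For instance (a sketch, assuming doc has aspects attached):

aspects = llm_group.extract_aspects_from_document(doc)
for aspect in aspects:
    # reference_paragraphs is populated automatically from the extracted items
    print(aspect.name, len(aspect.reference_paragraphs))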

async extract_aspects_from_document_async(document, from_aspects=None, overwrite_existing=False, max_items_per_call=0, use_concurrency=False, max_paragraphs_to_analyze_per_call=0)#

Extracts aspects from the provided document using predefined LLMs asynchronously.

If an aspect instance has extracted_items populated, the reference_paragraphs field will be automatically populated from these items.

Parameters:
  • document (Document) – The document from which aspects are to be extracted.

  • from_aspects (Optional[list[Aspect]]) – Existing aspects to use as a base for extraction. If None, uses all document’s aspects.

  • overwrite_existing (bool) – Whether to overwrite already processed aspects with newly extracted information. Defaults to False.

  • max_items_per_call (int) – Maximum number of items with the same extraction params to process per LLM call. Defaults to 0 (all items in one call). If concurrency is enabled, defaults to 1. For complex tasks, you should not set a high value, in order to avoid prompt overloading.

  • use_concurrency (bool) – If True, enables concurrent processing of multiple items. Concurrency can considerably reduce processing time, but may cause rate limit errors with LLM providers. Use this option when API rate limits allow for multiple concurrent requests. Defaults to False.

  • max_paragraphs_to_analyze_per_call (int) – Maximum paragraphs to analyze in a single LLM prompt. Defaults to 0 (all paragraphs).

Returns:

List of processed Aspect objects with extracted items.

Return type:

list[Aspect]

extract_concepts_from_aspect(aspect, document, from_concepts=None, overwrite_existing=False, max_items_per_call=0, use_concurrency=False, max_paragraphs_to_analyze_per_call=0)#

Extracts concepts associated with a given aspect in a document.

This method processes an aspect to extract related concepts using LLMs. If the aspect has not been previously processed, a ValueError is raised.

This is the synchronous version of extract_concepts_from_aspect_async().

Parameters:
  • aspect (Aspect) – The aspect from which to extract concepts.

  • document (Document) – The document that contains the aspect.

  • from_concepts (Optional[list[_Concept]]) – List of existing concepts to process. Defaults to None.

  • overwrite_existing (bool) – Whether to overwrite already processed concepts with newly extracted information. Defaults to False.

  • max_items_per_call (int) – Maximum number of items with the same extraction params to process in each LLM call. Defaults to 0 (all items in one call). If concurrency is enabled, defaults to 1. For complex tasks, you should not set a high value, in order to avoid prompt overloading.

  • use_concurrency (bool) – If True, enables concurrent processing of multiple items. Concurrency can considerably reduce processing time, but may cause rate limit errors with LLM providers. Use this option when API rate limits allow for multiple concurrent requests. Defaults to False.

  • max_paragraphs_to_analyze_per_call (int) – Maximum paragraphs to include in a single LLM prompt. Defaults to 0 (all paragraphs).

Returns:

List of processed concept objects.

Return type:

list[_Concept]
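A sketch of the required order of operations (the aspect must be extracted before its concepts):

# Extract the aspects first; passing an unprocessed aspect raises ValueError
processed_aspects = llm_group.extract_aspects_from_document(doc)
concepts = llm_group.extract_concepts_from_aspect(processed_aspects[0], doc)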

async extract_concepts_from_aspect_async(aspect, document, from_concepts=None, overwrite_existing=False, max_items_per_call=0, use_concurrency=False, max_paragraphs_to_analyze_per_call=0)#

Asynchronously extracts concepts from a specified aspect using LLMs.

This method processes an aspect to extract related concepts using LLMs. If the aspect has not been previously processed, a ValueError is raised.

Parameters:
  • aspect (Aspect) – The aspect from which to extract concepts.

  • document (Document) – The document that contains the aspect.

  • from_concepts (Optional[list[_Concept]]) – List of existing concepts to process. Defaults to None.

  • overwrite_existing (bool) – Whether to overwrite already processed concepts with newly extracted information. Defaults to False.

  • max_items_per_call (int) – Maximum number of items with the same extraction params to process in each LLM call. Defaults to 0 (all items in one call). If concurrency is enabled, defaults to 1. For complex tasks, you should not set a high value, in order to avoid prompt overloading.

  • use_concurrency (bool) – If True, enables concurrent processing of multiple items. Concurrency can considerably reduce processing time, but may cause rate limit errors with LLM providers. Use this option when API rate limits allow for multiple concurrent requests. Defaults to False.

  • max_paragraphs_to_analyze_per_call (int) – Maximum paragraphs to include in a single LLM prompt. Defaults to 0 (all paragraphs).

Returns:

List of processed concept objects.

Return type:

list[_Concept]

extract_concepts_from_document(document, from_concepts=None, overwrite_existing=False, max_items_per_call=0, use_concurrency=False, max_paragraphs_to_analyze_per_call=0, max_images_to_analyze_per_call=0)#

Extracts concepts from the provided document using predefined LLMs.

This is the synchronous version of extract_concepts_from_document_async().

Parameters:
  • document (Document) – The document from which concepts are to be extracted.

  • from_concepts (Optional[list[_Concept]]) – Existing concepts to use as a base for extraction. If None, uses all document’s concepts.

  • overwrite_existing (bool) – Whether to overwrite already processed concepts with newly extracted information. Defaults to False.

  • max_items_per_call (int) – Maximum items with the same extraction params to process per LLM call. Defaults to 0 (all items in a single call). If concurrency is enabled, defaults to 1 (each item processed separately). For complex tasks, you should not set a high value, in order to avoid prompt overloading.

  • use_concurrency (bool) – If True, enables concurrent processing of multiple items. Concurrency can considerably reduce processing time, but may cause rate limit errors with LLM providers. Use this option when API rate limits allow for multiple concurrent requests. Defaults to False.

  • max_paragraphs_to_analyze_per_call (int) – Maximum paragraphs to analyze in a single LLM prompt. Defaults to 0 (all paragraphs).

  • max_images_to_analyze_per_call (int, optional) – Maximum images to include in a single LLM prompt. Defaults to 0 (all images).

Returns:

List of processed Concept objects with extracted items.

Return type:

list[_Concept]
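For example (a sketch; document-level concepts may also draw on images attached to the document):

concepts = llm_group.extract_concepts_from_document(
    doc,
    overwrite_existing=True,  # re-extract even if concepts were processed before
    max_images_to_analyze_per_call=2,  # chunk vision inputs, if any
)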

async extract_concepts_from_document_async(document, from_concepts=None, overwrite_existing=False, max_items_per_call=0, use_concurrency=False, max_paragraphs_to_analyze_per_call=0, max_images_to_analyze_per_call=0)#

Extracts concepts from the provided document using predefined LLMs asynchronously.

This method processes a document to extract concepts using configured LLMs.

Parameters:
  • document (Document) – The document from which concepts are to be extracted.

  • from_concepts (Optional[list[_Concept]]) – Existing concepts to use as a base for extraction. If None, uses all document’s concepts.

  • overwrite_existing (bool) – Whether to overwrite already processed concepts with newly extracted information. Defaults to False.

  • max_items_per_call (int) – Maximum number of items with the same extraction params to process per LLM call. Defaults to 0 (all items in one call). If concurrency is enabled, defaults to 1. For complex tasks, you should not set a high value, in order to avoid prompt overloading.

  • use_concurrency (bool) – If True, enables concurrent processing of multiple items. Concurrency can considerably reduce processing time, but may cause rate limit errors with LLM providers. Use this option when API rate limits allow for multiple concurrent requests. Defaults to False.

  • max_paragraphs_to_analyze_per_call (int) – Maximum paragraphs to analyze in a single LLM prompt. Defaults to 0 (all paragraphs).

  • max_images_to_analyze_per_call (int, optional) – Maximum images to include in a single LLM prompt. Defaults to 0 (all images).

Returns:

List of processed Concept objects with extracted items.

Return type:

list[_Concept]

classmethod from_dict(obj_dict)#

Reconstructs an instance of the class from a dictionary representation.

This method deserializes a dictionary containing the object’s attributes and values into a new instance of the class. It handles complex nested structures like aspects, concepts, and extracted items, properly reconstructing each component.

Parameters:

obj_dict (dict[str, Any]) – Dictionary containing the serialized object data.

Returns:

A new instance of the class with restored attributes.

Return type:

Self

classmethod from_disk(file_path)#

Loads an instance of the class from a JSON file stored on disk.

This method reads the JSON content from the specified file path and deserializes it into an instance of the class using the from_json method.

Parameters:

file_path (str) – Path to the JSON file to load (must end with ‘.json’).

Returns:

An instance of the class populated with the data from the file.

Return type:

Self

Raises:
  • ValueError – If the file path doesn’t end with ‘.json’.

  • OSError – If there’s an error reading the file.

  • RuntimeError – If deserialization fails.

classmethod from_json(json_string)#

Creates an instance of the class from a JSON string representation.

This method deserializes the provided JSON string into a dictionary and uses the from_dict method to construct the class instance. It validates that the class name in the serialized data matches the current class.

Parameters:

json_string (str) – JSON string containing the serialized object data.

Returns:

A new instance of the class with restored state.

Return type:

Self

Raises:

TypeError – If the class name in the serialized data doesn’t match.

to_dict()#

Transforms the current object into a dictionary representation.

Converts the object to a dictionary that includes:
  • All public attributes

  • Special handling for specific public and private attributes

When an LLM or LLM group is serialized, its API credentials and usage/cost stats are removed.

Returns:

A dictionary representation of the current object with all necessary data for serialization

Return type:

dict[str, Any]

to_disk(file_path)#

Saves the serialized instance to a JSON file at the specified path.

This method converts the instance to a dictionary representation using to_dict(), then writes it to disk as a formatted JSON file with UTF-8 encoding.

Parameters:

file_path (str) – Path where the JSON file should be saved (must end with ‘.json’).

Return type:

None

Returns:

None

Raises:
  • ValueError – If the file path doesn’t end with ‘.json’.

  • IOError – If there’s an error during the file writing process.

to_json()#

Converts the object to its JSON string representation.

Serializes the object into a JSON-formatted string using the dictionary representation provided by the to_dict() method.

Returns:

A JSON string representation of the object.

Return type:

str
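A serialization round-trip sketch. API credentials and usage/cost stats are stripped on serialization, so keys must be re-set after loading (attribute assignment is assumed to be permitted, as it is for fallback_llm in the class example):

# Persist the group configuration and restore it later
llm_group.to_disk("llm_group.json")
restored_group = DocumentLLMGroup.from_disk("llm_group.json")

# Re-set the redacted API keys on the restored LLMs
for llm in restored_group.llms:
    llm.api_key = "your-api-key"  # Replace with the actual key per provider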

class contextgem.public.llms.DocumentLLM(**data)[source]#

Bases: _GenericLLMProcessor

Handles processing documents with a specific LLM.

This class serves as an abstraction for interacting with an LLM. It provides functionality for querying the LLM with text or image inputs, and manages prompt preparation and token usage tracking. The class can be configured with different roles based on the document processing task.

Variables:
  • model – Model identifier in format {model_provider}/{model_name}. See https://docs.litellm.ai/docs/providers for supported providers.

  • deployment_id – Deployment ID for the LLM. Primarily used with Azure OpenAI.

  • api_key – API key for LLM authentication. Not required for local models (e.g., Ollama).

  • api_base – Base URL of the API endpoint.

  • api_version – API version. Primarily used with Azure OpenAI.

  • role – Role type for the LLM (e.g., “extractor_text”, “reasoner_text”, “extractor_vision”, “reasoner_vision”). Defaults to “extractor_text”.

  • system_message – Preparatory system-level message to set context for LLM responses.

  • temperature – Sampling temperature (0.0 to 1.0) controlling response creativity. Lower values produce more predictable outputs, higher values generate more varied responses. Defaults to 0.3.

  • max_tokens – Maximum tokens allowed in the generated response. Defaults to 4096.

  • max_completion_tokens – Maximum number of tokens for output completions in reasoning (CoT-capable) models. Defaults to 16000.

  • reasoning_effort – The effort level for the LLM to reason about the input. Can be set to "low", "medium", or "high". Relevant for reasoning (CoT-capable) models. Defaults to None.

  • top_p – Nucleus sampling value (0.0 to 1.0) controlling output focus/randomness. Lower values make output more deterministic, higher values produce more diverse outputs. Defaults to 0.3.

  • num_retries_failed_request – Number of retries when LLM request fails. Defaults to 3.

  • max_retries_failed_request – LLM provider-specific retry count for failed requests. Defaults to 0.

  • max_retries_invalid_data – Number of retries when LLM returns invalid data. Defaults to 3.

  • timeout – Timeout in seconds for LLM API calls. Defaults to 120 seconds.

  • pricing_details – LLMPricing object with pricing details for cost calculation.

  • is_fallback – Indicates whether the LLM is a fallback model. Defaults to False.

  • fallback_llm – DocumentLLM to use as fallback if current one fails. Must have the same role as the current LLM.

  • output_language – Language for produced output text (justifications, explanations). Can be “en” (English) or “adapt” (adapts to document/image language). Defaults to “en”.

  • async_limiter – Controls frequency of async LLM API requests for concurrent tasks. Defaults to allowing 3 acquisitions per 10-second period to prevent rate limit issues. See mjpieters/aiolimiter for configuration details.

  • seed – Seed for random number generation to help produce more consistent outputs across multiple runs. When set to a specific integer value, the LLM will attempt to use this seed for sampling operations. However, deterministic output is still not guaranteed even with the same seed, as other factors may influence the model’s response. Defaults to None.

Parameters:
  • model (NonEmptyStr)

  • deployment_id (Optional[NonEmptyStr])

  • api_key (Optional[NonEmptyStr])

  • api_base (Optional[NonEmptyStr])

  • api_version (Optional[NonEmptyStr])

  • role (LLMRoleAny)

  • system_message (Optional[NonEmptyStr])

  • temperature (Optional[float])

  • max_tokens (Optional[int])

  • max_completion_tokens (Optional[int])

  • reasoning_effort (Optional[ReasoningEffort])

  • top_p (Optional[float])

  • num_retries_failed_request (Optional[int])

  • max_retries_failed_request (Optional[int])

  • max_retries_invalid_data (Optional[int])

  • timeout (Optional[int])

  • pricing_details (Optional[dict[NonEmptyStr, float]])

  • is_fallback (bool)

  • fallback_llm (Optional[DocumentLLM])

  • output_language (LanguageRequirement)

  • seed (Optional[StrictInt])

Note:

  • LLM groups

    Refer to the DocumentLLMGroup class for more information on constructing LLM groups, which are a collection of LLMs with unique roles, used for complex document processing tasks.

  • LLM role

    The role of an LLM is an abstraction to differentiate between tasks of different complexity. For example, if an aspect/concept is assigned llm_role="extractor_text", it means that the aspect/concept is extracted from the document using the LLM with the role set to “extractor_text”. This helps to channel different tasks to different LLMs, ensuring that the task is handled by the most appropriate model. Usually, domain expertise is required to determine the most appropriate role for a specific aspect/concept. But for simple use cases, you can skip the role assignment completely, in which case the role will default to “extractor_text”.

Example:
LLM definition#
from contextgem import DocumentLLM, LLMPricing

# Create a single LLM for text extraction
text_extractor = DocumentLLM(
    model="openai/gpt-4o-mini",
    api_key="your-api-key",  # Replace with your actual API key
    role="extractor_text",  # Role for text extraction
    pricing_details=LLMPricing(  # optional
        input_per_1m_tokens=0.150, output_per_1m_tokens=0.600
    ),
)

# Create a fallback LLM in case the primary model fails
fallback_text_extractor = DocumentLLM(
    model="anthropic/claude-3-7-sonnet",
    api_key="your-anthropic-api-key",  # Replace with your actual API key
    role="extractor_text",  # must be the same as the role of the primary LLM
    is_fallback=True,
    pricing_details=LLMPricing(  # optional
        input_per_1m_tokens=3.00, output_per_1m_tokens=15.00
    ),
)
# Assign the fallback LLM to the primary LLM
text_extractor.fallback_llm = fallback_text_extractor
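The default async limiter allows 3 acquisitions per 10-second period. To tune it for your provider's rate limits (a sketch, assuming the limiter can be reassigned like other attributes):

from aiolimiter import AsyncLimiter

# Allow up to 10 concurrent request acquisitions per 5-second window
text_extractor.async_limiter = AsyncLimiter(10, 5)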

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

model: NonEmptyStr#
deployment_id: Optional[NonEmptyStr]#
api_key: Optional[NonEmptyStr]#
api_base: Optional[NonEmptyStr]#
api_version: Optional[NonEmptyStr]#
role: LLMRoleAny#
system_message: Optional[NonEmptyStr]#
temperature: Optional[StrictFloat]#
max_tokens: Optional[StrictInt]#
max_completion_tokens: Optional[StrictInt]#
reasoning_effort: Optional[ReasoningEffort]#
top_p: Optional[StrictFloat]#
num_retries_failed_request: Optional[StrictInt]#
max_retries_failed_request: Optional[StrictInt]#
max_retries_invalid_data: Optional[StrictInt]#
timeout: Optional[StrictInt]#
pricing_details: Optional[LLMPricing]#
is_fallback: StrictBool#
fallback_llm: Optional[DocumentLLM]#
output_language: LanguageRequirement#
seed: Optional[StrictInt]#
property async_limiter: AsyncLimiter#
property is_group: bool#

Whether the LLM is a single instance or a group.

Always False for a single DocumentLLM.

property list_roles: list[Literal['extractor_text', 'reasoner_text', 'extractor_vision', 'reasoner_vision']]#

Returns a list containing the role of this LLM.

(For a single LLM, this returns a list with just one element: the LLM’s role. For LLM groups, the implementation returns the roles of all LLMs in the group.)

Returns:

A list containing the role of this LLM.

Return type:

list[LLMRoleAny]

chat(prompt, images=None)[source]#

Synchronously sends a prompt to the LLM and gets a response. For models supporting vision, attach images to the prompt if needed.

This method allows direct interaction with the LLM by submitting your own prompt.

Parameters:
  • prompt (str) – The input prompt to send to the LLM

  • images (Optional[list[Image]]) – Optional list of Image instances for vision queries

Returns:

The LLM’s response

Return type:

str

Raises:
  • ValueError – If the prompt is empty or not a string

  • ValueError – If images parameter is not a list of Image instances

  • ValueError – If images are provided but the model doesn’t support vision

  • RuntimeError – If the LLM call fails and no fallback is available
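A direct-query sketch (reusing text_extractor from the class example):

# Text-only query; for vision-capable models, pass images=[...] with
# contextgem Image instances (see the images parameter above)
response = text_extractor.chat(
    "Summarize the key obligations in this clause: ..."
)
print(response)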

async chat_async(prompt, images=None)[source]#

Asynchronously sends a prompt to the LLM and gets a response. For models supporting vision, attach images to the prompt if needed.

This method allows direct interaction with the LLM by submitting your own prompt.

Parameters:
  • prompt (str) – The input prompt to send to the LLM

  • images (Optional[list[Image]]) – Optional list of Image instances for vision queries

Returns:

The LLM’s response

Return type:

str

Raises:
  • ValueError – If the prompt is empty or not a string

  • ValueError – If images parameter is not a list of Image instances

  • ValueError – If images are provided but the model doesn’t support vision

  • RuntimeError – If the LLM call fails and no fallback is available
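The asynchronous variant can be driven with asyncio (a sketch):

import asyncio

async def main() -> None:
    response = await text_extractor.chat_async(
        "Extract the governing law from this excerpt: ..."
    )
    print(response)

asyncio.run(main())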

_update_default_prompt(prompt_path, prompt_type)[source]#

For advanced users only!

Update the default Jinja2 prompt template for the LLM.

This method allows you to replace the built-in prompt templates with custom ones for specific extraction types. The framework uses these templates to guide the LLM in extracting structured information from documents.

The custom prompt must be a valid Jinja2 template and include all the necessary variables that are present in the default prompt; otherwise, the extraction may fail. The default prompts are located under contextgem/internal/prompts/.

IMPORTANT NOTES:

The default prompts are complex and specifically designed for various steps of LLM extraction with the framework. Such prompts include the necessary instructions, template variables, nested structures and loops, etc.

Only use custom prompts if you MUST have a deeper customization and adaptation of the default prompts to your specific use case. Otherwise, the default prompts should be sufficient for most use cases.

Use at your own risk!

Parameters:
  • prompt_path (str | Path) – Path to the Jinja2 template file (.j2 extension required)

  • prompt_type (DefaultPromptType) – Type of prompt to update (“aspect” or “concept”)

Return type:

None
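A sketch of a prompt override (the template path below is hypothetical; the custom template must keep all variables used by the default prompt):

from pathlib import Path

# Advanced: replace the built-in aspect-extraction prompt with a custom
# Jinja2 template (hypothetical path; the .j2 extension is required)
text_extractor._update_default_prompt(
    Path("prompts/my_custom_aspect_prompt.j2"), "aspect"
)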

_eq_deserialized_llm_config(other)[source]#

Custom config equality method to compare this DocumentLLM with a deserialized instance.

Compares the __dict__ of both instances and performs specific checks for certain attributes that require special handling.

Note that, by default, the reconstructed deserialized DocumentLLM will be only partially equal (==) to the original one, as the API credentials are redacted, and the attached prompt templates, async limiter, and async lock are not serialized and point to different objects in memory post-initialization. Also, usage and cost stats are reset by default pre-serialization.

Parameters:

other (DocumentLLM) – Another DocumentLLM instance to compare with

Returns:

True if the instances are equal, False otherwise

Return type:

bool

get_usage()[source]#

Retrieves the usage information of the LLM and its fallback LLM if configured.

This method collects token usage statistics for the current LLM instance and its fallback LLM (if configured), providing insights into API consumption.

Returns:

A list of usage statistics containers for the LLM and its fallback.

Return type:

list[_LLMUsageOutputContainer]

get_cost()[source]#

Retrieves the accumulated cost information of the LLM and its fallback LLM if configured.

This method collects cost statistics for the current LLM instance and its fallback LLM (if configured), providing insights into API usage expenses.

Returns:

A list of cost statistics containers for the LLM and its fallback.

Return type:

list[_LLMCostOutputContainer]
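A monitoring sketch combining both methods with reset_usage_and_cost() (documented below):

# Inspect token usage and cost for the LLM and its fallback (if configured)
for usage in text_extractor.get_usage():
    print(usage)
for cost in text_extractor.get_cost():
    print(cost)

# Reset stats between documents to track metrics per document
text_extractor.reset_usage_and_cost()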

extract_all(document, overwrite_existing=False, max_items_per_call=0, use_concurrency=False, max_paragraphs_to_analyze_per_call=0, max_images_to_analyze_per_call=0)#

Extracts all aspects and concepts from a document and its aspects.

This method performs comprehensive extraction by processing the document for aspects and concepts, then extracting concepts from each aspect. The operation can be configured for concurrent processing and customized extraction parameters.

This is the synchronous version of extract_all_async().

Parameters:
  • document (Document) – The document to analyze.

  • overwrite_existing (bool, optional) – Whether to overwrite already processed aspects and concepts with newly extracted information. Defaults to False.

  • max_items_per_call (int, optional) – Maximum number of items with the same extraction params to process in each LLM call. Defaults to 0 (all items in one call). If concurrency is enabled, defaults to 1. For complex tasks, you should not set a high value, in order to avoid prompt overloading.

  • use_concurrency (bool, optional) – If True, enables concurrent processing of multiple items. Concurrency can considerably reduce processing time, but may cause rate limit errors with LLM providers. Use this option when API rate limits allow for multiple concurrent requests. Defaults to False.

  • max_paragraphs_to_analyze_per_call (int, optional) – Maximum paragraphs to include in a single LLM prompt. Defaults to 0 (all paragraphs).

  • max_images_to_analyze_per_call (int, optional) – Maximum images to include in a single LLM prompt. Defaults to 0 (all images). Relevant only for document-level concepts.

Returns:

The document with extracted aspects and concepts.

Return type:

Document
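A single LLM can run the same pipeline on its own; it handles the aspects and concepts whose llm_role matches its role (a sketch):

# Single-LLM variant of the group-level call shown earlier
doc = text_extractor.extract_all(doc)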

async extract_all_async(document, overwrite_existing=False, max_items_per_call=0, use_concurrency=False, max_paragraphs_to_analyze_per_call=0, max_images_to_analyze_per_call=0)#

Asynchronously extracts all aspects and concepts from a document and its aspects.

This method performs comprehensive extraction by processing the document for aspects and concepts, then extracting concepts from each aspect. The operation can be configured for concurrent processing and customized extraction parameters.

Parameters:
  • document (Document) – The document to analyze.

  • overwrite_existing (bool, optional) – Whether to overwrite already processed aspects and concepts with newly extracted information. Defaults to False.

  • max_items_per_call (int, optional) – Maximum number of items with the same extraction params to process in each LLM call. Defaults to 0 (all items in one call). If concurrency is enabled, defaults to 1. For complex tasks, you should not set a high value, in order to avoid prompt overloading.

  • use_concurrency (bool, optional) – If True, enables concurrent processing of multiple items. Concurrency can considerably reduce processing time, but may cause rate limit errors with LLM providers. Use this option when API rate limits allow for multiple concurrent requests. Defaults to False.

  • max_paragraphs_to_analyze_per_call (int, optional) – Maximum paragraphs to include in a single LLM prompt. Defaults to 0 (all paragraphs).

  • max_images_to_analyze_per_call (int, optional) – Maximum images to include in a single LLM prompt. Defaults to 0 (all images). Relevant only for document-level concepts.

Returns:

The document with extracted aspects and concepts.

Return type:

Document

extract_aspects_from_document(document, from_aspects=None, overwrite_existing=False, max_items_per_call=0, use_concurrency=False, max_paragraphs_to_analyze_per_call=0)#

Extracts aspects from the provided document using predefined LLMs.

If an aspect instance has extracted_items populated, the reference_paragraphs field will be automatically populated from these items.

This is the synchronous version of extract_aspects_from_document_async().

Parameters:
  • document (Document) – The document from which aspects are to be extracted.

  • from_aspects (Optional[list[Aspect]]) – Existing aspects to use as a base for extraction. If None, uses all document’s aspects.

  • overwrite_existing (bool) – Whether to overwrite already processed aspects with newly extracted information. Defaults to False.

  • max_items_per_call (int) – Maximum items with the same extraction params to process per LLM call. Defaults to 0 (all items in a single call). If concurrency is enabled, defaults to 1 (each item processed separately). For complex tasks, you should not set a high value, in order to avoid prompt overloading.

  • use_concurrency (bool) – If True, enables concurrent processing of multiple items. Concurrency can considerably reduce processing time, but may cause rate limit errors with LLM providers. Use this option when API rate limits allow for multiple concurrent requests. Defaults to False.

  • max_paragraphs_to_analyze_per_call (int) – Maximum paragraphs to analyze in a single LLM prompt. Defaults to 0 (all paragraphs).

Returns:

List of processed Aspect objects with extracted items.

Return type:

list[Aspect]

async extract_aspects_from_document_async(document, from_aspects=None, overwrite_existing=False, max_items_per_call=0, use_concurrency=False, max_paragraphs_to_analyze_per_call=0)#

Extracts aspects from the provided document using predefined LLMs asynchronously.

If an aspect instance has extracted_items populated, the reference_paragraphs field will be automatically populated from these items.

Parameters:
  • document (Document) – The document from which aspects are to be extracted.

  • from_aspects (Optional[list[Aspect]]) – Existing aspects to use as a base for extraction. If None, uses all document’s aspects.

  • overwrite_existing (bool) – Whether to overwrite already processed aspects with newly extracted information. Defaults to False.

  • max_items_per_call (int) – Maximum number of items with the same extraction params to process per LLM call. Defaults to 0 (all items in one call). If concurrency is enabled, defaults to 1. For complex tasks, you should not set a high value, in order to avoid prompt overloading.

  • use_concurrency (bool) – If True, enables concurrent processing of multiple items. Concurrency can considerably reduce processing time, but may cause rate limit errors with LLM providers. Use this option when API rate limits allow for multiple concurrent requests. Defaults to False.

  • max_paragraphs_to_analyze_per_call (int) – Maximum paragraphs to analyze in a single LLM prompt. Defaults to 0 (all paragraphs).

Returns:

List of processed Aspect objects with extracted items.

Return type:

list[Aspect]

extract_concepts_from_aspect(aspect, document, from_concepts=None, overwrite_existing=False, max_items_per_call=0, use_concurrency=False, max_paragraphs_to_analyze_per_call=0)#

Extracts concepts associated with a given aspect in a document.

This method processes an aspect to extract related concepts using LLMs. If the aspect has not been previously processed, a ValueError is raised.

This is the synchronous version of extract_concepts_from_aspect_async().

Parameters:
  • aspect (Aspect) – The aspect from which to extract concepts.

  • document (Document) – The document that contains the aspect.

  • from_concepts (Optional[list[_Concept]]) – List of existing concepts to process. Defaults to None.

  • overwrite_existing (bool) – Whether to overwrite already processed concepts with newly extracted information. Defaults to False.

  • max_items_per_call (int) – Maximum number of items with the same extraction params to process in each LLM call. Defaults to 0 (all items in one call). If concurrency is enabled, defaults to 1. For complex tasks, you should not set a high value, in order to avoid prompt overloading.

  • use_concurrency (bool) – If True, enables concurrent processing of multiple items. Concurrency can considerably reduce processing time, but may cause rate limit errors with LLM providers. Use this option when API rate limits allow for multiple concurrent requests. Defaults to False.

  • max_paragraphs_to_analyze_per_call (int) – Maximum paragraphs to include in a single LLM prompt. Defaults to 0 (all paragraphs).

Returns:

List of processed concept objects.

Return type:

list[_Concept]

async extract_concepts_from_aspect_async(aspect, document, from_concepts=None, overwrite_existing=False, max_items_per_call=0, use_concurrency=False, max_paragraphs_to_analyze_per_call=0)#

Asynchronously extracts concepts from a specified aspect using LLMs.

This method processes an aspect to extract related concepts using LLMs. If the aspect has not been previously processed, a ValueError is raised.

Parameters:
  • aspect (Aspect) – The aspect from which to extract concepts.

  • document (Document) – The document that contains the aspect.

  • from_concepts (Optional[list[_Concept]]) – List of existing concepts to process. Defaults to None.

  • overwrite_existing (bool) – Whether to overwrite already processed concepts with newly extracted information. Defaults to False.

  • max_items_per_call (int) – Maximum number of items with the same extraction params to process in each LLM call. Defaults to 0 (all items in one call). If concurrency is enabled, defaults to 1. For complex tasks, you should not set a high value, in order to avoid prompt overloading.

  • use_concurrency (bool) – If True, enables concurrent processing of multiple items. Concurrency can considerably reduce processing time, but may cause rate limit errors with LLM providers. Use this option when API rate limits allow for multiple concurrent requests. Defaults to False.

  • max_paragraphs_to_analyze_per_call (int) – Maximum paragraphs to include in a single LLM prompt. Defaults to 0 (all paragraphs).

Returns:

List of processed concept objects.

Return type:

list[_Concept]

extract_concepts_from_document(document, from_concepts=None, overwrite_existing=False, max_items_per_call=0, use_concurrency=False, max_paragraphs_to_analyze_per_call=0, max_images_to_analyze_per_call=0)#

Extracts concepts from the provided document using predefined LLMs.

This is the synchronous version of extract_concepts_from_document_async().

Parameters:
  • document (Document) – The document from which concepts are to be extracted.

  • from_concepts (Optional[list[_Concept]]) – Existing concepts to use as a base for extraction. If None, uses all document’s concepts.

  • overwrite_existing (bool) – Whether to overwrite already processed concepts with newly extracted information. Defaults to False.

  • max_items_per_call (int) – Maximum items with the same extraction params to process per LLM call. Defaults to 0 (all items in a single call). If concurrency is enabled, defaults to 1 (each item processed separately). For complex tasks, you should not set a high value, in order to avoid prompt overloading.

  • use_concurrency (bool) – If True, enables concurrent processing of multiple items. Concurrency can considerably reduce processing time, but may cause rate limit errors with LLM providers. Use this option when API rate limits allow for multiple concurrent requests. Defaults to False.

  • max_paragraphs_to_analyze_per_call (int) – Maximum paragraphs to analyze in a single LLM prompt. Defaults to 0 (all paragraphs).

  • max_images_to_analyze_per_call (int, optional) – Maximum images to include in a single LLM prompt. Defaults to 0 (all images).

Returns:

List of processed Concept objects with extracted items.

Return type:

list[_Concept]

async extract_concepts_from_document_async(document, from_concepts=None, overwrite_existing=False, max_items_per_call=0, use_concurrency=False, max_paragraphs_to_analyze_per_call=0, max_images_to_analyze_per_call=0)#

Extracts concepts from the provided document using predefined LLMs asynchronously.

This method processes a document to extract concepts using configured LLMs.

Parameters:
  • document (Document) – The document from which concepts are to be extracted.

  • from_concepts (Optional[list[_Concept]]) – Existing concepts to use as a base for extraction. If None, uses all document’s concepts.

  • overwrite_existing (bool) – Whether to overwrite already processed concepts with newly extracted information. Defaults to False.

  • max_items_per_call (int) – Maximum number of items with the same extraction params to process per LLM call. Defaults to 0 (all items in one call). If concurrency is enabled, defaults to 1. For complex tasks, you should not set a high value, in order to avoid prompt overloading.

  • use_concurrency (bool) – If True, enables concurrent processing of multiple items. Concurrency can considerably reduce processing time, but may cause rate limit errors with LLM providers. Use this option when API rate limits allow for multiple concurrent requests. Defaults to False.

  • max_paragraphs_to_analyze_per_call (int) – Maximum paragraphs to analyze in a single LLM prompt. Defaults to 0 (all paragraphs).

  • max_images_to_analyze_per_call (int, optional) – Maximum images to include in a single LLM prompt. Defaults to 0 (all images).

Returns:

List of processed Concept objects with extracted items.

Return type:

list[_Concept]

classmethod from_dict(obj_dict)#

Reconstructs an instance of the class from a dictionary representation.

This method deserializes a dictionary containing the object’s attributes and values into a new instance of the class. It handles complex nested structures like aspects, concepts, and extracted items, properly reconstructing each component.

Parameters:

obj_dict (dict[str, Any]) – Dictionary containing the serialized object data.

Returns:

A new instance of the class with restored attributes.

Return type:

Self

classmethod from_disk(file_path)#

Loads an instance of the class from a JSON file stored on disk.

This method reads the JSON content from the specified file path and deserializes it into an instance of the class using the from_json method.

Parameters:

file_path (str) – Path to the JSON file to load (must end with ‘.json’).

Returns:

An instance of the class populated with the data from the file.

Return type:

Self

Raises:
  • ValueError – If the file path doesn’t end with ‘.json’.

  • OSError – If there’s an error reading the file.

  • RuntimeError – If deserialization fails.

classmethod from_json(json_string)#

Creates an instance of the class from a JSON string representation.

This method deserializes the provided JSON string into a dictionary and uses the from_dict method to construct the class instance. It validates that the class name in the serialized data matches the current class.

Parameters:

json_string (str) – JSON string containing the serialized object data.

Returns:

A new instance of the class with restored state.

Return type:

Self

Raises:

TypeError – If the class name in the serialized data doesn’t match.

reset_usage_and_cost()[source]#

Resets the usage and cost statistics for the LLM and its fallback LLM (if configured).

This method clears accumulated usage and cost data, which is useful when processing multiple documents sequentially and tracking metrics for each document separately.

Return type:

None

Returns:

None

to_dict()#

Transforms the current object into a dictionary representation.

Converts the object to a dictionary that includes:
  • All public attributes

  • Special handling for specific public and private attributes

When an LLM or LLM group is serialized, its API credentials and usage/cost stats are removed.

Returns:

A dictionary representation of the current object with all necessary data for serialization

Return type:

dict[str, Any]

to_disk(file_path)#

Saves the serialized instance to a JSON file at the specified path.

This method converts the instance to a dictionary representation using to_dict(), then writes it to disk as a formatted JSON file with UTF-8 encoding.

Parameters:

file_path (str) – Path where the JSON file should be saved (must end with ‘.json’).

Return type:

None

Returns:

None

Raises:
  • ValueError – If the file path doesn’t end with ‘.json’.

  • IOError – If there’s an error during the file writing process.

to_json()#

Converts the object to its JSON string representation.

Serializes the object into a JSON-formatted string using the dictionary representation provided by the to_dict() method.

Returns:

A JSON string representation of the object.

Return type:

str