Concepts#
Module for handling concepts at aspect and document levels.
This module provides classes for defining different types of concepts that can be extracted from documents and aspects. Concepts represent specific pieces of information to be identified and extracted by LLMs, such as strings, numbers, boolean values, JSON objects, and ratings.
Each concept type has specific properties and behaviors tailored to the kind of data it represents, including validation rules, extraction methods, and reference handling. Concepts can be attached to documents or aspects and can include examples, justifications, and references to the source text.
- class contextgem.public.concepts.StringConcept(**data)[source]#
Bases:
_Concept
A concept model for string-based information extraction from documents and aspects.
This class provides functionality for defining, extracting, and managing string data as conceptual entities within documents or aspects.
- Variables:
name – The name of the concept (non-empty string, stripped).
description – A brief description of the concept (non-empty string, stripped).
examples – Example strings illustrating the concept usage.
llm_role – The role of the LLM responsible for extracting the concept (“extractor_text”, “reasoner_text”, “extractor_vision”, “reasoner_vision”). Defaults to “extractor_text”.
add_justifications – Whether to include justifications for extracted items.
justification_depth – Justification detail level. Defaults to “brief”.
justification_max_sents – Maximum sentences in justification. Defaults to 2.
add_references – Whether to include source references for extracted items.
reference_depth – Source reference granularity (“paragraphs” or “sentences”). Defaults to “paragraphs”. Only relevant when references are added to extracted items. Affects the structure of extracted_items.
singular_occurrence – Whether this concept is restricted to having only one extracted item. If True, only a single extracted item will be extracted. Defaults to False (multiple extracted items are allowed). Note that with advanced LLMs, this constraint may not be strictly required as they can often infer the appropriate cardinality from the concept’s name, description, and type (e.g., “document title” vs “key findings”).
- Parameters:
custom_data (dict)
add_justifications (bool)
justification_depth (JustificationDepth)
justification_max_sents (int)
name (NonEmptyStr)
description (NonEmptyStr)
llm_role (LLMRoleAny)
add_references (bool)
reference_depth (ReferenceDepth)
singular_occurrence (StrictBool)
examples (list[StringExample])
- Example:
- String concept definition#
from contextgem import StringConcept, StringExample

# Define a string concept for identifying contract party names
# and their roles in the contract
party_names_and_roles_concept = StringConcept(
    name="Party names and roles",
    description=(
        "Names of all parties entering into the agreement "
        "and their contractual roles"
    ),
    examples=[
        StringExample(
            content="X (Client)",  # guidance regarding format
        )
    ],
)
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- examples: list[StringExample]#
- clone()#
Creates and returns a deep copy of the current instance.
- Return type:
typing.Self
- Returns:
A deep copy of the current instance.
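The deep-copy behavior described for clone() can be illustrated with the standard library alone. The sketch below uses a hypothetical ConceptSketch dataclass as a stand-in (it is not a ContextGem class), and its use of copy.deepcopy is an assumption about how such a copy behaves rather than a description of the library's internals.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class ConceptSketch:
    """Hypothetical stand-in for a concept; not part of ContextGem."""

    name: str
    examples: list[str] = field(default_factory=list)

    def clone(self) -> "ConceptSketch":
        # A deep copy duplicates nested containers as well as the object itself.
        return copy.deepcopy(self)


original = ConceptSketch(name="Party names and roles", examples=["X (Client)"])
duplicate = original.clone()

# Mutating the copy's nested list leaves the original untouched.
duplicate.examples.append("Y (Supplier)")
print(original.examples)
print(duplicate.examples)
```

Because the nested list is copied too, a clone can be reconfigured or reused without side effects on the instance it came from.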
- property extracted_items: list[_ExtractedItem]#
Provides access to extracted items.
- Returns:
A list containing the extracted items as _ExtractedItem objects.
- Return type:
list[_ExtractedItem]
- classmethod from_dict(obj_dict)#
Reconstructs an instance of the class from a dictionary representation.
This method deserializes a dictionary containing the object’s attributes and values into a new instance of the class. It handles complex nested structures like aspects, concepts, and extracted items, properly reconstructing each component.
- classmethod from_disk(file_path)#
Loads an instance of the class from a JSON file stored on disk.
This method reads the JSON content from the specified file path and deserializes it into an instance of the class using the from_json method.
- Parameters:
file_path (str) – Path to the JSON file to load (must end with ‘.json’).
- Returns:
An instance of the class populated with the data from the file.
- Return type:
Self
- Raises:
ValueError – If the file path doesn’t end with ‘.json’.
OSError – If there’s an error reading the file.
RuntimeError – If deserialization fails.
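The error contract listed above (ValueError for a wrong suffix, OSError for read failures, RuntimeError for failed deserialization) can be sketched as follows. The helper load_json_dict is hypothetical and only mirrors the documented behavior; it is not the library's implementation and returns a plain dictionary rather than a concept instance.

```python
import json
import tempfile
from pathlib import Path


def load_json_dict(file_path: str) -> dict:
    """Hypothetical helper mirroring the documented from_disk error contract."""
    if not file_path.endswith(".json"):
        raise ValueError("file path must end with '.json'")
    # open() raises OSError (e.g. FileNotFoundError) if the file cannot be read.
    with open(file_path, encoding="utf-8") as f:
        raw = f.read()
    try:
        return json.loads(raw)
    except json.JSONDecodeError as exc:
        raise RuntimeError("deserialization failed") from exc


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "concept.json"
    path.write_text('{"name": "Party names and roles"}', encoding="utf-8")
    loaded = load_json_dict(str(path))

    # A path without the '.json' suffix is rejected before any file access.
    try:
        load_json_dict(str(Path(tmp) / "concept.txt"))
    except ValueError as err:
        suffix_error = str(err)
```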
- classmethod from_json(json_string)#
Creates an instance of the class from a JSON string representation.
This method deserializes the provided JSON string into a dictionary and uses the from_dict method to construct the class instance. It validates that the class name in the serialized data matches the current class.
- to_dict()#
Transforms the current object into a dictionary representation.
Converts the object to a dictionary that includes:
  - All public attributes
  - Special handling for specific public and private attributes
When an LLM or LLM group is serialized, its API credentials and usage/cost stats are removed.
- to_disk(file_path)#
Saves the serialized instance to a JSON file at the specified path.
This method converts the instance to a dictionary representation using to_dict(), then writes it to disk as a formatted JSON file with UTF-8 encoding.
- Parameters:
file_path (str) – Path where the JSON file should be saved (must end with ‘.json’).
- Return type:
None
- Returns:
None
- Raises:
ValueError – If the file path doesn’t end with ‘.json’.
IOError – If there’s an error during the file writing process.
- to_json()#
Converts the object to its JSON string representation.
Serializes the object into a JSON-formatted string using the dictionary representation provided by the to_dict() method.
- Returns:
A JSON string representation of the object.
- Return type:
str
- name: NonEmptyStr#
- description: NonEmptyStr#
- llm_role: LLMRoleAny#
- add_references: StrictBool#
- reference_depth: ReferenceDepth#
- singular_occurrence: StrictBool#
- add_justifications: StrictBool#
- justification_depth: JustificationDepth#
- justification_max_sents: StrictInt#
- custom_data: dict#
- class contextgem.public.concepts.BooleanConcept(**data)[source]#
Bases:
_Concept
A concept model for boolean (True/False) information extraction from documents and aspects.
This class handles identification and extraction of boolean values that represent conceptual properties or attributes within content.
- Variables:
name – The name of the concept (non-empty string, stripped).
description – A brief description of the concept (non-empty string, stripped).
llm_role – The role of the LLM responsible for extracting the concept (“extractor_text”, “reasoner_text”, “extractor_vision”, “reasoner_vision”). Defaults to “extractor_text”.
add_justifications – Whether to include justifications for extracted items.
justification_depth – Justification detail level. Defaults to “brief”.
justification_max_sents – Maximum sentences in justification. Defaults to 2.
add_references – Whether to include source references for extracted items.
reference_depth – Source reference granularity (“paragraphs” or “sentences”). Defaults to “paragraphs”. Only relevant when references are added to extracted items. Affects the structure of extracted_items.
singular_occurrence – Whether this concept is restricted to having only one extracted item. If True, only a single extracted item will be extracted. Defaults to False (multiple extracted items are allowed). Note that with advanced LLMs, this constraint may not be strictly required as they can often infer the appropriate cardinality from the concept’s name, description, and type (e.g., “document title” vs “key findings”).
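- Parameters:
custom_data (dict)
add_justifications (bool)
justification_depth (JustificationDepth)
justification_max_sents (int)
name (NonEmptyStr)
description (NonEmptyStr)
llm_role (LLMRoleAny)
add_references (bool)
reference_depth (ReferenceDepth)
singular_occurrence (StrictBool)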
- Example:
- Boolean concept definition#
from contextgem import BooleanConcept

# Create the concept with specific configuration
has_confidentiality = BooleanConcept(
    name="Contains confidentiality clause",
    description="Determines whether the contract includes provisions requiring parties to maintain confidentiality",
    llm_role="reasoner_text",
    singular_occurrence=True,
    add_justifications=True,
    justification_depth="brief",
)
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- clone()#
Creates and returns a deep copy of the current instance.
- Return type:
typing.Self
- Returns:
A deep copy of the current instance.
- property extracted_items: list[_ExtractedItem]#
Provides access to extracted items.
- Returns:
A list containing the extracted items as _ExtractedItem objects.
- Return type:
list[_ExtractedItem]
- classmethod from_dict(obj_dict)#
Reconstructs an instance of the class from a dictionary representation.
This method deserializes a dictionary containing the object’s attributes and values into a new instance of the class. It handles complex nested structures like aspects, concepts, and extracted items, properly reconstructing each component.
- classmethod from_disk(file_path)#
Loads an instance of the class from a JSON file stored on disk.
This method reads the JSON content from the specified file path and deserializes it into an instance of the class using the from_json method.
- Parameters:
file_path (str) – Path to the JSON file to load (must end with ‘.json’).
- Returns:
An instance of the class populated with the data from the file.
- Return type:
Self
- Raises:
ValueError – If the file path doesn’t end with ‘.json’.
OSError – If there’s an error reading the file.
RuntimeError – If deserialization fails.
- classmethod from_json(json_string)#
Creates an instance of the class from a JSON string representation.
This method deserializes the provided JSON string into a dictionary and uses the from_dict method to construct the class instance. It validates that the class name in the serialized data matches the current class.
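The class-name check mentioned above can be pictured with a small stand-in. Everything in this sketch is hypothetical: the SerializableSketch classes, the "__class__" key used to record the class name, and the choice of TypeError for a mismatch are assumptions for illustration, not ContextGem's actual serialization format.

```python
import json


class SerializableSketch:
    """Hypothetical base class; the "__class__" key is an assumed convention."""

    def __init__(self, name: str) -> None:
        self.name = name

    def to_json(self) -> str:
        return json.dumps({"__class__": type(self).__name__, "name": self.name})

    @classmethod
    def from_json(cls, json_string: str) -> "SerializableSketch":
        obj_dict = json.loads(json_string)
        # Reject payloads that were serialized from a different class.
        if obj_dict.get("__class__") != cls.__name__:
            raise TypeError(f"expected {cls.__name__}, got {obj_dict.get('__class__')}")
        return cls(name=obj_dict["name"])


class BooleanSketch(SerializableSketch):
    pass


class DateSketch(SerializableSketch):
    pass


payload = BooleanSketch("Contains confidentiality clause").to_json()
restored = BooleanSketch.from_json(payload)

# Loading the same payload through a different class is refused.
try:
    DateSketch.from_json(payload)
    mismatch_rejected = False
except TypeError:
    mismatch_rejected = True
```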
- to_dict()#
Transforms the current object into a dictionary representation.
Converts the object to a dictionary that includes:
  - All public attributes
  - Special handling for specific public and private attributes
When an LLM or LLM group is serialized, its API credentials and usage/cost stats are removed.
- to_disk(file_path)#
Saves the serialized instance to a JSON file at the specified path.
This method converts the instance to a dictionary representation using to_dict(), then writes it to disk as a formatted JSON file with UTF-8 encoding.
- Parameters:
file_path (str) – Path where the JSON file should be saved (must end with ‘.json’).
- Return type:
None
- Returns:
None
- Raises:
ValueError – If the file path doesn’t end with ‘.json’.
IOError – If there’s an error during the file writing process.
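Writing "formatted JSON with UTF-8 encoding" might look like the sketch below. The helper save_json_dict is hypothetical, and the indent width and ensure_ascii setting are assumed formatting choices that the documentation does not specify.

```python
import json
import tempfile
from pathlib import Path


def save_json_dict(data: dict, file_path: str) -> None:
    """Hypothetical helper: suffix check, then formatted UTF-8 JSON output."""
    if not file_path.endswith(".json"):
        raise ValueError("file path must end with '.json'")
    with open(file_path, "w", encoding="utf-8") as f:
        # indent=2 is an assumed choice; ensure_ascii=False keeps non-ASCII text readable.
        json.dump(data, f, indent=2, ensure_ascii=False)


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "has_confidentiality.json"
    save_json_dict({"name": "Contains confidentiality clause"}, str(target))
    written = target.read_text(encoding="utf-8")

print(written)
```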
- to_json()#
Converts the object to its JSON string representation.
Serializes the object into a JSON-formatted string using the dictionary representation provided by the to_dict() method.
- Returns:
A JSON string representation of the object.
- Return type:
str
- name: NonEmptyStr#
- description: NonEmptyStr#
- llm_role: LLMRoleAny#
- add_references: StrictBool#
- reference_depth: ReferenceDepth#
- singular_occurrence: StrictBool#
- add_justifications: StrictBool#
- justification_depth: JustificationDepth#
- justification_max_sents: StrictInt#
- custom_data: dict#
- class contextgem.public.concepts.NumericalConcept(**data)[source]#
Bases:
_Concept
A concept model for numerical information extraction from documents and aspects.
This class handles identification and extraction of numeric values (integers, floats, or both) that represent conceptual measurements or quantities within content.
- Variables:
name – The name of the concept (non-empty string, stripped).
description – A brief description of the concept (non-empty string, stripped).
numeric_type – Type constraint for extracted numbers (“int”, “float”, or “any”). Defaults to “any” for auto-detection.
llm_role – The role of the LLM responsible for extracting the concept (“extractor_text”, “reasoner_text”, “extractor_vision”, “reasoner_vision”). Defaults to “extractor_text”.
add_justifications – Whether to include justifications for extracted items.
justification_depth – Justification detail level. Defaults to “brief”.
justification_max_sents – Maximum sentences in justification. Defaults to 2.
add_references – Whether to include source references for extracted items.
reference_depth – Source reference granularity (“paragraphs” or “sentences”). Defaults to “paragraphs”. Only relevant when references are added to extracted items. Affects the structure of extracted_items.
singular_occurrence – Whether this concept is restricted to having only one extracted item. If True, only a single extracted item will be extracted. Defaults to False (multiple extracted items are allowed). Note that with advanced LLMs, this constraint may not be strictly required as they can often infer the appropriate cardinality from the concept’s name, description, and type (e.g., “document title” vs “key findings”).
- Parameters:
custom_data (dict)
add_justifications (bool)
justification_depth (JustificationDepth)
justification_max_sents (int)
name (NonEmptyStr)
description (NonEmptyStr)
llm_role (LLMRoleAny)
add_references (bool)
reference_depth (ReferenceDepth)
singular_occurrence (StrictBool)
numeric_type (Literal["int", "float", "any"])
- Example:
- Numerical concept definition#
from contextgem import NumericalConcept

# Create concepts for different numerical values in the contract
payment_amount = NumericalConcept(
    name="Payment amount",
    description="The monetary value to be paid according to the contract terms",
    numeric_type="float",
    llm_role="extractor_text",
    add_references=True,
    reference_depth="sentences",
)

payment_days = NumericalConcept(
    name="Payment term days",
    description="The number of days within which payment must be made",
    numeric_type="int",
    llm_role="extractor_text",
    add_justifications=True,
    justification_depth="balanced",
)
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- numeric_type: Literal['int', 'float', 'any']#
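To make the three numeric_type options concrete, the sketch below checks values against each constraint. The function matches_numeric_type is hypothetical and deliberately strict; whether the library itself coerces an integer such as 30 to 30.0 under "float" is not stated here, so that behavior is an assumption of this example.

```python
from typing import Literal, Union

NumericType = Literal["int", "float", "any"]


def matches_numeric_type(value: Union[int, float], numeric_type: NumericType) -> bool:
    """Hypothetical check of a value against a numeric_type constraint."""
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if isinstance(value, bool):
        return False
    if numeric_type == "int":
        return isinstance(value, int)
    if numeric_type == "float":
        return isinstance(value, float)
    return isinstance(value, (int, float))  # "any" accepts both


checks = {
    "30 as int": matches_numeric_type(30, "int"),
    "30 as float": matches_numeric_type(30, "float"),
    "1500.5 as float": matches_numeric_type(1500.5, "float"),
    "1500.5 as any": matches_numeric_type(1500.5, "any"),
}
print(checks)
```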
- clone()#
Creates and returns a deep copy of the current instance.
- Return type:
typing.Self
- Returns:
A deep copy of the current instance.
- property extracted_items: list[_ExtractedItem]#
Provides access to extracted items.
- Returns:
A list containing the extracted items as _ExtractedItem objects.
- Return type:
list[_ExtractedItem]
- classmethod from_dict(obj_dict)#
Reconstructs an instance of the class from a dictionary representation.
This method deserializes a dictionary containing the object’s attributes and values into a new instance of the class. It handles complex nested structures like aspects, concepts, and extracted items, properly reconstructing each component.
- classmethod from_disk(file_path)#
Loads an instance of the class from a JSON file stored on disk.
This method reads the JSON content from the specified file path and deserializes it into an instance of the class using the from_json method.
- Parameters:
file_path (str) – Path to the JSON file to load (must end with ‘.json’).
- Returns:
An instance of the class populated with the data from the file.
- Return type:
Self
- Raises:
ValueError – If the file path doesn’t end with ‘.json’.
OSError – If there’s an error reading the file.
RuntimeError – If deserialization fails.
- classmethod from_json(json_string)#
Creates an instance of the class from a JSON string representation.
This method deserializes the provided JSON string into a dictionary and uses the from_dict method to construct the class instance. It validates that the class name in the serialized data matches the current class.
- to_dict()#
Transforms the current object into a dictionary representation.
Converts the object to a dictionary that includes:
  - All public attributes
  - Special handling for specific public and private attributes
When an LLM or LLM group is serialized, its API credentials and usage/cost stats are removed.
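The idea of dropping credentials and usage data during serialization can be sketched with a simple filter. This is a simplification under stated assumptions: the key names in SENSITIVE_KEYS and the helper to_safe_dict are hypothetical, and the sketch drops every underscore-prefixed key, whereas the method described above applies special handling to some private attributes.

```python
# Key names below are assumptions chosen for illustration only.
SENSITIVE_KEYS = {"api_key", "usage", "cost"}


def to_safe_dict(attrs: dict) -> dict:
    """Hypothetical filter: keep public attributes, drop sensitive ones."""
    return {
        key: value
        for key, value in attrs.items()
        if not key.startswith("_") and key not in SENSITIVE_KEYS
    }


llm_like = {
    "model": "example-model",
    "api_key": "secret",
    "usage": {"input_tokens": 10},
    "_cache": {},
}
safe = to_safe_dict(llm_like)
print(safe)
```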
- to_disk(file_path)#
Saves the serialized instance to a JSON file at the specified path.
This method converts the instance to a dictionary representation using to_dict(), then writes it to disk as a formatted JSON file with UTF-8 encoding.
- Parameters:
file_path (str) – Path where the JSON file should be saved (must end with ‘.json’).
- Return type:
None
- Returns:
None
- Raises:
ValueError – If the file path doesn’t end with ‘.json’.
IOError – If there’s an error during the file writing process.
- to_json()#
Converts the object to its JSON string representation.
Serializes the object into a JSON-formatted string using the dictionary representation provided by the to_dict() method.
- Returns:
A JSON string representation of the object.
- Return type:
str
- name: NonEmptyStr#
- description: NonEmptyStr#
- llm_role: LLMRoleAny#
- add_references: StrictBool#
- reference_depth: ReferenceDepth#
- singular_occurrence: StrictBool#
- add_justifications: StrictBool#
- justification_depth: JustificationDepth#
- justification_max_sents: StrictInt#
- custom_data: dict#
- class contextgem.public.concepts.RatingConcept(**data)[source]#
Bases:
_Concept
A concept model for rating-based information extraction with defined scale boundaries.
This class handles identification and extraction of integer ratings that must fall within the boundaries of a specified rating scale.
- Variables:
name – The name of the concept (non-empty string, stripped).
description – A brief description of the concept (non-empty string, stripped).
rating_scale – The rating scale defining valid value boundaries.
llm_role – The role of the LLM responsible for extracting the concept (“extractor_text”, “reasoner_text”, “extractor_vision”, “reasoner_vision”). Defaults to “extractor_text”.
add_justifications – Whether to include justifications for extracted items.
justification_depth – Justification detail level. Defaults to “brief”.
justification_max_sents – Maximum sentences in justification. Defaults to 2.
add_references – Whether to include source references for extracted items.
reference_depth – Source reference granularity (“paragraphs” or “sentences”). Defaults to “paragraphs”. Only relevant when references are added to extracted items. Affects the structure of extracted_items.
singular_occurrence – Whether this concept is restricted to having only one extracted item. If True, only a single extracted item will be extracted. Defaults to False (multiple extracted items are allowed). Note that with advanced LLMs, this constraint may not be strictly required as they can often infer the appropriate cardinality from the concept’s name, description, and type (e.g., “document title” vs “key findings”).
- Parameters:
custom_data (dict)
add_justifications (bool)
justification_depth (JustificationDepth)
justification_max_sents (int)
name (NonEmptyStr)
description (NonEmptyStr)
llm_role (LLMRoleAny)
add_references (bool)
reference_depth (ReferenceDepth)
singular_occurrence (StrictBool)
rating_scale (RatingScale)
- Example:
- Rating concept definition#
from contextgem import RatingConcept, RatingScale

# Create a rating scale for contract fairness evaluation
fairness_scale = RatingScale(start=1, end=5)

# Create a concept to rate the fairness of contract terms
fairness_rating = RatingConcept(
    name="Contract fairness rating",
    description="Evaluation of how balanced and fair the contract terms are for all parties",
    rating_scale=fairness_scale,
    llm_role="reasoner_text",
    add_justifications=True,
    justification_depth="comprehensive",
    justification_max_sents=10,
)

# Create a clarity scale for contract language evaluation
clarity_scale = RatingScale(start=1, end=10)

# Create a concept to rate the clarity of contract language
clarity_rating = RatingConcept(
    name="Language clarity rating",
    description="Assessment of how clear and unambiguous the contract language is",
    rating_scale=clarity_scale,
    llm_role="reasoner_text",
    add_justifications=True,
    justification_depth="balanced",
    justification_max_sents=3,
)
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- rating_scale: RatingScale#
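The requirement that ratings fall within the scale boundaries can be illustrated as below. ScaleSketch is a hypothetical stand-in for a rating scale, and treating both start and end as inclusive is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScaleSketch:
    """Hypothetical stand-in for a rating scale with inclusive bounds."""

    start: int
    end: int

    def contains(self, rating: int) -> bool:
        # Ratings are integers; both boundaries are assumed to be inclusive.
        return isinstance(rating, int) and self.start <= rating <= self.end


fairness_scale = ScaleSketch(start=1, end=5)
in_range = [r for r in (0, 1, 3, 5, 6) if fairness_scale.contains(r)]
print(in_range)
```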
- property extracted_items: list[_IntegerItem]#
Provides access to extracted items.
- Returns:
A list containing the extracted items as _IntegerItem objects.
- Return type:
list[_IntegerItem]
- clone()#
Creates and returns a deep copy of the current instance.
- Return type:
typing.Self
- Returns:
A deep copy of the current instance.
- classmethod from_dict(obj_dict)#
Reconstructs an instance of the class from a dictionary representation.
This method deserializes a dictionary containing the object’s attributes and values into a new instance of the class. It handles complex nested structures like aspects, concepts, and extracted items, properly reconstructing each component.
- classmethod from_disk(file_path)#
Loads an instance of the class from a JSON file stored on disk.
This method reads the JSON content from the specified file path and deserializes it into an instance of the class using the from_json method.
- Parameters:
file_path (str) – Path to the JSON file to load (must end with ‘.json’).
- Returns:
An instance of the class populated with the data from the file.
- Return type:
Self
- Raises:
ValueError – If the file path doesn’t end with ‘.json’.
OSError – If there’s an error reading the file.
RuntimeError – If deserialization fails.
- classmethod from_json(json_string)#
Creates an instance of the class from a JSON string representation.
This method deserializes the provided JSON string into a dictionary and uses the from_dict method to construct the class instance. It validates that the class name in the serialized data matches the current class.
- to_dict()#
Transforms the current object into a dictionary representation.
Converts the object to a dictionary that includes:
  - All public attributes
  - Special handling for specific public and private attributes
When an LLM or LLM group is serialized, its API credentials and usage/cost stats are removed.
- to_disk(file_path)#
Saves the serialized instance to a JSON file at the specified path.
This method converts the instance to a dictionary representation using to_dict(), then writes it to disk as a formatted JSON file with UTF-8 encoding.
- Parameters:
file_path (str) – Path where the JSON file should be saved (must end with ‘.json’).
- Return type:
None
- Returns:
None
- Raises:
ValueError – If the file path doesn’t end with ‘.json’.
IOError – If there’s an error during the file writing process.
- to_json()#
Converts the object to its JSON string representation.
Serializes the object into a JSON-formatted string using the dictionary representation provided by the to_dict() method.
- Returns:
A JSON string representation of the object.
- Return type:
str
- name: NonEmptyStr#
- description: NonEmptyStr#
- llm_role: LLMRoleAny#
- add_references: StrictBool#
- reference_depth: ReferenceDepth#
- singular_occurrence: StrictBool#
- add_justifications: StrictBool#
- justification_depth: JustificationDepth#
- justification_max_sents: StrictInt#
- custom_data: dict#
- class contextgem.public.concepts.JsonObjectConcept(**data)[source]#
Bases:
_Concept
A concept model for structured JSON object extraction from documents and aspects.
This class handles identification and extraction of structured data in JSON format, with validation against a predefined schema structure.
- Variables:
name – The name of the concept (non-empty string, stripped).
description – A brief description of the concept (non-empty string, stripped).
structure – JSON object schema as a class with type annotations or dictionary where keys are field names and values are type annotations. Supports generic aliases and union types. All annotated types must be JSON-serializable. Example: {"item": str, "amount": int | float}. Tip: do not overcomplicate the structure to avoid prompt overloading. If you need to enforce a nested structure (e.g. an object within an object), use type hints together with examples that will guide the output format. E.g. structure {"item": dict[str, str]} and example {"item": {"name": "item1", "description": "description1"}}.
examples – Example JSON objects illustrating the concept usage.
llm_role – The role of the LLM responsible for extracting the concept (“extractor_text”, “reasoner_text”, “extractor_vision”, “reasoner_vision”). Defaults to “extractor_text”.
add_justifications – Whether to include justifications for extracted items.
justification_depth – Justification detail level. Defaults to “brief”.
justification_max_sents – Maximum sentences in justification. Defaults to 2.
add_references – Whether to include source references for extracted items.
reference_depth – Source reference granularity (“paragraphs” or “sentences”). Defaults to “paragraphs”. Only relevant when references are added to extracted items. Affects the structure of extracted_items.
singular_occurrence – Whether this concept is restricted to having only one extracted item. If True, only a single extracted item will be extracted. Defaults to False (multiple extracted items are allowed). Note that with advanced LLMs, this constraint may not be strictly required as they can often infer the appropriate cardinality from the concept’s name, description, and type (e.g., “document title” vs “key findings”).
- Parameters:
custom_data (dict)
add_justifications (bool)
justification_depth (JustificationDepth)
justification_max_sents (int)
name (NonEmptyStr)
description (NonEmptyStr)
llm_role (LLMRoleAny)
add_references (bool)
reference_depth (ReferenceDepth)
singular_occurrence (StrictBool)
examples (list[JsonObjectExample])
- Example:
- JSON object concept definition#
from contextgem import JsonObjectConcept

# Define a JSON object concept for capturing address information
address_info_concept = JsonObjectConcept(
    name="Address information",
    description=(
        "Structured address data from text including street, "
        "city, state, postal code, and country."
    ),
    structure={
        "street": str | None,
        "city": str | None,
        "state": str | None,
        "postal_code": str | None,
        "country": str | None,
    },
)
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- structure: type | dict[NonEmptyStr, Any]#
- examples: list[JsonObjectExample]#
- clone()#
Creates and returns a deep copy of the current instance.
- Return type:
typing.Self
- Returns:
A deep copy of the current instance.
- property extracted_items: list[_ExtractedItem]#
Provides access to extracted items.
- Returns:
A list containing the extracted items as _ExtractedItem objects.
- Return type:
list[_ExtractedItem]
- classmethod from_dict(obj_dict)#
Reconstructs an instance of the class from a dictionary representation.
This method deserializes a dictionary containing the object’s attributes and values into a new instance of the class. It handles complex nested structures like aspects, concepts, and extracted items, properly reconstructing each component.
- classmethod from_disk(file_path)#
Loads an instance of the class from a JSON file stored on disk.
This method reads the JSON content from the specified file path and deserializes it into an instance of the class using the from_json method.
- Parameters:
file_path (str) – Path to the JSON file to load (must end with ‘.json’).
- Returns:
An instance of the class populated with the data from the file.
- Return type:
Self
- Raises:
ValueError – If the file path doesn’t end with ‘.json’.
OSError – If there’s an error reading the file.
RuntimeError – If deserialization fails.
- classmethod from_json(json_string)#
Creates an instance of the class from a JSON string representation.
This method deserializes the provided JSON string into a dictionary and uses the from_dict method to construct the class instance. It validates that the class name in the serialized data matches the current class.
- to_dict()#
Transforms the current object into a dictionary representation.
Converts the object to a dictionary that includes:
  - All public attributes
  - Special handling for specific public and private attributes
When an LLM or LLM group is serialized, its API credentials and usage/cost stats are removed.
- to_disk(file_path)#
Saves the serialized instance to a JSON file at the specified path.
This method converts the instance to a dictionary representation using to_dict(), then writes it to disk as a formatted JSON file with UTF-8 encoding.
- Parameters:
file_path (str) – Path where the JSON file should be saved (must end with ‘.json’).
- Return type:
None
- Returns:
None
- Raises:
ValueError – If the file path doesn’t end with ‘.json’.
IOError – If there’s an error during the file writing process.
- to_json()#
Converts the object to its JSON string representation.
Serializes the object into a JSON-formatted string using the dictionary representation provided by the to_dict() method.
- Returns:
A JSON string representation of the object.
- Return type:
str
- name: NonEmptyStr#
- description: NonEmptyStr#
- llm_role: LLMRoleAny#
- add_references: StrictBool#
- reference_depth: ReferenceDepth#
- singular_occurrence: StrictBool#
- add_justifications: StrictBool#
- justification_depth: JustificationDepth#
- justification_max_sents: StrictInt#
- custom_data: dict#
- class contextgem.public.concepts.DateConcept(**data)[source]#
Bases:
_Concept
A concept model for date object extraction from documents and aspects.
This class handles identification and extraction of dates, with support for parsing string representations in a specified format into Python date objects.
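Parsing a string representation in a fixed format into a Python date object can be done with the standard library, as in this minimal sketch. The format string "%d-%m-%Y" and the helper parse_date are assumptions for illustration; the format the library actually requests from the LLM is not stated in this section.

```python
from datetime import date, datetime

# "%d-%m-%Y" is an assumed format for illustration only.
ASSUMED_FORMAT = "%d-%m-%Y"


def parse_date(text: str, fmt: str = ASSUMED_FORMAT) -> date:
    """Parse a date string in a fixed format; raises ValueError on mismatch."""
    return datetime.strptime(text, fmt).date()


effective = parse_date("01-03-2025")
print(effective.isoformat())
```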
- Variables:
name – The name of the concept (non-empty string, stripped).
description – A brief description of the concept (non-empty string, stripped).
llm_role – The role of the LLM responsible for extracting the concept (“extractor_text”, “reasoner_text”, “extractor_vision”, “reasoner_vision”). Defaults to “extractor_text”.
add_justifications – Whether to include justifications for extracted items.
justification_depth – Justification detail level. Defaults to “brief”.
justification_max_sents – Maximum sentences in justification. Defaults to 2.
add_references – Whether to include source references for extracted items.
reference_depth – Source reference granularity (“paragraphs” or “sentences”). Defaults to “paragraphs”. Only relevant when references are added to extracted items. Affects the structure of extracted_items.
singular_occurrence – Whether this concept is restricted to having only one extracted item. If True, only a single extracted item will be extracted. Defaults to False (multiple extracted items are allowed). Note that with advanced LLMs, this constraint may not be strictly required as they can often infer the appropriate cardinality from the concept’s name, description, and type (e.g., “document title” vs “key findings”).
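- Parameters:
custom_data (dict)
add_justifications (bool)
justification_depth (JustificationDepth)
justification_max_sents (int)
name (NonEmptyStr)
description (NonEmptyStr)
llm_role (LLMRoleAny)
add_references (bool)
reference_depth (ReferenceDepth)
singular_occurrence (StrictBool)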
- Example:
- Date concept definition#
from contextgem import DateConcept

# Create a date concept to extract the effective date of the contract
effective_date = DateConcept(
    name="Effective date",
    description="The effective date as specified in the contract",
    add_references=True,  # Include references to where dates were found
    singular_occurrence=True,  # Only extract one effective date per document
)
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- clone()#
Creates and returns a deep copy of the current instance.
- Return type:
typing.Self
- Returns:
A deep copy of the current instance.
- property extracted_items: list[_ExtractedItem]#
Provides access to extracted items.
- Returns:
A list containing the extracted items as _ExtractedItem objects.
- Return type:
list[_ExtractedItem]
- classmethod from_dict(obj_dict)#
Reconstructs an instance of the class from a dictionary representation.
This method deserializes a dictionary containing the object’s attributes and values into a new instance of the class. It handles complex nested structures like aspects, concepts, and extracted items, properly reconstructing each component.
- classmethod from_disk(file_path)#
Loads an instance of the class from a JSON file stored on disk.
This method reads the JSON content from the specified file path and deserializes it into an instance of the class using the from_json method.
- Parameters:
file_path (str) – Path to the JSON file to load (must end with ‘.json’).
- Returns:
An instance of the class populated with the data from the file.
- Return type:
Self
- Raises:
ValueError – If the file path doesn’t end with ‘.json’.
OSError – If there’s an error reading the file.
RuntimeError – If deserialization fails.
- classmethod from_json(json_string)#
Creates an instance of the class from a JSON string representation.
This method deserializes the provided JSON string into a dictionary and uses the from_dict method to construct the class instance. It validates that the class name in the serialized data matches the current class.
- to_dict()#
Transforms the current object into a dictionary representation.
Converts the object to a dictionary that includes:
  - All public attributes
  - Special handling for specific public and private attributes
When an LLM or LLM group is serialized, its API credentials and usage/cost stats are removed.
- to_disk(file_path)#
Saves the serialized instance to a JSON file at the specified path.
This method converts the instance to a dictionary representation using to_dict(), then writes it to disk as a formatted JSON file with UTF-8 encoding.
- Parameters:
file_path (str) – Path where the JSON file should be saved (must end with ‘.json’).
- Return type:
None
- Returns:
None
- Raises:
ValueError – If the file path doesn’t end with ‘.json’.
IOError – If there’s an error during the file writing process.
- to_json()#
Converts the object to its JSON string representation.
Serializes the object into a JSON-formatted string using the dictionary representation provided by the to_dict() method.
- Returns:
A JSON string representation of the object.
- Return type:
str
- name: NonEmptyStr#
- description: NonEmptyStr#
- llm_role: LLMRoleAny#
- add_references: StrictBool#
- reference_depth: ReferenceDepth#
- singular_occurrence: StrictBool#
- add_justifications: StrictBool#
- justification_depth: JustificationDepth#
- justification_max_sents: StrictInt#
- custom_data: dict#