Examples#

Module for handling example data in document processing.

This module provides classes for defining examples that can be used to guide LLM extraction tasks. Examples serve as reference points for the model to understand the expected format and content of extracted information. The module supports different types of examples including string-based examples and structured JSON object examples.

Examples can be attached to concepts to provide concrete illustrations of the kind of information to be extracted, improving the accuracy and consistency of LLM-based extraction processes.

class contextgem.public.examples.StringExample(**data)#

Bases: _Example

Represents a string example that can be provided by users for certain extraction tasks.

Variables:

content – A non-empty string that holds the text content of the example.

Parameters:
  • custom_data (dict)

  • content (NonEmptyStr)

Note:

Examples are optional and can be used to guide LLM extraction tasks. They serve as reference points for the model to understand the expected format and content of extracted information. StringExample can be attached to a StringConcept.

Example:
String example definition#
from contextgem import StringConcept, StringExample

# Create string examples
string_examples = [
    StringExample(content="X (Client)"),
    StringExample(content="Y (Supplier)"),
]

# Attach string examples to a StringConcept
string_concept = StringConcept(
    name="Contract party name and role",
    description="The name and role of the contract party",
    examples=string_examples,  # Attach the examples to the concept (optional)
)

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.
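To illustrate the non-empty constraint on content described above, here is a minimal standard-library sketch. StringExampleSketch and try_build are hypothetical stand-ins, not ContextGem code: the real class is a Pydantic model that raises pydantic_core.ValidationError, whereas this sketch raises ValueError. How whitespace-only strings are treated is not stated in this reference, so the sketch rejects only the empty string.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StringExampleSketch:
    """Hypothetical stand-in for StringExample; not the library's class."""

    content: str

    def __post_init__(self) -> None:
        # Assumed rule for this sketch: content must be a str and must not be empty.
        if not isinstance(self.content, str) or self.content == "":
            raise ValueError("content must be a non-empty string")


def try_build(content):
    """Return the built object, or the validation error if construction fails."""
    try:
        return StringExampleSketch(content=content)
    except ValueError as exc:
        return exc


ok = try_build("X (Client)")  # builds successfully
bad = try_build("")           # rejected: empty content
print(type(ok).__name__, type(bad).__name__)
```

With the real class, the same pattern would catch pydantic_core.ValidationError instead of ValueError.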

content: NonEmptyStr#

clone()#

Creates and returns a deep copy of the current instance.

Return type:

typing.Self

Returns:

A deep copy of the current instance.

classmethod from_dict(obj_dict)#

Reconstructs an instance of the class from a dictionary representation.

This method deserializes a dictionary containing the object’s attributes and values into a new instance of the class. It handles complex nested structures like aspects, concepts, and extracted items, properly reconstructing each component.

Parameters:

obj_dict (dict[str, Any]) – Dictionary containing the serialized object data.

Returns:

A new instance of the class with restored attributes.

Return type:

Self

classmethod from_disk(file_path)#

Loads an instance of the class from a JSON file stored on disk.

This method reads the JSON content from the specified file path and deserializes it into an instance of the class using the from_json method.

Parameters:

file_path (str) – Path to the JSON file to load (must end with ‘.json’).

Returns:

An instance of the class populated with the data from the file.

Return type:

Self

Raises:
  • ValueError – If the file path doesn’t end with ‘.json’.

  • OSError – If there’s an error reading the file.

  • RuntimeError – If deserialization fails.

classmethod from_json(json_string)#

Creates an instance of the class from a JSON string representation.

This method deserializes the provided JSON string into a dictionary and uses the from_dict method to construct the class instance. It validates that the class name in the serialized data matches the current class.

Parameters:

json_string (str) – JSON string containing the serialized object data.

Returns:

A new instance of the class with restored state.

Return type:

Self

Raises:

TypeError – If the class name in the serialized data doesn’t match.
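The class-name check described for from_json can be pictured with a small standard-library sketch. SketchBase, StringSketch, JsonObjectSketch, and the "class_name" key are all hypothetical names chosen for illustration; the actual serialized layout used by ContextGem is not shown in this reference.

```python
import json


class SketchBase:
    """Hypothetical stand-in for the serialization pattern; not ContextGem code."""

    def __init__(self, content):
        self.content = content

    def to_json(self) -> str:
        # Store the class name next to the data (the key name is an assumption).
        return json.dumps({"class_name": type(self).__name__, "content": self.content})

    @classmethod
    def from_json(cls, json_string: str):
        data = json.loads(json_string)
        # Refuse to build an instance from data serialized by a different class.
        if data.get("class_name") != cls.__name__:
            raise TypeError(f"expected {cls.__name__}, got {data.get('class_name')}")
        return cls(content=data["content"])


class StringSketch(SketchBase):
    """Plays the role of a string example."""


class JsonObjectSketch(SketchBase):
    """Plays the role of a JSON object example."""


payload = StringSketch("X (Client)").to_json()
restored = StringSketch.from_json(payload)  # same class: round trip succeeds

try:
    JsonObjectSketch.from_json(payload)     # different class: rejected
    mismatch_error = None
except TypeError as exc:
    mismatch_error = exc

print(restored.content, type(mismatch_error).__name__)
```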

to_dict()#

Transforms the current object into a dictionary representation.

Converts the object to a dictionary that includes:
  • All public attributes

  • Special handling for specific public and private attributes

When an LLM or LLM group is serialized, its API credentials and usage/cost stats are removed.

Returns:

A dictionary representation of the current object with all necessary data for serialization.

Return type:

dict[str, Any]

to_disk(file_path)#

Saves the serialized instance to a JSON file at the specified path.

This method converts the instance to a dictionary representation using to_dict(), then writes it to disk as a formatted JSON file with UTF-8 encoding.

Parameters:

file_path (str) – Path where the JSON file should be saved (must end with ‘.json’).

Return type:

None

Returns:

None

Raises:
  • ValueError – If the file path doesn’t end with ‘.json’.

  • OSError – If there’s an error during the file writing process.
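The save and load behaviour documented for to_disk and from_disk (a '.json' suffix check, then UTF-8 JSON on disk) can be sketched with the standard library alone. save_json and load_json are hypothetical helpers written for this illustration; they operate on plain dicts and are not the library's implementation.

```python
import json
import tempfile
from pathlib import Path


def save_json(data: dict, file_path: str) -> None:
    """Hypothetical helper mirroring the documented to_disk contract."""
    if not file_path.endswith(".json"):
        raise ValueError("file_path must end with '.json'")
    # Write formatted JSON with UTF-8 encoding.
    Path(file_path).write_text(
        json.dumps(data, indent=2, ensure_ascii=False), encoding="utf-8"
    )


def load_json(file_path: str) -> dict:
    """Hypothetical helper mirroring the documented from_disk contract."""
    if not file_path.endswith(".json"):
        raise ValueError("file_path must end with '.json'")
    return json.loads(Path(file_path).read_text(encoding="utf-8"))


with tempfile.TemporaryDirectory() as tmp_dir:
    target = str(Path(tmp_dir) / "example.json")
    save_json({"content": "Y (Supplier)"}, target)
    loaded = load_json(target)  # round trip through the file

    try:
        save_json({"content": "Y (Supplier)"}, str(Path(tmp_dir) / "example.txt"))
        suffix_error = None
    except ValueError as exc:   # wrong suffix is rejected before any write
        suffix_error = exc

print(loaded, type(suffix_error).__name__)
```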

to_json()#

Converts the object to its JSON string representation.

Serializes the object into a JSON-formatted string using the dictionary representation provided by the to_dict() method.

Returns:

A JSON string representation of the object.

Return type:

str

property unique_id: str#

Returns the ULID of the instance.

custom_data: dict#

class contextgem.public.examples.JsonObjectExample(**data)#

Bases: _Example

Represents a JSON object example that can be provided by users for certain extraction tasks.

Variables:

content – A JSON-serializable dict with the minimum length of 1 that holds the content of the example.

Parameters:
  • custom_data (dict)

  • content (dict[str, Any])

Note:

Examples are optional and can be used to guide LLM extraction tasks. They serve as reference points for the model to understand the expected format and content of extracted information. JsonObjectExample can be attached to a JsonObjectConcept.

Example:
JSON object example definition#
from contextgem import JsonObjectConcept, JsonObjectExample

# Create a JSON object example
json_example = JsonObjectExample(
    content={
        "name": "John Doe",
        "education": "Bachelor's degree in Computer Science",
        "skills": ["Python", "Machine Learning", "Data Analysis"],
        "hobbies": ["Reading", "Traveling", "Gaming"],
    }
)


# Define a structure for JSON object concept
class PersonInfo:
    name: str
    education: str
    skills: list[str]
    hobbies: list[str]


# Also works as a dict with type hints, e.g.
# PersonInfo = {
#     "name": str,
#     "education": str,
#     "skills": list[str],
#     "hobbies": list[str],
# }

# Attach JSON example to a JsonObjectConcept
json_concept = JsonObjectConcept(
    name="Candidate info",
    description="Structured information about a job candidate",
    structure=PersonInfo,  # Define the expected structure
    examples=[json_example],  # Attach the example to the concept (optional)
)

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.
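The requirement that content be a JSON-serializable dict with at least one key can be checked ahead of time with a short standard-library sketch. is_valid_json_object_content is a hypothetical helper for illustration only; the library performs its own validation when the model is constructed, and its exact rules may differ.

```python
import json


def is_valid_json_object_content(content) -> bool:
    """Hypothetical pre-check: a dict with at least one key that json can serialize."""
    if not isinstance(content, dict) or len(content) < 1:
        return False
    try:
        json.dumps(content)
    except (TypeError, ValueError):
        # TypeError: unsupported value type; ValueError: e.g. circular reference.
        return False
    return True


good = is_valid_json_object_content({"name": "John Doe", "skills": ["Python"]})
empty = is_valid_json_object_content({})
unserializable = is_valid_json_object_content({"tags": {"a", "b"}})  # sets are not JSON
print(good, empty, unserializable)
```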

content: dict[str, Any]#

clone()#

Creates and returns a deep copy of the current instance.

Return type:

typing.Self

Returns:

A deep copy of the current instance.
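Why a deep copy matters for dict content with nested lists can be shown with the standard copy module. The dict below is a hypothetical stand-in for an example's content, not a ContextGem object, and this sketch does not claim how clone() is implemented internally.

```python
import copy

# Hypothetical stand-in for an example's content dict.
original = {"name": "John Doe", "skills": ["Python", "Data Analysis"]}

shallow = copy.copy(original)   # shares the nested list with `original`
deep = copy.deepcopy(original)  # owns an independent copy of the nested list

original["skills"].append("Machine Learning")

print(shallow["skills"])  # the shallow copy sees the appended item
print(deep["skills"])     # the deep copy is unchanged
```

A deep copy lets the clone be modified without affecting the instance it was copied from.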

classmethod from_dict(obj_dict)#

Reconstructs an instance of the class from a dictionary representation.

This method deserializes a dictionary containing the object’s attributes and values into a new instance of the class. It handles complex nested structures like aspects, concepts, and extracted items, properly reconstructing each component.

Parameters:

obj_dict (dict[str, Any]) – Dictionary containing the serialized object data.

Returns:

A new instance of the class with restored attributes.

Return type:

Self

classmethod from_disk(file_path)#

Loads an instance of the class from a JSON file stored on disk.

This method reads the JSON content from the specified file path and deserializes it into an instance of the class using the from_json method.

Parameters:

file_path (str) – Path to the JSON file to load (must end with ‘.json’).

Returns:

An instance of the class populated with the data from the file.

Return type:

Self

Raises:
  • ValueError – If the file path doesn’t end with ‘.json’.

  • OSError – If there’s an error reading the file.

  • RuntimeError – If deserialization fails.

classmethod from_json(json_string)#

Creates an instance of the class from a JSON string representation.

This method deserializes the provided JSON string into a dictionary and uses the from_dict method to construct the class instance. It validates that the class name in the serialized data matches the current class.

Parameters:

json_string (str) – JSON string containing the serialized object data.

Returns:

A new instance of the class with restored state.

Return type:

Self

Raises:

TypeError – If the class name in the serialized data doesn’t match.

to_dict()#

Transforms the current object into a dictionary representation.

Converts the object to a dictionary that includes:
  • All public attributes

  • Special handling for specific public and private attributes

When an LLM or LLM group is serialized, its API credentials and usage/cost stats are removed.

Returns:

A dictionary representation of the current object with all necessary data for serialization.

Return type:

dict[str, Any]

to_disk(file_path)#

Saves the serialized instance to a JSON file at the specified path.

This method converts the instance to a dictionary representation using to_dict(), then writes it to disk as a formatted JSON file with UTF-8 encoding.

Parameters:

file_path (str) – Path where the JSON file should be saved (must end with ‘.json’).

Return type:

None

Returns:

None

Raises:
  • ValueError – If the file path doesn’t end with ‘.json’.

  • OSError – If there’s an error during the file writing process.

to_json()#

Converts the object to its JSON string representation.

Serializes the object into a JSON-formatted string using the dictionary representation provided by the to_dict() method.

Returns:

A JSON string representation of the object.

Return type:

str

property unique_id: str#

Returns the ULID of the instance.

custom_data: dict#