DateConcept#

DateConcept is a specialized concept type that extracts, interprets, and processes date information from documents, returning standardized datetime.date objects.

📝 Overview#

Use DateConcept when you need to extract date information from documents. It allows you to:

  • Extract explicit dates: Identify dates that are directly mentioned in various formats (e.g., “January 15, 2025”, “15/01/2025”, “2025-01-15”)

  • Infer implicit dates: Deduce dates from contextual information (e.g., “next Monday”, “two weeks from signing”, “the following quarter”)

  • Calculate derived dates: Determine dates based on other temporal references (e.g., “30 days after delivery”, “the fiscal year ending”)

  • Normalize date representations: Convert various date formats into standardized Python datetime.date objects for consistent processing
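To make "normalized" concrete, the sketch below parses the three explicit formats mentioned above into the same datetime.date value using only the standard library. It is an illustration of the target representation, not ContextGem's implementation: DateConcept relies on the LLM for extraction, and the helper name `normalize_date` and the format list are assumptions made for this example.

```python
from datetime import date, datetime

# Formats tried in order. This list is an illustrative assumption,
# not something ContextGem defines.
CANDIDATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%B %d, %Y")


def normalize_date(text: str) -> date:
    """Parse a date string in one of the known formats into a datetime.date."""
    cleaned = text.strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date()
        except ValueError:
            continue  # try the next format
    raise ValueError(f"Unrecognized date format: {text!r}")


samples = ["January 15, 2025", "15/01/2025", "2025-01-15"]
normalized = [normalize_date(s) for s in samples]
print(normalized)
```

A fixed format list like this cannot handle implicit or derived dates such as "next Monday", which is where LLM-based extraction is useful.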

This concept type is particularly valuable for extracting temporal information from documents such as:

  • Contract effective dates, expiration dates, and renewal periods

  • Report publication dates and data collection periods

  • Event scheduling information and deadline specifications

  • Historical dates and chronological sequences

💻 Usage Example#

Here’s a simple example of how to use DateConcept to extract a publication date from a document:

# ContextGem: DateConcept Extraction

import os

from contextgem import DateConcept, Document, DocumentLLM

# Create a Document object from text
doc = Document(
    raw_text="The research paper was published on March 15, 2025 and has been cited 42 times since."
)

# Define a DateConcept to extract the publication date
date_concept = DateConcept(
    name="Publication date",
    description="The date when the paper was published",
)

# Attach the concept to the document
doc.add_concepts([date_concept])

# Configure DocumentLLM with your API parameters
llm = DocumentLLM(
    model="azure/gpt-4.1-mini",
    api_key=os.getenv("CONTEXTGEM_AZURE_OPENAI_API_KEY"),
    api_version=os.getenv("CONTEXTGEM_AZURE_OPENAI_API_VERSION"),
    api_base=os.getenv("CONTEXTGEM_AZURE_OPENAI_API_BASE"),
)

# Extract the concept from the document
date_concept = llm.extract_concepts_from_document(doc)[0]

# Print the extracted value
print(
    type(date_concept.extracted_items[0].value), date_concept.extracted_items[0].value
)
# Output: <class 'datetime.date'> 2025-03-15

# Or access the extracted value from the document object
print(
    type(doc.concepts[0].extracted_items[0].value),
    doc.concepts[0].extracted_items[0].value,
)
# Output: <class 'datetime.date'> 2025-03-15

⚙️ Parameters#

When creating a DateConcept, you can specify the following parameters:

| Parameter | Type | Description |
|---|---|---|
| name | str | A unique name identifier for the concept |
| description | str | A clear description of what date information to extract, which can include explicit dates to find, implicit dates to infer, or temporal relationships to identify. For date concepts, be specific about the exact date information sought (e.g., “the contract signing date” rather than just “dates in the document”) to ensure consistent and accurate extractions. |
| llm_role | str | The role of the LLM responsible for extracting the concept. Available values: "extractor_text", "reasoner_text", "extractor_vision", "reasoner_vision". Defaults to "extractor_text". For more details, see 🏷️ LLM Roles. |
| add_justifications | bool | Whether to include justifications for extracted items (defaults to False). Justifications provide explanations of why specific dates were extracted, which is especially valuable when dates are inferred from contextual clues (e.g., “next quarter” or “30 days after signing”) or when resolving ambiguous date references in the document. |
| justification_depth | str | Justification detail level. Available values: "brief", "balanced", "comprehensive". Defaults to "brief" |
| justification_max_sents | int | Maximum sentences in a justification (defaults to 2) |
| add_references | bool | Whether to include source references for extracted items (defaults to False). References indicate the specific locations in the document where date information was found, derived, or inferred from. This is particularly useful for tracing dates back to their original context, understanding how relative dates were calculated (e.g., “30 days after delivery”), or verifying how the system resolved ambiguous temporal references (e.g., “next fiscal year”). |
| reference_depth | str | Source reference granularity. Available values: "paragraphs", "sentences". Defaults to "paragraphs" |
| singular_occurrence | bool | Whether this concept is restricted to having only one extracted item. If True, only a single date will be extracted. Defaults to False (multiple dates are allowed). For date concepts, this parameter is particularly useful when you want to extract a specific, unique date in the document (e.g., “publication date” or “contract signing date”) rather than identifying multiple dates throughout the document. Note that with advanced LLMs, this constraint may not be required as they can often infer the appropriate cardinality from the concept’s name, description, and type. |
| custom_data | dict | Optional. Dictionary for storing any additional data that you want to associate with the concept. This data must be JSON-serializable. This data is not used for extraction but can be useful for custom processing or downstream tasks. Defaults to an empty dictionary. |
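Because custom_data must be JSON-serializable, date values stored there need to be converted first, since datetime.date objects are not accepted by the standard json encoder. The sketch below is a pre-check you could run yourself before passing a dictionary as custom_data. The helper `is_json_serializable` and the dictionary keys are hypothetical, and how ContextGem itself validates the value is not shown here.

```python
import json
from datetime import date


def is_json_serializable(data: dict) -> bool:
    """Return True if the standard json encoder accepts the dictionary."""
    try:
        json.dumps(data)
    except (TypeError, ValueError):
        return False
    return True


# A raw datetime.date is rejected by json.dumps ...
bad_custom_data = {"review_deadline": date(2025, 3, 15)}
# ... while its ISO 8601 string form is accepted.
good_custom_data = {
    "review_deadline": date(2025, 3, 15).isoformat(),
    "priority": 1,
}

print(is_json_serializable(bad_custom_data))   # False
print(is_json_serializable(good_custom_data))  # True
```

Storing dates as ISO strings also makes them easy to restore later with `date.fromisoformat`.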

🚀 Advanced Usage#

🔍 References and Justifications for Extraction#

You can configure a DateConcept to include justifications and references. Justifications explain the reasoning behind extracted dates, which is especially helpful for complex or inferred temporal information (such as dates derived from expressions like “30 days after delivery” or “next fiscal year”). References point to the specific parts of the document that contained the date information or from which it was inferred:

# ContextGem: DateConcept Extraction with References and Justifications

import os

from contextgem import DateConcept, Document, DocumentLLM

# Sample document text containing project timeline information
project_text = """
Project Timeline: Website Redesign

The website redesign project officially kicked off on March 1, 2024.
The development team has estimated the project will take 4 months to complete.

Key milestones:
- Design phase: 1 month
- Development phase: 2 months  
- Testing and deployment: 1 month

The marketing team needs the final completion date to plan the launch campaign.
"""

# Create a Document from the text
doc = Document(raw_text=project_text)

# Create a DateConcept to calculate the project completion date
completion_date_concept = DateConcept(
    name="Project completion date",
    description="The final completion date for the website redesign project",
    add_justifications=True,  # enable justifications to understand extraction reasoning
    justification_depth="balanced",
    justification_max_sents=3,  # allow up to 3 sentences for the calculation justification
    add_references=True,  # include references to source text
    reference_depth="sentences",  # reference specific sentences rather than paragraphs
    singular_occurrence=True,  # extract only one calculated date
)

# Attach the concept to the document
doc.add_concepts([completion_date_concept])

# Configure DocumentLLM
llm = DocumentLLM(
    model="azure/o4-mini",
    api_key=os.getenv("CONTEXTGEM_AZURE_OPENAI_API_KEY"),
    api_version=os.getenv("CONTEXTGEM_AZURE_OPENAI_API_VERSION"),
    api_base=os.getenv("CONTEXTGEM_AZURE_OPENAI_API_BASE"),
    reasoning_effort="medium",
)

# Extract the concept
completion_date_concept = llm.extract_concepts_from_document(doc)[0]

# Print the calculated completion date with justification and references
print("Calculated project completion date:")
# Get the single calculated date
extracted_item = completion_date_concept.extracted_items[0]
print(f"\nCompletion Date: {extracted_item.value}")  # expected output: 2024-07-01
print(f"Calculation Justification: {extracted_item.justification}")
print("Source references used for calculation:")
for sent in extracted_item.reference_sentences:
    print(f"- {sent.raw_text}")
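The expected output of 2024-07-01 in the example above follows from adding the three milestone durations (1 + 2 + 1 = 4 months) to the kickoff date of March 1, 2024. The sketch below reproduces that arithmetic with the standard library so you can sanity-check an extracted value. The helper `add_months` is hypothetical and is not part of ContextGem. Clamping the day to the end of shorter months is one possible convention, chosen here as an assumption.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Shift a date by whole calendar months, clamping the day if needed."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    # Clamp to the last valid day of the target month (e.g. Jan 31 -> Feb 29).
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


kickoff = date(2024, 3, 1)
# Design (1 month) + development (2 months) + testing and deployment (1 month)
completion = add_months(kickoff, 1 + 2 + 1)
print(completion)  # 2024-07-01
```

Comparing the LLM's answer against a deterministic calculation like this can be useful when the justification describes month-based reasoning.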

📊 Extracted Items#

When a DateConcept is extracted, it is populated with a list of extracted items accessible through the .extracted_items property. Each item is an instance of the _DateItem class with the following attributes:

| Attribute | Type | Description |
|---|---|---|
| value | datetime.date | The extracted date as a Python datetime.date object |
| justification | str | Explanation of why this date was extracted (only if add_justifications=True) |
| reference_paragraphs | list[Paragraph] | List of paragraph objects where the date was found or from which it was calculated, derived, or inferred (only if add_references=True) |
| reference_sentences | list[Sentence] | List of sentence objects where the date was found or from which it was calculated, derived, or inferred (only if add_references=True and reference_depth="sentences") |
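As a rough sketch of consuming these attributes, the helper below sorts items chronologically and appends the justification when one is present. To keep the example runnable without an LLM call, it uses a hypothetical `StandInItem` dataclass that mirrors only the `value` and `justification` attributes. It is not the library's `_DateItem` class, and the assumption that a missing justification is empty or None is mine.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class StandInItem:
    """Hypothetical stand-in with the `value` and `justification` attributes."""

    value: date
    justification: Optional[str] = None


def describe_items(items) -> list[str]:
    """Return one line per item, ordered by date, with any justification."""
    lines = []
    for item in sorted(items, key=lambda i: i.value):
        line = item.value.isoformat()
        # Assumption: justification is falsy when justifications are disabled.
        justification = getattr(item, "justification", None)
        if justification:
            line += f" ({justification})"
        lines.append(line)
    return lines


items = [
    StandInItem(date(2025, 6, 30), "Stated as the expiration date."),
    StandInItem(date(2025, 1, 15)),
]
report = describe_items(items)
print("\n".join(report))
```

Because the helper only reads `value` and `justification`, it should work with any objects exposing those two attributes.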

💡 Best Practices#

Here are some best practices to optimize your use of DateConcept:

  • Provide a clear and specific description that helps the LLM understand exactly what date to extract, using precise and unambiguous language (e.g., “contract signing date” rather than just “date”).

  • For dates that require interpretation or calculation (like “30 days after delivery” or “end of next fiscal year”), include these requirements explicitly in your description to ensure the LLM performs the necessary temporal reasoning.

  • Break down complex date extractions into multiple simpler date concepts when appropriate. Instead of one concept extracting “all contract dates,” consider separate concepts for “contract signing date,” “effective date,” and “termination date.”

  • Enable justifications (using add_justifications=True) when you need to understand the reasoning behind date calculations or extractions, especially for relative or inferred dates.

  • Enable references (using add_references=True) when you need to trace back to specific parts of the document that contained the original date information or where dates were calculated from (e.g., deriving a project completion date from a start date plus duration information).

  • Use singular_occurrence=True to enforce only a single date extraction. This is particularly useful for concepts that should yield a unique calculated date, such as “project completion deadline” where multiple timeline elements need to be synthesized into a single target date, or when multiple date mentions actually refer to the same event.

  • Leverage the returned Python datetime.date objects for direct integration with date-based calculations, comparisons, or formatting in your application logic.
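The last point can be illustrated with plain standard-library operations on a datetime.date value. In this sketch, `publication_date` is a hard-coded stand-in for an extracted value, the reference day is fixed so the output is reproducible, and the helper `days_until` is hypothetical.

```python
from datetime import date, timedelta


def days_until(target: date, today: date) -> int:
    """Number of days from `today` to `target` (negative if already past)."""
    return (target - today).days


# Stand-in for a value such as date_concept.extracted_items[0].value
publication_date = date(2025, 3, 15)
# Fixed reference day; in an application this would typically be date.today()
reference_day = date(2025, 3, 1)

print(days_until(publication_date, reference_day))  # 14
print(publication_date < date(2025, 4, 1))  # True
print(publication_date + timedelta(days=30))  # 2025-04-14
print(publication_date.strftime("%d %B %Y"))  # 15 March 2025
```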