DateConcept

DateConcept is a specialized concept type that extracts, interprets, and processes date information from documents, returning standardized datetime.date objects.

📝 Overview

DateConcept is used when you need to extract date information from documents, allowing you to:

Extract explicit dates: Identify dates that are directly mentioned in various formats (e.g., “January 15, 2025”, “15/01/2025”, “2025-01-15”)
Infer implicit dates: Deduce dates from contextual information (e.g., “next Monday”, “two weeks from signing”, “the following quarter”)
Calculate derived dates: Determine dates based on other temporal references (e.g., “30 days after delivery”, “the end of the fiscal year”)
Normalize date representations: Convert various date formats into standardized Python datetime.date objects for consistent processing
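
As a rough illustration of what normalization and derived-date calculation mean in plain Python, the standard-library sketch below parses the three example formats into datetime.date objects and then applies a 30-day offset. It is not how DateConcept works internally (the LLM interprets the dates itself). The normalize_date helper, the KNOWN_FORMATS list, the day-first reading of “15/01/2025”, and the delivery date are all assumptions made for illustration.

```python
from datetime import date, datetime, timedelta

# Hypothetical list of formats to try; "15/01/2025" is assumed to be day-first.
KNOWN_FORMATS = ["%B %d, %Y", "%d/%m/%Y", "%Y-%m-%d"]


def normalize_date(text: str) -> date:
    """Parse a date string in one of KNOWN_FORMATS into a datetime.date."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue  # this format did not match; try the next one
    raise ValueError(f"Unrecognized date format: {text!r}")


inputs = ["January 15, 2025", "15/01/2025", "2025-01-15"]
normalized = [normalize_date(s) for s in inputs]
print(normalized)  # all three strings map to the same date

# A derived date such as "30 days after delivery" is simple arithmetic once
# the anchor date is known (this delivery date is made up).
delivery = date(2025, 1, 15)
print(delivery + timedelta(days=30))  # 2025-02-14
```

Relative expressions such as “next Monday” have no fixed format and would fail this kind of parsing, which is where an LLM-based concept is useful.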

This concept type is particularly valuable for extracting temporal information from documents such as:

Contract effective dates, expiration dates, and renewal periods
Report publication dates and data collection periods
Event scheduling information and deadline specifications
Historical dates and chronological sequences

💻 Usage Example

Here’s a simple example of how to use DateConcept to extract a publication date from a document:

# ContextGem: DateConcept Extraction

import os

from contextgem import DateConcept, Document, DocumentLLM

# Create a Document object from text
doc = Document(
    raw_text="The research paper was published on March 15, 2025 and has been cited 42 times since."
)

# Define a DateConcept to extract the publication date
date_concept = DateConcept(
    name="Publication date",
    description="The date when the paper was published",
)

# Attach the concept to the document
doc.add_concepts([date_concept])

# Configure DocumentLLM with your API parameters
llm = DocumentLLM(
    model="azure/gpt-4.1-mini",
    api_key=os.getenv("CONTEXTGEM_AZURE_OPENAI_API_KEY"),
    api_version=os.getenv("CONTEXTGEM_AZURE_OPENAI_API_VERSION"),
    api_base=os.getenv("CONTEXTGEM_AZURE_OPENAI_API_BASE"),
)

# Extract the concept from the document
date_concept = llm.extract_concepts_from_document(doc)[0]

# Print the extracted value
print(
    type(date_concept.extracted_items[0].value), date_concept.extracted_items[0].value
)
# Output: <class 'datetime.date'> 2025-03-15

# Or access the extracted value from the document object
print(
    type(doc.concepts[0].extracted_items[0].value),
    doc.concepts[0].extracted_items[0].value,
)
# Output: <class 'datetime.date'> 2025-03-15

⚙️ Parameters

When creating a DateConcept, you can specify the following parameters:

Parameter	Type	Description
`name`	str	A unique name identifier for the concept
`description`	str	A clear description of what date information to extract, which can include explicit dates to find, implicit dates to infer, or temporal relationships to identify. For date concepts, be specific about the exact date information sought (e.g., “the contract signing date” rather than just “dates in the document”) to ensure consistent and accurate extractions.
`llm_role`	str	The role of the LLM responsible for extracting the concept. Available values: `"extractor_text"`, `"reasoner_text"`, `"extractor_vision"`, `"reasoner_vision"`. Defaults to `"extractor_text"`. For more details, see 🏷️ LLM Roles.
`add_justifications`	bool	Whether to include justifications for extracted items (defaults to `False`). Justifications provide explanations of why specific dates were extracted, which is especially valuable when dates are inferred from contextual clues (e.g., “next quarter” or “30 days after signing”) or when resolving ambiguous date references in the document.
`justification_depth`	str	Justification detail level. Available values: `"brief"`, `"balanced"`, `"comprehensive"`. Defaults to `"brief"`
`justification_max_sents`	int	Maximum sentences in a justification (defaults to `2`)
`add_references`	bool	Whether to include source references for extracted items (defaults to `False`). References indicate the specific locations in the document where the date information was found or from which it was derived or inferred. This is particularly useful for tracing dates back to their original context, understanding how relative dates were calculated (e.g., “30 days after delivery”), or verifying how the system resolved ambiguous temporal references (e.g., “next fiscal year”).
`reference_depth`	str	Source reference granularity. Available values: `"paragraphs"`, `"sentences"`. Defaults to `"paragraphs"`
`singular_occurrence`	bool	Whether this concept is restricted to having only one extracted item. If `True`, only a single date will be extracted. Defaults to `False` (multiple dates are allowed). For date concepts, this parameter is particularly useful when you want to extract a specific, unique date in the document (e.g., “publication date” or “contract signing date”) rather than identifying multiple dates throughout the document. Note that with advanced LLMs, this constraint may not be required as they can often infer the appropriate cardinality from the concept’s name, description, and type.
`custom_data`	dict	Optional. Dictionary for storing any additional data that you want to associate with the concept. This data must be JSON-serializable. This data is not used for extraction but can be useful for custom processing or downstream tasks. Defaults to an empty dictionary.
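
Because custom_data must be JSON-serializable, it can help to check a dictionary before passing it to the concept. The sketch below uses only the standard json module; the is_json_serializable helper and the sample dictionaries are hypothetical, and the check does not reflect how ContextGem itself validates the value.

```python
import json
from datetime import date


def is_json_serializable(data: dict) -> bool:
    """Return True if json.dumps can encode `data` without a custom encoder."""
    try:
        json.dumps(data)
    except (TypeError, ValueError):
        return False
    return True


# Plain strings, numbers, lists, and nested dicts are fine.
ok_data = {"source_system": "crm", "priority": 2, "tags": ["contract", "dates"]}

# A datetime.date is not JSON-serializable by default; store it as an ISO string.
bad_data = {"reviewed_on": date(2025, 3, 15)}
fixed_data = {"reviewed_on": date(2025, 3, 15).isoformat()}

print(is_json_serializable(ok_data))     # True
print(is_json_serializable(bad_data))    # False
print(is_json_serializable(fixed_data))  # True
```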

🚀 Advanced Usage

🔍 References and Justifications for Extraction

You can configure a DateConcept to include justifications and references. Justifications explain the reasoning behind extracted dates, which is especially helpful for complex or inferred temporal information (such as dates derived from expressions like “30 days after delivery” or “next fiscal year”). References point to the specific parts of the document that contain the date information or from which it was inferred:

# ContextGem: DateConcept Extraction with References and Justifications

import os

from contextgem import DateConcept, Document, DocumentLLM

# Sample document text containing project timeline information
project_text = """
Project Timeline: Website Redesign

The website redesign project officially kicked off on March 1, 2024.
The development team has estimated the project will take 4 months to complete.

Key milestones:
- Design phase: 1 month
- Development phase: 2 months  
- Testing and deployment: 1 month

The marketing team needs the final completion date to plan the launch campaign.
"""

# Create a Document from the text
doc = Document(raw_text=project_text)

# Create a DateConcept to calculate the project completion date
completion_date_concept = DateConcept(
    name="Project completion date",
    description="The final completion date for the website redesign project",
    add_justifications=True,  # enable justifications to understand extraction reasoning
    justification_depth="balanced",
    justification_max_sents=3,  # allow up to 3 sentences for the calculation justification
    add_references=True,  # include references to source text
    reference_depth="sentences",  # reference specific sentences rather than paragraphs
    singular_occurrence=True,  # extract only one calculated date
)

# Attach the concept to the document
doc.add_concepts([completion_date_concept])

# Configure DocumentLLM
llm = DocumentLLM(
    model="azure/o4-mini",
    api_key=os.getenv("CONTEXTGEM_AZURE_OPENAI_API_KEY"),
    api_version=os.getenv("CONTEXTGEM_AZURE_OPENAI_API_VERSION"),
    api_base=os.getenv("CONTEXTGEM_AZURE_OPENAI_API_BASE"),
    reasoning_effort="medium",
)

# Extract the concept
completion_date_concept = llm.extract_concepts_from_document(doc)[0]

# Print the calculated completion date with justification and references
print("Calculated project completion date:")
# Get the single calculated date
extracted_item = completion_date_concept.extracted_items[0]
print(f"\nCompletion Date: {extracted_item.value}")  # expected output: 2024-07-01
print(f"Calculation Justification: {extracted_item.justification}")
print("Source references used for calculation:")
for sent in extracted_item.reference_sentences:
    print(f"- {sent.raw_text}")
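
To sanity-check the expected output, the date arithmetic the LLM is asked to perform can be reproduced by hand: the milestone durations sum to the stated 4 months, and adding 4 calendar months to March 1, 2024 gives July 1, 2024. The standard library has no month-addition function, so the add_months helper below is a hypothetical sketch that clamps the day to the last day of the target month; it is not part of ContextGem.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Add calendar months to a date, clamping the day to the target month's length."""
    month_index = start.month - 1 + months  # zero-based month count from year start
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


kickoff = date(2024, 3, 1)
milestones = {"Design": 1, "Development": 2, "Testing and deployment": 1}  # months
total_months = sum(milestones.values())

print(total_months)  # 4
print(add_months(kickoff, total_months))  # 2024-07-01
```

Whether “4 months from March 1” should end on July 1 or June 30 is a convention choice; the sketch follows the same convention as the expected output above.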

📊 Extracted Items

When a DateConcept is extracted, it is populated with a list of extracted items accessible through the .extracted_items property. Each item is an instance of the _DateItem class with the following attributes:

Attribute	Type	Description
`value`	datetime.date	The extracted date as a Python `datetime.date` object
`justification`	str	Explanation of why this date was extracted (only if `add_justifications=True`)
`reference_paragraphs`	list[`Paragraph`]	List of paragraph objects where the date was found or from which it was calculated, derived, or inferred (only if `add_references=True`)
`reference_sentences`	list[`Sentence`]	List of sentence objects where the date was found or from which it was calculated, derived, or inferred (only if `add_references=True` and `reference_depth="sentences"`)
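
When singular_occurrence is left at False, a concept can return several items, and because each value is a datetime.date, the items can be ordered chronologically with a standard sort key. The sketch below uses a hypothetical DateItemStub dataclass as a stand-in for _DateItem so that it runs without an LLM call; with real results, the same sorted(..., key=lambda item: item.value) call should apply to a concept's extracted_items, assuming only the value attribute described above.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DateItemStub:
    """Hypothetical stand-in for _DateItem, holding only two of its attributes."""

    value: date
    justification: Optional[str] = None


items = [
    DateItemStub(date(2025, 6, 30), "Termination date stated in clause 9."),
    DateItemStub(date(2025, 1, 1)),
    DateItemStub(date(2025, 3, 15)),
]

# Order items chronologically by their datetime.date value.
chronological = sorted(items, key=lambda item: item.value)

for item in chronological:
    # In this stub, a missing justification is represented as None.
    note = item.justification or "(no justification)"
    print(item.value.isoformat(), note)
```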

💡 Best Practices

Here are some best practices to optimize your use of DateConcept:

Provide a clear and specific description that helps the LLM understand exactly what date to extract, using precise and unambiguous language (e.g., “contract signing date” rather than just “date”).
For dates that require interpretation or calculation (like “30 days after delivery” or “end of next fiscal year”), include these requirements explicitly in your description to ensure the LLM performs the necessary temporal reasoning.
Break down complex date extractions into multiple simpler date concepts when appropriate. Instead of one concept extracting “all contract dates,” consider separate concepts for “contract signing date,” “effective date,” and “termination date.”
Enable justifications (using add_justifications=True) when you need to understand the reasoning behind date calculations or extractions, especially for relative or inferred dates.
Enable references (using add_references=True) when you need to trace back to specific parts of the document that contained the original date information or where dates were calculated from (e.g., deriving a project completion date from a start date plus duration information).
Use singular_occurrence=True to enforce only a single date extraction. This is particularly useful for concepts that should yield a unique calculated date, such as “project completion deadline” where multiple timeline elements need to be synthesized into a single target date, or when multiple date mentions actually refer to the same event.
Leverage the returned Python datetime.date objects for direct integration with date-based calculations, comparisons, or formatting in your application logic.
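
As an example of the last point, the sketch below treats two dates as if they had been extracted (the values, variable names, and 60-day notice period are made up) and uses ordinary datetime.date operations to compare them, compute the interval between them, derive a deadline, and format them for display. No ContextGem calls are involved.

```python
from datetime import date, timedelta

# Hypothetical extracted values, e.g. from two separate DateConcepts.
signing_date = date(2025, 1, 15)
termination_date = date(2026, 1, 15)

# Comparison: check that the dates are in a sensible order.
print(signing_date < termination_date)  # True

# Arithmetic: the difference between two dates is a timedelta.
contract_length = termination_date - signing_date
print(contract_length.days)  # 365

# Derived deadline: a hypothetical 60-day notice period before termination.
notice_deadline = termination_date - timedelta(days=60)
print(notice_deadline.isoformat())  # 2025-11-16

# Formatting: ISO 8601 for storage, strftime for human-readable output.
print(termination_date.isoformat())  # 2026-01-15
print(termination_date.strftime("%d %B %Y"))  # 15 January 2026 (in the default C locale)
```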