NumericalConcept
`NumericalConcept` is a specialized concept type that extracts, calculates, or derives numerical values (integers, floats, or both) from document content.
📝 Overview
`NumericalConcept` enables numerical data extraction and analysis from documents, such as:

- Direct extraction: retrieving explicitly stated values such as prices, percentages, dates, or measurements
- Calculated values: computing sums, averages, growth rates, or other derived metrics
- Quantitative assessments: determining counts, frequencies, totals, or numerical scores

Depending on the `numeric_type` parameter, the concept extracts integers, floating-point numbers, or both.
💻 Usage Example
Here’s a simple example of how to use `NumericalConcept` to extract a price from a document:
```python
# ContextGem: NumericalConcept Extraction

import os

from contextgem import Document, DocumentLLM, NumericalConcept

# Create a Document object from text
doc = Document(
    raw_text="The latest smartphone model costs $899.99 and will be available next week."
)

# Define a NumericalConcept to extract the price
price_concept = NumericalConcept(
    name="Product price",
    description="The price of the product",
    numeric_type="float",  # We expect a decimal price
)

# Attach the concept to the document
doc.add_concepts([price_concept])

# Configure DocumentLLM with your API parameters
llm = DocumentLLM(
    model="azure/gpt-4.1-mini",
    api_key=os.getenv("CONTEXTGEM_AZURE_OPENAI_API_KEY"),
    api_version=os.getenv("CONTEXTGEM_AZURE_OPENAI_API_VERSION"),
    api_base=os.getenv("CONTEXTGEM_AZURE_OPENAI_API_BASE"),
)

# Extract the concept from the document
price_concept = llm.extract_concepts_from_document(doc)[0]

# Print the extracted value
print(price_concept.extracted_items[0].value)  # Output: 899.99

# Or access the extracted value from the document object
print(doc.concepts[0].extracted_items[0].value)  # Output: 899.99
```
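For direct extractions like this one, you may want a quick sanity check that the returned value actually appears in the source text. The sketch below is plain Python and not part of ContextGem: `numbers_in_text` and `value_appears_in_text` are hypothetical helpers, and the regular expression assumes plain digits with optional comma thousands separators and a dot as the decimal separator.

```python
import math
import re

# Assumption: numbers use "," for thousands and "." for decimals (e.g. 1,250.50)
NUMBER_PATTERN = re.compile(r"\d[\d,]*(?:\.\d+)?")


def numbers_in_text(text: str) -> list[float]:
    """Return every number found in the text as a float, dropping thousands separators."""
    return [float(match.replace(",", "")) for match in NUMBER_PATTERN.findall(text)]


def value_appears_in_text(value: float, text: str, rel_tol: float = 1e-9) -> bool:
    """Check whether a directly extracted value occurs literally in the text."""
    return any(math.isclose(number, value, rel_tol=rel_tol) for number in numbers_in_text(text))


raw_text = "The latest smartphone model costs $899.99 and will be available next week."
print(value_appears_in_text(899.99, raw_text))  # True
```

After the extraction above, you could call `value_appears_in_text(price_concept.extracted_items[0].value, doc.raw_text)`. A `False` result does not necessarily mean the value is wrong, but it signals that the model reformatted, calculated, or inferred the number rather than copying it verbatim.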
⚙️ Parameters
When creating a `NumericalConcept`, you can specify the following parameters:
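| Parameter | Type | Description |
|---|---|---|
| `name` | `str` | A unique name identifier for the concept |
| `description` | `str` | A clear description of what numerical value to extract, which can include explicit values to find, calculations to perform, or quantitative assessments to derive from the document content |
| `numeric_type` | `str` | The type of numerical values to extract. Available values: `"int"`, `"float"`, `"any"` (defaults to `"any"`) |
| `llm_role` | `str` | The role of the LLM responsible for extracting the concept. Available values: `"extractor_text"`, `"reasoner_text"`, `"extractor_vision"`, `"reasoner_vision"` (defaults to `"extractor_text"`) |
| `add_justifications` | `bool` | Whether to include justifications for extracted items (defaults to `False`) |
| `justification_depth` | `str` | Justification detail level. Available values: `"brief"`, `"balanced"`, `"comprehensive"` (defaults to `"brief"`) |
| `justification_max_sents` | `int` | Maximum sentences in a justification (defaults to `2`) |
| `add_references` | `bool` | Whether to include source references for extracted items (defaults to `False`) |
| `reference_depth` | `str` | Source reference granularity. Available values: `"paragraphs"`, `"sentences"` (defaults to `"paragraphs"`) |
| `singular_occurrence` | `bool` | Whether this concept is restricted to having only one extracted item. If `True`, only a single item is extracted (defaults to `False`) |
| `custom_data` | `dict` | Optional. Dictionary for storing any additional data that you want to associate with the concept. This data must be JSON-serializable. It is not used for extraction but can be useful for custom processing or downstream tasks. Defaults to an empty dictionary. |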
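Because `custom_data` must be JSON-serializable, it can help to check a dictionary before attaching it to a concept. Below is a minimal sketch using the standard `json` module; `is_json_serializable` is a hypothetical helper, and ContextGem's own validation may not behave exactly like `json.dumps`.

```python
import datetime
import json


def is_json_serializable(data: dict) -> bool:
    """Return True if the standard library JSON encoder can serialize the data."""
    try:
        json.dumps(data)
    except (TypeError, ValueError):
        # TypeError: unsupported type (e.g. set, date); ValueError: e.g. circular reference
        return False
    return True


good = {"source_file": "q2_report.txt", "reviewed": False, "tags": ["finance", "sales"]}
bad = {"report_date": datetime.date(2023, 6, 30)}

print(is_json_serializable(good))  # True
print(is_json_serializable(bad))  # False
```

Values such as dates can be stored as ISO strings (for example, `datetime.date(2023, 6, 30).isoformat()`) to keep `custom_data` serializable.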
🚀 Advanced Usage
🔍 References and Justifications for Extraction
You can configure a `NumericalConcept` to include justifications and references. Justifications explain the reasoning behind the extracted values. References point to the specific parts of the document where the numerical values were found directly, or from which they were calculated or inferred. Together, they let you trace extracted values back to their source content, even when the extraction involves calculations or other mathematical reasoning:
```python
# ContextGem: NumericalConcept Extraction with References and Justifications

import os

from contextgem import Document, DocumentLLM, NumericalConcept

# Document with values that require calculation/inference
report_text = """
Quarterly Sales Report - Q2 2023
Product A: Sold 450 units at $75 each
Product B: Sold 320 units at $125 each
Product C: Sold 180 units at $95 each
Marketing expenses: $28,500
Operating costs: $42,700
"""

# Create a Document from the text
doc = Document(raw_text=report_text)

# Create a NumericalConcept for total revenue
total_revenue_concept = NumericalConcept(
    name="Total quarterly revenue",
    description="The total revenue calculated by multiplying units sold by their price",
    add_justifications=True,
    justification_depth="comprehensive",  # Detailed justification to show calculation steps
    justification_max_sents=4,  # Maximum number of sentences for justification
    add_references=True,
    reference_depth="paragraphs",  # Reference specific paragraphs
    singular_occurrence=True,  # Ensure that the data is merged into a single item
)

# Attach the concept to the document
doc.add_concepts([total_revenue_concept])

# Configure DocumentLLM with your API parameters
llm = DocumentLLM(
    model="azure/o4-mini",
    api_key=os.getenv("CONTEXTGEM_AZURE_OPENAI_API_KEY"),
    api_version=os.getenv("CONTEXTGEM_AZURE_OPENAI_API_VERSION"),
    api_base=os.getenv("CONTEXTGEM_AZURE_OPENAI_API_BASE"),
)

# Extract the concept
total_revenue_concept = llm.extract_concepts_from_document(doc)[0]

# Print the extracted inferred value with justification
print("Calculated total quarterly revenue:")
for item in total_revenue_concept.extracted_items:
    print(f"\nTotal Revenue: {item.value}")
    print(f"Calculation Justification: {item.justification}")
    print("Source references:")
    for para in item.reference_paragraphs:
        print(f"- {para.raw_text}")
```
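For this sample report, the expected answer can be worked out by hand: 450 × $75 + 320 × $125 + 180 × $95 = $33,750 + $40,000 + $17,100 = $90,850. The sketch below, which does not use ContextGem, recomputes that total from the report text so you can compare it with the LLM's calculated value. `LINE_PATTERN`, `expected_total_revenue`, and `matches_expected` are hypothetical names, and the pattern assumes every product line follows the "Sold N units at $P each" phrasing of this sample.

```python
import math
import re

report_text = """
Quarterly Sales Report - Q2 2023
Product A: Sold 450 units at $75 each
Product B: Sold 320 units at $125 each
Product C: Sold 180 units at $95 each
Marketing expenses: $28,500
Operating costs: $42,700
"""

# Matches lines like "Sold 450 units at $75 each"; assumes this exact phrasing
LINE_PATTERN = re.compile(r"Sold ([\d,]+) units at \$([\d,]+(?:\.\d+)?) each")


def expected_total_revenue(text: str) -> float:
    """Sum units * unit price over every matching product line."""
    total = 0.0
    for units, price in LINE_PATTERN.findall(text):
        total += int(units.replace(",", "")) * float(price.replace(",", ""))
    return total


def matches_expected(extracted_value: float, expected: float, rel_tol: float = 1e-6) -> bool:
    """Compare an LLM-extracted value with the recomputed one, allowing float rounding."""
    return math.isclose(float(extracted_value), expected, rel_tol=rel_tol)


expected = expected_total_revenue(report_text)
print(expected)  # 90850.0
```

After running the extraction, `matches_expected(total_revenue_concept.extracted_items[0].value, expected)` tells you whether the model's arithmetic agrees with the recomputed total. The expense lines are ignored because they do not match the sales pattern, which mirrors the concept's description (revenue only).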
📊 Extracted Items
When a `NumericalConcept` is extracted, it is populated with a list of extracted items accessible through the `.extracted_items` property. Each item is an instance of the `_NumericalItem` class with the following attributes:
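| Attribute | Type | Description |
|---|---|---|
| `value` | `int` or `float` | The extracted numerical value, either an integer or a floating-point number depending on the `numeric_type` configuration |
| `justification` | `str` | Explanation of why this numerical value was extracted (only if `add_justifications=True`) |
| `reference_paragraphs` | `list[Paragraph]` | List of paragraph objects where the numerical value was found or from which it was calculated or inferred (only if `add_references=True`) |
| `reference_sentences` | `list[Sentence]` | List of sentence objects where the numerical value was found or from which it was calculated or inferred (only if `add_references=True` and `reference_depth="sentences"`) |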
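Since `value` can be an `int` or a `float` depending on `numeric_type`, downstream code that expects one specific type may want to check it before use. The helper below is a hypothetical post-processing sketch, not ContextGem's internal validation. It also rejects `bool`, because Python treats `bool` as a subclass of `int`.

```python
def matches_numeric_type(value: object, numeric_type: str) -> bool:
    """Check a value against a numeric_type setting ("int", "float", or "any")."""
    # bool is a subclass of int in Python, so exclude it explicitly
    if isinstance(value, bool):
        return False
    if numeric_type == "int":
        return isinstance(value, int)
    if numeric_type == "float":
        return isinstance(value, float)
    if numeric_type == "any":
        return isinstance(value, (int, float))
    raise ValueError(f"Unknown numeric_type: {numeric_type!r}")


values = [450, 899.99, True]
print([matches_numeric_type(v, "any") for v in values])  # [True, True, False]
```

For example, `all(matches_numeric_type(item.value, "float") for item in price_concept.extracted_items)` would confirm that every price from the first example is a float.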
💡 Best Practices
Here are some best practices to optimize your use of `NumericalConcept`:
- Provide a clear and specific description that helps the LLM understand exactly what numerical values to extract, using precise and unambiguous language in your concept names and descriptions. For numerical concepts, be explicit about the exact values you’re seeking (e.g., “the total contract value in USD” rather than just “contract value”). Avoid vague terms that could lead to incorrect extractions. For example, use “quarterly revenue figures in millions” instead of “revenue numbers” to ensure consistent and accurate extractions.
- Use the appropriate `numeric_type` based on what you expect to extract or calculate:
  - Use `"int"` for counts, quantities, or whole numbers
  - Use `"float"` for prices, measurements, or values that may have decimal points
  - Use `"any"` when you’re not sure or need to extract both types
- Break down complex numerical extractions into multiple simpler numerical concepts when appropriate. Instead of one concept extracting “all financial metrics,” consider separate concepts for “revenue figures,” “expense amounts,” and “profit margins.” This provides more structured data and makes it easier to process the results for specific purposes.
- Enable justifications (using `add_justifications=True`) when you need to understand the reasoning behind the LLM’s numerical extractions, especially when calculations or conversions are involved.
- Enable references (using `add_references=True`) when you need to trace back to the specific parts of the document that contained the numerical values or were used to calculate derived values.
- Use `singular_occurrence=True` to restrict the concept to a single extracted value. This is particularly useful for concepts that should yield a unique value, such as “total contract value” or “effective interest rate,” rather than multiple numerical values found throughout the document.
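One way to combine the advice on narrow concepts and `numeric_type` is to keep concept definitions as plain data and validate them before constructing concepts. In the sketch below, `CONCEPT_SPECS`, `ALLOWED_NUMERIC_TYPES`, and `validate_concept_specs` are hypothetical; the dictionary keys mirror the `NumericalConcept` parameters listed in the Parameters table, and a missing `numeric_type` is assumed to mean `"any"`. The commented-out lines at the end show how the specs could be passed to `NumericalConcept` when ContextGem is installed.

```python
ALLOWED_NUMERIC_TYPES = {"int", "float", "any"}

# Hypothetical narrow concepts replacing one vague "all financial metrics" concept
CONCEPT_SPECS = [
    {
        "name": "Revenue figures",
        "description": "Each revenue amount reported in the document, in USD",
        "numeric_type": "float",
    },
    {
        "name": "Expense amounts",
        "description": "Each expense amount in USD, such as marketing or operating costs",
        "numeric_type": "float",
    },
    {
        "name": "Profit margins",
        "description": "Each profit margin stated in the document, as a percentage",
        "numeric_type": "float",
    },
]


def validate_concept_specs(specs: list[dict]) -> list[str]:
    """Return a list of problems: duplicate names or unsupported numeric_type values."""
    problems = []
    seen_names = set()
    for spec in specs:
        name = spec["name"]
        if name in seen_names:
            problems.append(f"Duplicate concept name: {name!r}")
        seen_names.add(name)
        # Assumption: a missing numeric_type falls back to "any"
        numeric_type = spec.get("numeric_type", "any")
        if numeric_type not in ALLOWED_NUMERIC_TYPES:
            problems.append(f"{name!r}: unsupported numeric_type {numeric_type!r}")
    return problems


print(validate_concept_specs(CONCEPT_SPECS))  # []

# With ContextGem installed:
# from contextgem import NumericalConcept
# concepts = [NumericalConcept(**spec) for spec in CONCEPT_SPECS]
```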