Aspect Extraction
`Aspect` is a fundamental component of ContextGem that represents a defined area or topic within a document that requires focused attention. Aspects help identify and extract specific sections or themes from documents according to predefined criteria.
📝 Overview#
Aspects serve as containers for organizing and structuring document content extraction. They allow you to:
Extract document sections: Identify and extract specific parts of documents (e.g., contract clauses, report sections, policy terms)
Organize content hierarchically: Create sub-aspects to break down complex topics into logical components
Define extraction scope: Focus on specific areas of interest before applying detailed concept extraction
While concepts extract specific data points, aspects extract entire sections or topics from documents, providing context for subsequent detailed analysis.
⭐ Key Features
Hierarchical Organization
Aspects support nested structures through sub-aspects, allowing you to break down complex topics:
Parent aspects represent broad topics (e.g., “Termination Clauses”)
Sub-aspects represent specific components (e.g., “Notice Period”, “Severance Terms”, “Company Rights”)
Integration with Concepts
Aspects can contain `_Concept` instances for detailed data extraction within the identified sections, creating a two-stage extraction workflow.
Note
See supported concept types in Supported Concepts. All public concept types inherit from the internal `_Concept` base class.
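To make the two-stage idea concrete without calling an LLM, here is a toy sketch in plain Python. It is not ContextGem API: a heading keyword match stands in for aspect extraction, and a regular expression stands in for concept extraction. The helper names `extract_section` and `extract_day_counts` are hypothetical. What it shows is scoping: the second stage only sees the text selected by the first stage, so the 30-day payment deadline does not leak into the termination notice period.

```python
import re

# Toy stand-ins only (not ContextGem API): a heading match replaces
# LLM-based aspect extraction, and a regex replaces concept extraction.
DOCUMENT = (
    "2. Term and Termination\n"
    "Either party may terminate this Agreement upon thirty (30) days written notice.\n"
    "\n"
    "3. Payment Terms\n"
    "Licensee agrees to pay an annual license fee of $10,000, payable within thirty (30) days.\n"
)


def extract_section(text: str, heading_keyword: str) -> str:
    """Stage 1 (toy aspect): return the block whose first line mentions the keyword."""
    for block in text.split("\n\n"):
        lines = block.strip().splitlines()
        if lines and heading_keyword.lower() in lines[0].lower():
            return block.strip()
    return ""


def extract_day_counts(section: str) -> list[int]:
    """Stage 2 (toy concept): find '(NN) days' patterns, only inside the given text."""
    return [int(n) for n in re.findall(r"\((\d+)\) days", section)]


termination = extract_section(DOCUMENT, "Termination")
print(extract_day_counts(termination))  # [30]
print(extract_day_counts(DOCUMENT))     # [30, 30]
```

Run against the whole document, the toy "concept" also picks up the payment deadline; scoped to the section, it returns only the notice period.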
💻 Basic Usage
Simple Aspect Extraction
Here’s how to extract a specific section from a document:
```python
# ContextGem: Aspect Extraction

import os

from contextgem import Aspect, Document, DocumentLLM

# Create a document instance
doc = Document(
    raw_text=(
        "Software License Agreement\n"
        "This software license agreement (Agreement) is entered into between Tech Corp (Licensor) and Client Corp (Licensee).\n"
        "...\n"
        "2. Term and Termination\n"
        "This Agreement shall commence on the Effective Date and shall continue for a period of three (3) years, "
        "unless earlier terminated in accordance with the provisions hereof. Either party may terminate this Agreement "
        "upon thirty (30) days written notice to the other party.\n"
        "\n"
        "3. Payment Terms\n"
        "Licensee agrees to pay Licensor an annual license fee of $10,000, payable within thirty (30) days of the "
        "invoice date. Late payments shall incur a penalty of 1.5% per month.\n"
        "...\n"
    ),
)

# Define an aspect to extract the termination clause
termination_aspect = Aspect(
    name="Termination Clauses",
    description="Sections describing how and when the agreement can be terminated, including notice periods and conditions",
)

# Add the aspect to the document
doc.add_aspects([termination_aspect])

# Configure DocumentLLM with your API parameters
llm = DocumentLLM(
    model="azure/gpt-4.1-mini",
    api_key=os.getenv("CONTEXTGEM_AZURE_OPENAI_API_KEY"),
    api_version=os.getenv("CONTEXTGEM_AZURE_OPENAI_API_VERSION"),
    api_base=os.getenv("CONTEXTGEM_AZURE_OPENAI_API_BASE"),
)

# Extract the aspect from the document
termination_aspect = llm.extract_aspects_from_document(doc)[0]

# Access the extracted information
print("Extracted Termination Clauses:")
for item in termination_aspect.extracted_items:
    print(f"- {item.value}")
```
Aspect with Sub-Aspects
Breaking down complex topics into components:
```python
# ContextGem: Aspect Extraction with Sub-Aspects

import os

from contextgem import Aspect, Document, DocumentLLM

# Create a document instance
doc = Document(
    raw_text=(
        "Employment Agreement\n"
        "This Employment Agreement is entered into between Global Tech Inc. (Company) and John Smith (Employee).\n"
        "\n"
        "Section 8: Termination\n"
        "8.1 Termination by Company\n"
        "The Company may terminate this agreement at any time with or without cause by providing thirty (30) days "
        "written notice to the Employee. In case of termination for cause, no notice period is required.\n"
        "\n"
        "8.2 Termination by Employee\n"
        "The Employee may terminate this agreement by providing fourteen (14) days written notice to the Company. "
        "The Employee must complete all pending assignments before the termination date.\n"
        "\n"
        "8.3 Severance Benefits\n"
        "Upon termination without cause, the Employee shall receive severance pay equal to two (2) weeks of base salary "
        "for each year of service, with a minimum of four (4) weeks and a maximum of twenty-six (26) weeks. "
        "Severance benefits are contingent upon signing a release agreement.\n"
        "\n"
        "8.4 Return of Company Property\n"
        "Upon termination, the Employee must immediately return all Company property, including laptops, access cards, "
        "confidential documents, and any other materials belonging to the Company.\n"
        "\n"
        "Section 9: Non-Competition\n"
        "The Employee agrees not to engage in any business that competes with the Company for a period of twelve (12) "
        "months following termination of employment within a 50-mile radius of the Company's headquarters.\n"
    ),
)

# Define the main termination aspect with sub-aspects
termination_aspect = Aspect(
    name="Termination Provisions",
    description="All provisions related to employment termination including conditions, procedures, and consequences",
    aspects=[
        Aspect(
            name="Company Termination Rights",
            description="Conditions and procedures for the company to terminate the employee, including notice periods and cause requirements",
        ),
        Aspect(
            name="Employee Termination Rights",
            description="Conditions and procedures for the employee to terminate employment, including notice requirements and obligations",
        ),
        Aspect(
            name="Severance Benefits",
            description="Compensation and benefits provided to the employee upon termination, including calculation methods and conditions",
        ),
        Aspect(
            name="Post-Termination Obligations",
            description="Employee obligations that continue after termination, including property return and non-competition requirements",
        ),
    ],
)

# Add the aspect to the document
doc.add_aspects([termination_aspect])

# Configure DocumentLLM with your API parameters
llm = DocumentLLM(
    model="azure/gpt-4.1-mini",
    api_key=os.getenv("CONTEXTGEM_AZURE_OPENAI_API_KEY"),
    api_version=os.getenv("CONTEXTGEM_AZURE_OPENAI_API_VERSION"),
    api_base=os.getenv("CONTEXTGEM_AZURE_OPENAI_API_BASE"),
)

# Extract aspects from the document
termination_aspect = llm.extract_aspects_from_document(doc)[0]

# Access the extracted information
print("All Termination Provisions:")
for item in termination_aspect.extracted_items:
    print(f"- {item.value}")

print("\nSub-Aspects:")
for sub_aspect in termination_aspect.aspects:
    print(f"\n{sub_aspect.name}:")
    for item in sub_aspect.extracted_items:
        print(f"- {item.value}")
```
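When you need the results as data rather than printed output (for example, to serialize them to JSON), a small recursive helper can flatten an aspect and its sub-aspects into a dictionary keyed by "Parent > Child" paths. This is a sketch: `aspect_values` is a hypothetical helper that relies only on the `name`, `extracted_items`, `value`, and `aspects` attributes used in the example above, and it is exercised here with `SimpleNamespace` stubs rather than real extraction results.

```python
import json
from types import SimpleNamespace


def aspect_values(aspect, prefix: str = "") -> dict[str, list[str]]:
    """Map "Parent > Child" paths to the extracted values of each aspect."""
    path = f"{prefix} > {aspect.name}" if prefix else aspect.name
    result = {path: [item.value for item in aspect.extracted_items]}
    # Recurse into sub-aspects, if any
    for sub_aspect in getattr(aspect, "aspects", None) or []:
        result.update(aspect_values(sub_aspect, path))
    return result


def stub_item(value: str) -> SimpleNamespace:
    # Stand-in for an extracted item (illustration only)
    return SimpleNamespace(value=value)


stub_aspect = SimpleNamespace(
    name="Termination Provisions",
    extracted_items=[stub_item("8.1 Termination by Company"), stub_item("8.2 Termination by Employee")],
    aspects=[
        SimpleNamespace(
            name="Severance Benefits",
            extracted_items=[stub_item("8.3 Severance Benefits")],
            aspects=[],
        ),
    ],
)

print(json.dumps(aspect_values(stub_aspect), indent=2))
```

With real results, pass the extracted `termination_aspect` instead of the stub.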
⚙️ Parameters
When creating an `Aspect`, you can configure the following parameters:

| Parameter | Type | Description |
|---|---|---|
| `name` | `str` | A unique name identifier for the aspect. Must be unique among sibling aspects. |
| `description` | `str` | A detailed description of what the aspect represents and what content should be extracted. Must be unique among sibling aspects. |
| `aspects` | `list[Aspect]` | Optional. List of sub-aspects for hierarchical organization. Limited to one nesting level. |
| `concepts` | `list[_Concept]` | Optional. List of concepts associated with the aspect for detailed data extraction within the aspect's scope. See supported concept types in Supported Concepts. |
| `llm_role` | `str` | The role of the LLM responsible for aspect extraction. Available values: `"extractor_text"`, `"reasoner_text"`. Defaults to `"extractor_text"`. |
| `reference_depth` | `str` | The structural depth of references. Available values: `"paragraphs"`, `"sentences"`. Defaults to `"paragraphs"`. |
| `add_justifications` | `bool` | Whether the LLM will output justification for each extracted item (defaults to `False`). |
| `justification_depth` | `str` | The level of detail for justifications. Available values: `"brief"`, `"balanced"`, `"comprehensive"`. Defaults to `"brief"`. |
| `justification_max_sents` | `int` | Maximum number of sentences in a justification (defaults to `2`). |
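The sibling-uniqueness and single-nesting-level rules in the table can also be checked up front, for instance when aspect definitions are loaded from a configuration file before any `Aspect` objects are built. The sketch below is a hypothetical pre-flight helper (`check_aspect_tree`), not part of ContextGem; it mirrors the documented rules on plain objects with `name`, `description`, and `aspects` attributes, and the library's own validation remains authoritative.

```python
from types import SimpleNamespace


def check_aspect_tree(aspects, depth: int = 0) -> list[str]:
    """Return rule violations found in an aspect tree (an empty list means none)."""
    problems = []
    names = [a.name for a in aspects]
    descriptions = [a.description for a in aspects]
    # Names and descriptions must be unique among siblings
    if len(set(names)) != len(names):
        problems.append(f"duplicate sibling names at depth {depth}")
    if len(set(descriptions)) != len(descriptions):
        problems.append(f"duplicate sibling descriptions at depth {depth}")
    for a in aspects:
        children = getattr(a, "aspects", None) or []
        # Only top-level aspects (depth 0) may have sub-aspects
        if children and depth >= 1:
            problems.append(f"'{a.name}' nests sub-aspects below the first level")
        problems.extend(check_aspect_tree(children, depth + 1))
    return problems


def stub(name: str, description: str, aspects=()) -> SimpleNamespace:
    # Hypothetical stand-in for an aspect definition
    return SimpleNamespace(name=name, description=description, aspects=list(aspects))


ok = [stub("Termination", "All termination provisions", [stub("Notice Period", "Notice requirements")])]
too_deep = [stub("A", "a", [stub("B", "b", [stub("C", "c")])])]
dupes = [stub("Fees", "x"), stub("Fees", "y")]

print(check_aspect_tree(ok))        # []
print(check_aspect_tree(too_deep))  # ["'B' nests sub-aspects below the first level"]
print(check_aspect_tree(dupes))     # ['duplicate sibling names at depth 0']
```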
📊 Extracted Items
When an `Aspect` is extracted, it is populated with a list of extracted items accessible through the `.extracted_items` property. Each item is an instance of the `_StringItem` class with the following attributes:

| Attribute | Type | Description |
|---|---|---|
| `value` | `str` | The extracted text segment representing the aspect |
| `justification` | `str` | Explanation of why this text segment was identified as relevant to the aspect (only if `add_justifications=True`) |
| `reference_paragraphs` | `list[Paragraph]` | List of paragraph objects that contain the extracted aspect content (always populated for aspects' extracted items) |
| `reference_sentences` | `list[Sentence]` | List of sentence objects that contain the extracted aspect content (only if `reference_depth="sentences"`) |
🚀 Advanced Usage
Aspects with Concepts
Combining aspect extraction with detailed concept extraction:
```python
# ContextGem: Aspect Extraction with Concepts

import os

from contextgem import Aspect, Document, DocumentLLM, NumericalConcept, StringConcept

# Create a document instance
doc = Document(
    raw_text=(
        "Service Agreement\n"
        "This Service Agreement is between DataFlow Solutions (Provider) and Enterprise Corp (Client).\n"
        "\n"
        "3. Payment Terms\n"
        "3.1 Service Fees\n"
        "The Client shall pay the Provider a monthly service fee of $5,000 for basic services. "
        "Additional premium features are available for an extra $1,200 per month. "
        "Setup fee is a one-time payment of $2,500.\n"
        "\n"
        "3.2 Payment Schedule\n"
        "All payments are due within 15 business days of invoice receipt. "
        "Invoices will be sent on the first day of each month for the upcoming service period. "
        "Late payments will incur a penalty of 2% per month on the outstanding balance.\n"
        "\n"
        "3.3 Payment Methods\n"
        "Payments may be made by bank transfer, corporate check, or ACH. "
        "Credit card payments are accepted for amounts under $1,000 with a 3% processing fee. "
        "Wire transfer fees are the responsibility of the Client.\n"
        "\n"
        "3.4 Refund Policy\n"
        "Services are non-refundable once delivered. However, if services are terminated "
        "with 30 days notice, any prepaid fees for future periods will be refunded on a pro-rata basis.\n"
    ),
)

# Define an aspect with associated concepts
payment_aspect = Aspect(
    name="Payment Terms",
    description="All clauses and provisions related to payment, including fees, schedules, methods, and policies",
    concepts=[
        NumericalConcept(
            name="Monthly Service Fee",
            description="The regular monthly fee for basic services",
            numeric_type="float",
        ),
        NumericalConcept(
            name="Premium Features Fee",
            description="Additional monthly fee for premium features",
            numeric_type="float",
        ),
        NumericalConcept(
            name="Setup Fee",
            description="One-time initial setup or onboarding fee",
            numeric_type="float",
        ),
        NumericalConcept(
            name="Payment Due Days",
            description="Number of days the client has to make payment after receiving invoice",
            numeric_type="int",
        ),
        NumericalConcept(
            name="Late Payment Penalty Rate",
            description="Percentage penalty charged per month for late payments",
            numeric_type="float",
        ),
        StringConcept(
            name="Accepted Payment Methods",
            description="List of payment methods that are accepted by the provider",
        ),
        StringConcept(
            name="Refund Policy",
            description="Conditions and procedures for refunds or credits",
        ),
    ],
)

# Add the aspect to the document
doc.add_aspects([payment_aspect])

# Configure DocumentLLM with your API parameters
llm = DocumentLLM(
    model="azure/gpt-4.1-mini",
    api_key=os.getenv("CONTEXTGEM_AZURE_OPENAI_API_KEY"),
    api_version=os.getenv("CONTEXTGEM_AZURE_OPENAI_API_VERSION"),
    api_base=os.getenv("CONTEXTGEM_AZURE_OPENAI_API_BASE"),
)

# Extract aspects and their concepts from the document
doc = llm.extract_all(doc)

# Access the extracted payment terms aspect and concepts
payment_terms_aspect = doc.get_aspect_by_name("Payment Terms")
print("Extracted Payment Terms Section:")
for item in payment_terms_aspect.extracted_items:
    print(f"- {item.value}")

print("\nExtracted Payment Details:")
for concept in payment_terms_aspect.concepts:
    print(f"\n{concept.name}:")
    for item in concept.extracted_items:
        print(f"- {item.value}")

# Access specific extracted values (guard against an empty result)
monthly_fee = payment_terms_aspect.get_concept_by_name("Monthly Service Fee")
if monthly_fee.extracted_items:
    print(f"\nMonthly Service Fee: ${monthly_fee.extracted_items[0].value}")
```
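Indexing `extracted_items[0]` directly raises an `IndexError` when nothing was extracted for a concept. A hedged alternative is to collect the first value of every concept into a dictionary, using `None` for empty results. `concept_summary` is a hypothetical helper, shown here with stub objects standing in for the extracted `payment_terms_aspect`; the stub values are illustrative, not real output.

```python
from types import SimpleNamespace


def concept_summary(aspect) -> dict[str, object]:
    """Map each concept name to its first extracted value, or None if nothing was extracted."""
    return {
        concept.name: concept.extracted_items[0].value if concept.extracted_items else None
        for concept in aspect.concepts
    }


# Stubs standing in for an extracted aspect (illustration only)
stub_aspect = SimpleNamespace(
    concepts=[
        SimpleNamespace(name="Monthly Service Fee", extracted_items=[SimpleNamespace(value=5000.0)]),
        SimpleNamespace(name="Refund Policy", extracted_items=[]),
    ]
)

print(concept_summary(stub_aspect))  # {'Monthly Service Fee': 5000.0, 'Refund Policy': None}
```

Concepts that can legitimately yield several items, such as "Accepted Payment Methods", are better kept as full lists than reduced to their first value.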
Complex Hierarchical Structure
Creating a comprehensive document analysis structure with aspects, sub-aspects and concepts:
```python
# ContextGem: Complex Hierarchical Aspect Extraction with Sub-Aspects and Concepts

import os

from contextgem import (
    Aspect,
    BooleanConcept,
    Document,
    DocumentLLM,
    NumericalConcept,
    StringConcept,
)

# Create a document instance
doc = Document(
    raw_text=(
        "Software Development and Licensing Agreement\n"
        "\n"
        "1. Intellectual Property Rights\n"
        "1.1 Ownership of Developed Software\n"
        "All software developed under this Agreement shall remain the exclusive property of the Developer. "
        "The Client receives a non-exclusive license to use the software as specified in Section 2.\n"
        "\n"
        "1.2 Client Data and Content\n"
        "The Client retains all rights to data and content provided to the Developer. "
        "The Developer may not use Client data for any purpose other than fulfilling this Agreement.\n"
        "\n"
        "1.3 Third-Party Components\n"
        "The software may include third-party open-source components. The Client agrees to comply "
        "with all applicable open-source licenses.\n"
        "\n"
        "2. License Terms\n"
        "2.1 Grant of License\n"
        "Developer grants Client a perpetual, non-transferable license to use the software "
        "for internal business purposes only, limited to 100 concurrent users.\n"
        "\n"
        "2.2 License Restrictions\n"
        "Client may not redistribute, sublicense, or create derivative works. "
        "Reverse engineering is prohibited except as required by law.\n"
        "\n"
        "3. Payment and Financial Terms\n"
        "3.1 Development Fees\n"
        "Total development fee is $150,000, payable in three installments: "
        "$50,000 upon signing, $50,000 at 50% completion, and $50,000 upon delivery.\n"
        "\n"
        "3.2 Ongoing License Fees\n"
        "Annual license fee of $12,000 is due each year starting from the first anniversary. "
        "Fees may increase by up to 5% annually with 60 days notice.\n"
        "\n"
        "3.3 Payment Terms\n"
        "All payments due within 30 days of invoice. Late payments incur 1.5% monthly penalty.\n"
        "\n"
        "4. Liability and Risk Allocation\n"
        "4.1 Limitation of Liability\n"
        "Developer's total liability shall not exceed the total amount paid under this Agreement. "
        "Neither party shall be liable for indirect, consequential, or punitive damages.\n"
        "\n"
        "4.2 Indemnification\n"
        "Client agrees to indemnify Developer against third-party claims arising from Client's use "
        "of the software, except for claims related to Developer's IP infringement.\n"
        "\n"
        "4.3 Insurance Requirements\n"
        "Developer shall maintain professional liability insurance of at least $1,000,000. "
        "Client shall maintain general liability insurance of at least $2,000,000.\n"
    ),
)

# Define a complex hierarchical structure
contract_aspects = [
    Aspect(
        name="Intellectual Property Provisions",
        description="All provisions related to intellectual property rights, ownership, and usage",
        aspects=[
            Aspect(
                name="Software Ownership",
                description="Clauses defining who owns the developed software and related IP rights",
                concepts=[
                    StringConcept(
                        name="Software Owner",
                        description="The party that owns the developed software",
                    ),
                    BooleanConcept(
                        name="Exclusive Ownership",
                        description="Whether the ownership is exclusive to one party",
                    ),
                ],
            ),
            Aspect(
                name="Client Data Rights",
                description="Provisions about client data ownership and developer's permitted use",
                concepts=[
                    StringConcept(
                        name="Data Usage Restrictions",
                        description="Limitations on how developer can use client data",
                    ),
                ],
            ),
            Aspect(
                name="Third-Party Components",
                description="Terms regarding use of third-party or open-source components",
                concepts=[
                    BooleanConcept(
                        name="Open Source Included",
                        description="Whether the software includes open-source components",
                    ),
                ],
            ),
        ],
    ),
    Aspect(
        name="License Grant and Restrictions",
        description="Terms defining the software license granted to the client and any restrictions",
        aspects=[
            Aspect(
                name="License Scope",
                description="The extent and limitations of the license granted",
                concepts=[
                    StringConcept(
                        name="License Type",
                        description="The type of license granted (exclusive, non-exclusive, etc.)",
                    ),
                    NumericalConcept(
                        name="User Limit",
                        description="Maximum number of concurrent users allowed",
                        numeric_type="int",
                    ),
                    BooleanConcept(
                        name="Perpetual License",
                        description="Whether the license is perpetual or time-limited",
                    ),
                ],
            ),
            Aspect(
                name="Usage Restrictions",
                description="Prohibited uses and activities under the license",
                concepts=[
                    BooleanConcept(
                        name="Redistribution Allowed",
                        description="Whether client can redistribute the software",
                    ),
                    BooleanConcept(
                        name="Derivative Works Allowed",
                        description="Whether client can create derivative works",
                    ),
                ],
            ),
        ],
    ),
    Aspect(
        name="Financial Terms",
        description="All payment-related provisions including fees, schedules, and penalties",
        concepts=[
            NumericalConcept(
                name="Total Development Fee",
                description="The total amount for software development",
                numeric_type="float",
            ),
            NumericalConcept(
                name="Annual License Fee",
                description="Yearly fee for using the software",
                numeric_type="float",
            ),
            NumericalConcept(
                name="Payment Due Days",
                description="Number of days to make payment after invoice",
                numeric_type="int",
            ),
        ],
    ),
    Aspect(
        name="Risk and Liability Management",
        description="Provisions for managing risks, liability limitations, and insurance requirements",
        aspects=[
            Aspect(
                name="Liability Limitations",
                description="Caps and exclusions on each party's liability",
                concepts=[
                    StringConcept(
                        name="Liability Cap",
                        description="Maximum amount of liability for each party",
                    ),
                    StringConcept(
                        name="Excluded Damages",
                        description="Types of damages that are excluded from liability",
                    ),
                ],
            ),
            Aspect(
                name="Insurance Requirements",
                description="Required insurance coverage for each party",
                concepts=[
                    NumericalConcept(
                        name="Developer Insurance Amount",
                        description="Minimum professional liability insurance for developer",
                        numeric_type="float",
                    ),
                    NumericalConcept(
                        name="Client Insurance Amount",
                        description="Minimum general liability insurance for client",
                        numeric_type="float",
                    ),
                ],
            ),
        ],
    ),
]

# Add all aspects to the document
doc.add_aspects(contract_aspects)

# Configure DocumentLLM with your API parameters
llm = DocumentLLM(
    model="azure/gpt-4.1",
    api_key=os.getenv("CONTEXTGEM_AZURE_OPENAI_API_KEY"),
    api_version=os.getenv("CONTEXTGEM_AZURE_OPENAI_API_VERSION"),
    api_base=os.getenv("CONTEXTGEM_AZURE_OPENAI_API_BASE"),
)

# Extract aspects and concepts
doc = llm.extract_all(doc)

# Access the hierarchical extraction results
print("=== CONTRACT ANALYSIS RESULTS ===\n")
for main_aspect in doc.aspects:
    print(main_aspect.name.upper())
    for item in main_aspect.extracted_items:
        print(f"- {item.value}")

    # Access main aspect concepts
    if main_aspect.concepts:
        print("  Main Aspect Concepts:")
        for concept in main_aspect.concepts:
            print(f"    • {concept.name}:")
            for item in concept.extracted_items:
                print(f"      - {item.value}")

    # Access sub-aspects
    if main_aspect.aspects:
        print("  Sub-Aspects:")
        for sub_aspect in main_aspect.aspects:
            print(f"    {sub_aspect.name}")
            for item in sub_aspect.extracted_items:
                print(f"      - {item.value}")

            # Access sub-aspect concepts
            if sub_aspect.concepts:
                print("      Sub-Aspect Concepts:")
                for concept in sub_aspect.concepts:
                    print(f"        • {concept.name}:")
                    for item in concept.extracted_items:
                        print(f"          - {item.value}")

    print()
```
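The nested loops above repeat the same logic for main aspects and sub-aspects. A recursive function can express it once and return lines instead of printing, which makes the report easier to test or write to a file. `render_aspect` is a hypothetical helper; it assumes only the attributes used above (`name`, `extracted_items`, `concepts`, `aspects`) and is demonstrated with stub objects rather than real results.

```python
from types import SimpleNamespace


def render_aspect(aspect, level: int = 0) -> list[str]:
    """Render an aspect, its concepts, and its sub-aspects as indented text lines."""
    pad = "  " * level
    lines = [f"{pad}{aspect.name}"]
    lines += [f"{pad}  - {item.value}" for item in aspect.extracted_items]
    for concept in getattr(aspect, "concepts", None) or []:
        values = ", ".join(str(item.value) for item in concept.extracted_items) or "(none)"
        lines.append(f"{pad}  • {concept.name}: {values}")
    # Sub-aspects are rendered by the same function, one level deeper
    for sub_aspect in getattr(aspect, "aspects", None) or []:
        lines += render_aspect(sub_aspect, level + 1)
    return lines


# Stubs standing in for extracted results (illustration only)
license_scope = SimpleNamespace(
    name="License Scope",
    extracted_items=[SimpleNamespace(value="Developer grants Client a perpetual, non-transferable license")],
    concepts=[SimpleNamespace(name="User Limit", extracted_items=[SimpleNamespace(value=100)])],
    aspects=[],
)
license_terms = SimpleNamespace(
    name="License Grant and Restrictions",
    extracted_items=[],
    concepts=[],
    aspects=[license_scope],
)

print("\n".join(render_aspect(license_terms)))
```

With real results, the loop becomes `for main_aspect in doc.aspects: print("\n".join(render_aspect(main_aspect)))`.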
Justifications for Extraction
Justifications explain why specific text segments were identified as relevant to an aspect, helping users understand the reasoning behind each extraction and evaluate its relevance. When `add_justifications=True`, each extracted item includes a generated explanation in its `justification` attribute; `justification_depth` and `justification_max_sents` control its level of detail and length.
Example:
```python
# ContextGem: Aspect Extraction with Justifications

import os

from contextgem import Aspect, Document, DocumentLLM

# Create a document instance
doc = Document(
    raw_text=(
        "NON-DISCLOSURE AGREEMENT\n"
        "\n"
        'This Non-Disclosure Agreement ("Agreement") is entered into between TechCorp Inc. '
        '("Disclosing Party") and Innovation Labs LLC ("Receiving Party") on January 15, 2024.\n'
        "...\n"
    ),
)

# Define a single aspect focused on NDA direction with justifications
nda_direction_aspect = Aspect(
    name="NDA Direction",
    description="Provisions informing the NDA direction (whether mutual or one-way) and information flow between parties",
    add_justifications=True,
    justification_depth="balanced",
    justification_max_sents=4,
)

# Add the aspect to the document
doc.aspects = [nda_direction_aspect]

# Configure DocumentLLM with your API parameters
llm = DocumentLLM(
    model="azure/gpt-4.1-mini",
    api_key=os.getenv("CONTEXTGEM_AZURE_OPENAI_API_KEY"),
    api_version=os.getenv("CONTEXTGEM_AZURE_OPENAI_API_VERSION"),
    api_base=os.getenv("CONTEXTGEM_AZURE_OPENAI_API_BASE"),
)

# Extract the aspect with justifications
nda_direction_aspect = llm.extract_aspects_from_document(doc)[0]
for i, item in enumerate(nda_direction_aspect.extracted_items, 1):
    print(f"- {i}. {item.value}")
    print(f"  Justification: {item.justification}")
    print()
```
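Justifications are most useful when someone reviews them alongside the extracted text. One option is to export each item's value and justification to CSV with Python's standard `csv` module, which takes care of quoting commas and embedded quotes. `items_to_csv` is a hypothetical helper shown with a stub item (its text is illustrative, not real output); with real results, pass `nda_direction_aspect`.

```python
import csv
import io
from types import SimpleNamespace


def items_to_csv(aspect) -> str:
    """Return CSV text with one row per extracted item (justification blank if absent)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["aspect", "value", "justification"])
    for item in aspect.extracted_items:
        justification = getattr(item, "justification", None) or ""
        writer.writerow([aspect.name, item.value, justification])
    return buffer.getvalue()


# Stub standing in for an extracted aspect (illustration only)
stub_aspect = SimpleNamespace(
    name="NDA Direction",
    extracted_items=[
        SimpleNamespace(
            value='TechCorp Inc. ("Disclosing Party") and Innovation Labs LLC ("Receiving Party")',
            justification="Only one party is designated as disclosing, which suggests a one-way NDA.",
        )
    ],
)

print(items_to_csv(stub_aspect))
```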
Note
References are always included for aspects. The `reference_paragraphs` field is automatically populated in extracted items of aspects, as they represent existing text segments in the document. The `reference_sentences` field is only populated when `reference_depth` is set to `"sentences"`. You can access these references as follows:
```python
# `aspect` stands for any extracted Aspect, e.g. termination_aspect from the first example

# Always available for aspects
aspect.extracted_items[0].reference_paragraphs

# Only populated if reference_depth="sentences"
aspect.extracted_items[0].reference_sentences
```
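Because `reference_sentences` is empty unless `reference_depth="sentences"`, code that consumes references can prefer sentences and fall back to paragraphs. The sketch below assumes that paragraph and sentence objects expose their text as `raw_text` (treat this as an assumption and check the Paragraph and Sentence API reference); `reference_texts` is a hypothetical helper tested with stub objects.

```python
from types import SimpleNamespace


def reference_texts(item) -> list[str]:
    """Prefer sentence-level references when populated, else use paragraph references."""
    refs = getattr(item, "reference_sentences", None) or item.reference_paragraphs
    return [ref.raw_text for ref in refs]


# Stubs standing in for Paragraph / Sentence objects (illustration only)
paragraph = SimpleNamespace(raw_text="2. Term and Termination. This Agreement shall commence on the Effective Date.")
sentence = SimpleNamespace(raw_text="Either party may terminate this Agreement upon thirty (30) days written notice.")

# Paragraph-level depth: only paragraph references are populated
paragraph_level = SimpleNamespace(reference_paragraphs=[paragraph], reference_sentences=[])
# reference_depth="sentences": both lists are populated
sentence_level = SimpleNamespace(reference_paragraphs=[paragraph], reference_sentences=[sentence])

print(reference_texts(paragraph_level))
print(reference_texts(sentence_level))
```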
💡 Best Practices
Aspect Definition
Be specific: Provide clear, detailed descriptions that help the LLM understand exactly what content constitutes the aspect
Use domain terminology: Include relevant domain-specific terms that help identify the target content
Define scope clearly: Specify what should and shouldn’t be included in the aspect
Structuring Complex Content
Logical decomposition: Break down complex topics into logical, non-overlapping components
Meaningful relationships: Ensure sub-aspects and/or concepts genuinely belong to their parent aspect
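One rough way to check whether sub-aspects really are non-overlapping is to look for identical text segments extracted by more than one sibling. The heuristic below is a hypothetical sketch (`overlapping_siblings`): it compares `value` strings exactly, so it will miss partial overlaps, and it only assumes the `aspects`, `name`, `extracted_items`, and `value` attributes used in the examples above. Stubs replace real results.

```python
from itertools import combinations
from types import SimpleNamespace


def overlapping_siblings(parent) -> list[tuple[str, str, int]]:
    """Report pairs of sibling sub-aspects that extracted identical text segments."""
    values = {sub.name: {item.value for item in sub.extracted_items} for sub in parent.aspects}
    overlaps = []
    for (name_a, vals_a), (name_b, vals_b) in combinations(values.items(), 2):
        shared = len(vals_a & vals_b)
        if shared:
            overlaps.append((name_a, name_b, shared))
    return overlaps


def stub_sub_aspect(name: str, values: list[str]) -> SimpleNamespace:
    # Stand-in for an extracted sub-aspect (illustration only)
    return SimpleNamespace(name=name, extracted_items=[SimpleNamespace(value=v) for v in values])


parent = SimpleNamespace(
    name="Termination Provisions",
    aspects=[
        stub_sub_aspect("Employee Termination Rights", ["8.2 Termination by Employee", "8.4 Return of Company Property"]),
        stub_sub_aspect("Severance Benefits", ["8.3 Severance Benefits"]),
        stub_sub_aspect("Post-Termination Obligations", ["8.4 Return of Company Property", "Section 9: Non-Competition"]),
    ],
)

print(overlapping_siblings(parent))  # [('Employee Termination Rights', 'Post-Termination Obligations', 1)]
```

A non-empty result suggests that the descriptions of the reported siblings should be sharpened so that each segment belongs to one of them.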
Integration Strategy
Two-stage extraction: Use aspects to identify relevant sections first, then apply sub-aspects and/or concepts for detailed data extraction
Scope alignment: Ensure sub-aspects and/or concepts are relevant to their containing aspects
Reference tracking: Aspect items always carry paragraph references; set reference_depth="sentences" when you need sentence-level tracing, and enable references on contained concepts when you need to trace their extracted data back to source locations
🎯 Example Use Cases
These are examples of how aspects may be used in different domains:
Contract Analysis
Termination Clauses: Extract and analyze termination conditions, notice periods, and severance terms
Payment Terms: Identify payment schedules, amounts, and conditions
Liability Sections: Extract liability caps, limitations, and indemnification clauses
Intellectual Property: Identify IP ownership, licensing, and usage rights
Financial Reports
Revenue Sections: Extract revenue recognition, breakdown by segments, and growth analysis
Compliance Sections: Identify regulatory compliance statements and audit findings
Key Performance Indicators: Extract precise numerical metrics like EBITDA margins, debt-to-equity ratios, and year-over-year percentage changes
Technical Documentation
Product Specifications: Extract technical requirements, features, and performance criteria
Installation Procedures: Identify setup steps, configuration requirements, and dependencies
Troubleshooting Sections: Extract problem descriptions, diagnostic steps, and solutions
API Documentation: Identify endpoints, parameters, and usage examples
Research Papers
Methodology Sections: Extract research methods, data collection, and analysis approaches
Results Sections: Identify findings, statistical outcomes, and experimental results
Discussion Sections: Extract interpretation, implications, and future research directions