BooleanConcept#
BooleanConcept is a specialized concept type that evaluates document content and produces True/False assessments based on specific criteria, conditions, or properties you define.
📝 Overview#
BooleanConcept is used when you need to determine if a document contains or satisfies specific attributes, properties, or conditions that can be represented as True or False values, such as:
Presence checks: contains confidential information, includes specific clauses, mentions certain topics
Compliance assessments: meets regulatory requirements, follows specific formatting standards
Binary classifications: is favorable/unfavorable, is complete/incomplete, is approved/rejected
💻 Usage Example#
Here’s a simple example of how to use BooleanConcept to determine if a document mentions confidential information:
# ContextGem: BooleanConcept Extraction

import os

from contextgem import BooleanConcept, Document, DocumentLLM

# Create a Document object from text
doc = Document(
    raw_text="This document contains confidential information and should not be shared publicly."
)

# Define a BooleanConcept to detect confidential content
confidentiality_concept = BooleanConcept(
    name="Is confidential",
    description="Whether the document contains confidential information",
)

# Attach the concept to the document
doc.add_concepts([confidentiality_concept])

# Configure DocumentLLM with your API parameters
llm = DocumentLLM(
    model="azure/gpt-4.1-mini",
    api_key=os.getenv("CONTEXTGEM_AZURE_OPENAI_API_KEY"),
    api_version=os.getenv("CONTEXTGEM_AZURE_OPENAI_API_VERSION"),
    api_base=os.getenv("CONTEXTGEM_AZURE_OPENAI_API_BASE"),
)

# Extract the concept from the document
confidentiality_concept = llm.extract_concepts_from_document(doc)[0]

# Print the extracted value
print(confidentiality_concept.extracted_items[0].value)  # Output: True
# Or access the extracted value from the document object
print(doc.concepts[0].extracted_items[0].value)  # Output: True
⚙️ Parameters#
When creating a BooleanConcept, you can specify the following parameters:
Parameter | Type | Description |
---|---|---|
name | str | A unique name identifier for the concept |
description | str | A clear description of what condition or property the concept evaluates and the criteria for determining true or false values |
llm_role | str | The role of the LLM responsible for extracting the concept. Available values: … |
add_justifications | bool | Whether to include justifications for extracted items (defaults to …) |
justification_depth | str | Justification detail level. Available values: … |
justification_max_sents | int | Maximum sentences in a justification (defaults to …) |
add_references | bool | Whether to include source references for extracted items (defaults to …) |
reference_depth | str | Source reference granularity. Available values: … |
singular_occurrence | bool | Whether this concept is restricted to having only one extracted item. If … |
custom_data | dict | Optional. Dictionary for storing any additional data that you want to associate with the concept. This data must be JSON-serializable. It is not used for extraction but can be useful for custom processing or downstream tasks. Defaults to an empty dictionary. |
🚀 Advanced Usage#
🔍 References and Justifications for Extraction#
You can configure a BooleanConcept to include justifications and references. Justifications help explain the reasoning behind true/false determinations, while references point to the specific parts of the document that influenced the decision:
# ContextGem: BooleanConcept Extraction with References and Justifications

import os

from contextgem import BooleanConcept, Document, DocumentLLM

# Sample document text containing policy information
policy_text = """
Company Data Retention Policy (Updated 2024)
All customer data must be encrypted at rest and in transit using industry-standard encryption protocols.
Personal information should be retained for no longer than 3 years after the customer relationship ends.
Employees are required to complete data privacy training annually.
"""

# Create a Document from the text
doc = Document(raw_text=policy_text)

# Create a BooleanConcept with justifications and references enabled
compliance_concept = BooleanConcept(
    name="Has encryption requirement",
    description="Whether the document specifies that data must be encrypted",
    add_justifications=True,  # Enable justifications to understand reasoning
    justification_depth="brief",
    justification_max_sents=1,  # Allow at most 1 sentence for each justification
    add_references=True,  # Include references to source text
    reference_depth="sentences",  # Reference specific sentences rather than paragraphs
)

# Attach the concept to the document
doc.add_concepts([compliance_concept])

# Configure DocumentLLM with your API parameters
llm = DocumentLLM(
    model="azure/gpt-4o-mini",
    api_key=os.getenv("CONTEXTGEM_AZURE_OPENAI_API_KEY"),
    api_version=os.getenv("CONTEXTGEM_AZURE_OPENAI_API_VERSION"),
    api_base=os.getenv("CONTEXTGEM_AZURE_OPENAI_API_BASE"),
)

# Extract the concept
compliance_concept = llm.extract_concepts_from_document(doc)[0]

# Print the extracted value with justification and references
print(f"Has encryption requirement: {compliance_concept.extracted_items[0].value}")
print(f"\nJustification: {compliance_concept.extracted_items[0].justification}")
print("\nSource references:")
for sent in compliance_concept.extracted_items[0].reference_sentences:
    print(f"- {sent.raw_text}")
📊 Extracted Items#
When a BooleanConcept is extracted, it is populated with a list of extracted items accessible through the .extracted_items property. Each item is an instance of the _BooleanItem class with the following attributes:
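Attribute | Type | Description |
---|---|---|
value | bool | The extracted boolean value (True or False) |
justification | str | Explanation of why this boolean value was determined (only if add_justifications=True) |
reference_paragraphs | list[Paragraph] | List of paragraph objects that influenced the boolean determination (only if add_references=True) |
reference_sentences | list[Sentence] | List of sentence objects that influenced the boolean determination (only if add_references=True and reference_depth="sentences") |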
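To illustrate how these attributes might be consumed together, here is a minimal sketch of a formatting helper. It relies only on attribute names that appear in the examples on this page (value, justification, reference_sentences, raw_text). The helper summarize_item is hypothetical, and SimpleNamespace objects stand in for real extracted items so that the snippet runs without an LLM call. It also assumes that a justification or reference list that was not requested may be missing or empty.

```python
from types import SimpleNamespace


def summarize_item(item) -> str:
    """Build a readable summary from an extracted boolean item."""
    lines = [f"Value: {item.value}"]
    # Assumption: the justification is absent or empty unless it was requested.
    justification = getattr(item, "justification", None)
    if justification:
        lines.append(f"Justification: {justification}")
    # Assumption: reference sentences are absent or empty unless requested.
    for sent in getattr(item, "reference_sentences", None) or []:
        lines.append(f"- {sent.raw_text}")
    return "\n".join(lines)


# Stand-in object mimicking an extracted item (not a real _BooleanItem).
fake_item = SimpleNamespace(
    value=True,
    justification="The policy requires encryption at rest and in transit.",
    reference_sentences=[
        SimpleNamespace(
            raw_text="All customer data must be encrypted at rest and in transit."
        )
    ],
)

print(summarize_item(fake_item))
```

With real extraction output, the same helper could be applied to each element of a concept's extracted_items list.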
💡 Best Practices#
Here are some best practices to optimize your use of BooleanConcept:
Provide a clear and specific description that helps the LLM understand exactly what condition to evaluate, using precise and unambiguous language in your concept names and descriptions. Since boolean concepts yield true/false values, focus on describing what criteria should be used to make the determination (e.g., “whether the document mentions specific compliance requirements” rather than just “compliance requirements”). Avoid vague terms that could be interpreted multiple ways—for example, use “contains legally binding obligations” instead of “contains important content” to ensure consistent and accurate determinations.
Break down complex conditions into multiple simpler boolean concepts when appropriate. Instead of one concept checking “document is complete and compliant and approved,” consider separate concepts for each condition. This provides more granular insights and makes it easier to identify specific issues when any condition fails.
Enable justifications (using add_justifications=True) when you need to understand the reasoning behind the LLM’s true/false determination.
Enable references (using add_references=True) when you need to trace back to specific parts of the document that influenced the boolean decision or verify the evidence used to make the determination.
Use singular_occurrence=True to enforce only a single boolean determination for the entire document. This is particularly useful for concepts that should yield a single true/false answer, such as “contains confidential information” or “is compliant with regulations,” rather than identifying multiple instances where the condition might be true or false throughout the document.
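The advice on splitting compound conditions can be sketched as follows. The snippet keeps one name and description pair per condition, each of which could be passed to a separate BooleanConcept, and then combines the per-condition results. The condition names, descriptions, results dictionary, and the helper failed_conditions are all hypothetical, and the results are hard-coded in place of real extraction output.

```python
# One narrowly scoped condition per entry, instead of a single
# "complete and compliant and approved" concept.
CONDITIONS = {
    "Is complete": "Whether the document contains all required sections",
    "Is compliant": "Whether the document meets the stated regulatory requirements",
    "Is approved": "Whether the document records a formal approval",
}


def failed_conditions(results: dict[str, bool]) -> list[str]:
    """Return names of conditions that evaluated to False or have no result."""
    return [name for name in CONDITIONS if not results.get(name, False)]


# Hard-coded stand-in for values read from each concept's extracted items.
results = {"Is complete": True, "Is compliant": False, "Is approved": True}

failures = failed_conditions(results)
print("All conditions met" if not failures else f"Failed: {failures}")
# Failed: ['Is compliant']
```

Keeping the conditions separate makes the failing one visible directly, which a single combined true/false answer would hide.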