BooleanConcept

BooleanConcept is a specialized concept type that evaluates document content and produces True/False assessments based on specific criteria, conditions, or properties you define.

📝 Overview

BooleanConcept is used when you need to determine if a document contains or satisfies specific attributes, properties, or conditions that can be represented as True or False values, such as:

  • Presence checks: contains confidential information, includes specific clauses, mentions certain topics

  • Compliance assessments: meets regulatory requirements, follows specific formatting standards

  • Binary classifications: is favorable/unfavorable, is complete/incomplete, is approved/rejected

💻 Usage Example

Here’s a simple example of how to use BooleanConcept to determine if a document mentions confidential information:

# ContextGem: BooleanConcept Extraction

import os

from contextgem import BooleanConcept, Document, DocumentLLM

# Create a Document object from text
doc = Document(
    raw_text="This document contains confidential information and should not be shared publicly."
)

# Define a BooleanConcept to detect confidential content
confidentiality_concept = BooleanConcept(
    name="Is confidential",
    description="Whether the document contains confidential information",
)

# Attach the concept to the document
doc.add_concepts([confidentiality_concept])

# Configure DocumentLLM with your API parameters
llm = DocumentLLM(
    model="azure/gpt-4.1-mini",
    api_key=os.getenv("CONTEXTGEM_AZURE_OPENAI_API_KEY"),
    api_version=os.getenv("CONTEXTGEM_AZURE_OPENAI_API_VERSION"),
    api_base=os.getenv("CONTEXTGEM_AZURE_OPENAI_API_BASE"),
)

# Extract the concept from the document
confidentiality_concept = llm.extract_concepts_from_document(doc)[0]

# Print the extracted value
print(confidentiality_concept.extracted_items[0].value)  # Output: True
# Or access the extracted value from the document object
print(doc.concepts[0].extracted_items[0].value)  # Output: True
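
The example above indexes `extracted_items[0]` directly, which raises an `IndexError` should the list ever be empty. A minimal sketch of a defensive accessor follows. `first_boolean_value` is a hypothetical helper (not part of ContextGem), and the `SimpleNamespace` objects are stand-ins so the snippet runs without an LLM call.

```python
from types import SimpleNamespace


def first_boolean_value(concept, default=None):
    """Return the value of the first extracted item, or `default` if there is none."""
    items = getattr(concept, "extracted_items", None) or []
    return items[0].value if items else default


# Stand-ins for extracted concepts (assumption: only the attributes used here matter).
with_item = SimpleNamespace(extracted_items=[SimpleNamespace(value=True)])
without_item = SimpleNamespace(extracted_items=[])

print(first_boolean_value(with_item))  # True
print(first_boolean_value(without_item, default=False))  # False
```

With a real concept, the same call would be `first_boolean_value(doc.concepts[0])`.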

⚙️ Parameters

When creating a BooleanConcept, you can specify the following parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| name | str | A unique name identifier for the concept |
| description | str | A clear description of what condition or property the concept evaluates and the criteria for determining true or false values |
| llm_role | str | The role of the LLM responsible for extracting the concept. Available values: "extractor_text", "reasoner_text", "extractor_vision", "reasoner_vision". Defaults to "extractor_text". For more details, see 🏷️ LLM Roles. |
| add_justifications | bool | Whether to include justifications for extracted items (defaults to False). Justifications explain why the LLM extracted specific values and the reasoning behind the extraction, which is especially useful for complex extractions or when debugging results. |
| justification_depth | str | Justification detail level. Available values: "brief", "balanced", "comprehensive". Defaults to "brief". |
| justification_max_sents | int | Maximum number of sentences in a justification (defaults to 2) |
| add_references | bool | Whether to include source references for extracted items (defaults to False). References indicate the specific locations in the document where evidence supporting the boolean determination was found, so the true/false value can be traced back to the content that influenced the decision. |
| reference_depth | str | Source reference granularity. Available values: "paragraphs", "sentences". Defaults to "paragraphs". |
| singular_occurrence | bool | Whether this concept is restricted to a single extracted item. If True, only one item is extracted. Defaults to False (multiple extracted items are allowed). For boolean concepts, this parameter is particularly useful when you want a single true/false determination about the entire document (e.g., “contains confidential information”) or a unique determination about a specific aspect (e.g., “is the payment schedule finalized”). This helps distinguish between evaluating overall document properties and identifying multiple instances where a condition might be true/false. Note that with advanced LLMs, this constraint may not be required, as they can often infer the appropriate number of items to extract from the concept’s name, description, and type (e.g., “contains confidential information” vs “compliance violations”). |
| custom_data | dict | Optional. Dictionary for storing any additional data that you want to associate with the concept. This data must be JSON-serializable. It is not used for extraction but can be useful for custom processing or downstream tasks. Defaults to an empty dictionary. |
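
Because `custom_data` must be JSON-serializable, it can help to check a dictionary before attaching it to a concept. The sketch below uses only the standard `json` module. `is_json_serializable` is a hypothetical helper, and the sample keys are illustrative rather than anything ContextGem expects.

```python
import json


def is_json_serializable(data: dict) -> bool:
    """Return True if `data` can be encoded with the standard json module."""
    try:
        json.dumps(data)
    except (TypeError, ValueError):
        return False
    return True


# Illustrative payloads (assumed keys, not defined by the library).
ok_data = {"owner": "legal-team", "priority": 2, "tags": ["nda", "review"]}
bad_data = {"tags": {"nda", "review"}}  # a set cannot be encoded as JSON

print(is_json_serializable(ok_data))  # True
print(is_json_serializable(bad_data))  # False
```

A dictionary that passes this check could then be supplied as `custom_data=ok_data` when creating the concept.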

🚀 Advanced Usage

🔍 References and Justifications for Extraction

You can configure a BooleanConcept to include justifications and references. Justifications help explain the reasoning behind true/false determinations, while references point to the specific parts of the document that influenced the decision:

# ContextGem: BooleanConcept Extraction with References and Justifications

import os

from contextgem import BooleanConcept, Document, DocumentLLM

# Sample document text containing policy information
policy_text = """
Company Data Retention Policy (Updated 2024)

All customer data must be encrypted at rest and in transit using industry-standard encryption protocols.
Personal information should be retained for no longer than 3 years after the customer relationship ends.
Employees are required to complete data privacy training annually.
"""

# Create a Document from the text
doc = Document(raw_text=policy_text)

# Create a BooleanConcept with justifications and references enabled
compliance_concept = BooleanConcept(
    name="Has encryption requirement",
    description="Whether the document specifies that data must be encrypted",
    add_justifications=True,  # Enable justifications to understand reasoning
    justification_depth="brief",
    justification_max_sents=1,  # Allow at most 1 sentence per justification
    add_references=True,  # Include references to source text
    reference_depth="sentences",  # Reference specific sentences rather than paragraphs
)

# Attach the concept to the document
doc.add_concepts([compliance_concept])

# Configure DocumentLLM with your API parameters
llm = DocumentLLM(
    model="azure/gpt-4o-mini",
    api_key=os.getenv("CONTEXTGEM_AZURE_OPENAI_API_KEY"),
    api_version=os.getenv("CONTEXTGEM_AZURE_OPENAI_API_VERSION"),
    api_base=os.getenv("CONTEXTGEM_AZURE_OPENAI_API_BASE"),
)

# Extract the concept
compliance_concept = llm.extract_concepts_from_document(doc)[0]

# Print the extracted value with justification and references
print(f"Has encryption requirement: {compliance_concept.extracted_items[0].value}")
print(f"\nJustification: {compliance_concept.extracted_items[0].justification}")
print("\nSource references:")
for sent in compliance_concept.extracted_items[0].reference_sentences:
    print(f"- {sent.raw_text}")
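
The printing logic above assumes sentence-level references. A hedged sketch of a formatter that also works when `reference_depth="paragraphs"` is shown below. `format_item` is a hypothetical helper, it assumes unused reference attributes are empty or absent, and the sample item is a stand-in with made-up justification text rather than real LLM output.

```python
from types import SimpleNamespace


def format_item(label, item):
    """Build a short plain-text report for one extracted boolean item."""
    lines = [f"{label}: {item.value}"]
    justification = getattr(item, "justification", None)
    if justification:
        lines.append(f"Justification: {justification}")
    # Prefer sentence-level references; fall back to paragraph-level ones.
    refs = (
        getattr(item, "reference_sentences", None)
        or getattr(item, "reference_paragraphs", None)
        or []
    )
    if refs:
        lines.append("Source references:")
        lines.extend(f"- {ref.raw_text}" for ref in refs)
    return "\n".join(lines)


# Stand-in item mimicking the attributes used above (not a real ContextGem object).
sample = SimpleNamespace(
    value=True,
    justification="The policy requires encryption at rest and in transit.",
    reference_sentences=[
        SimpleNamespace(raw_text="All customer data must be encrypted at rest and in transit.")
    ],
    reference_paragraphs=[],
)
print(format_item("Has encryption requirement", sample))
```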

📊 Extracted Items

When a BooleanConcept is extracted, it is populated with a list of extracted items accessible through the .extracted_items property. Each item is an instance of the _BooleanItem class with the following attributes:

| Attribute | Type | Description |
| --- | --- | --- |
| value | bool | The extracted boolean value (True or False) |
| justification | str | Explanation of why this boolean value was determined (only if add_justifications=True) |
| reference_paragraphs | list[Paragraph] | List of paragraph objects that influenced the boolean determination (only if add_references=True) |
| reference_sentences | list[Sentence] | List of sentence objects that influenced the boolean determination (only if add_references=True and reference_depth="sentences") |
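
Since `.extracted_items` is a list, a concept without `singular_occurrence=True` may carry several boolean items. One way to summarize them is a simple tally, sketched below. `tally_values` is a hypothetical helper, and the stand-in concept only mimics the `value` attribute described above.

```python
from collections import Counter
from types import SimpleNamespace


def tally_values(concept):
    """Count how many extracted items are True and how many are False."""
    counts = Counter(item.value for item in concept.extracted_items)
    return {"true": counts[True], "false": counts[False]}


# Stand-in concept with three items (illustrative values, not real output).
multi = SimpleNamespace(
    extracted_items=[
        SimpleNamespace(value=True),
        SimpleNamespace(value=False),
        SimpleNamespace(value=True),
    ]
)
print(tally_values(multi))  # {'true': 2, 'false': 1}
```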

💡 Best Practices

Here are some best practices to optimize your use of BooleanConcept:

  • Provide a clear and specific description that helps the LLM understand exactly what condition to evaluate, using precise and unambiguous language in your concept names and descriptions. Since boolean concepts yield true/false values, focus on describing what criteria should be used to make the determination (e.g., “whether the document mentions specific compliance requirements” rather than just “compliance requirements”). Avoid vague terms that could be interpreted multiple ways—for example, use “contains legally binding obligations” instead of “contains important content” to ensure consistent and accurate determinations.

  • Break down complex conditions into multiple simpler boolean concepts when appropriate. Instead of one concept checking “document is complete and compliant and approved,” consider separate concepts for each condition. This provides more granular insights and makes it easier to identify specific issues when any condition fails.

  • Enable justifications (using add_justifications=True) when you need to understand the reasoning behind the LLM’s true/false determination.

  • Enable references (using add_references=True) when you need to trace back to specific parts of the document that influenced the boolean decision or verify the evidence used to make the determination.

  • Use singular_occurrence=True to enforce only a single boolean determination for the entire document. This is particularly useful for concepts that should yield a single true/false answer, such as “contains confidential information” or “is compliant with regulations,” rather than identifying multiple instances where the condition might be true or false throughout the document.
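
The advice on splitting a compound condition can be sketched as follows. The specs use only keyword arguments from the Parameters section, but their names and descriptions are illustrative, `failed_checks` is a hypothetical helper, and the results dictionary holds made-up values standing in for what you would read from each extracted concept.

```python
# Hypothetical check specs: one simple condition per concept instead of
# "document is complete and compliant and approved".
CHECK_SPECS = [
    {
        "name": "Is complete",
        "description": "Whether the document contains all required sections",
        "singular_occurrence": True,
    },
    {
        "name": "Is compliant",
        "description": "Whether the document meets the stated regulatory requirements",
        "singular_occurrence": True,
    },
    {
        "name": "Is approved",
        "description": "Whether the document states that it has been approved",
        "singular_occurrence": True,
    },
]

# With ContextGem installed, the specs could be turned into concepts, e.g.:
#   concepts = [BooleanConcept(**spec) for spec in CHECK_SPECS]


def failed_checks(results: dict[str, bool]) -> list[str]:
    """Return the names of checks whose value is False, in input order."""
    return [name for name, passed in results.items() if not passed]


# Made-up results standing in for the values of each extracted concept.
results = {"Is complete": True, "Is compliant": False, "Is approved": True}
print(failed_checks(results))  # ['Is compliant']
```

Keeping the conditions separate means a failure points at one named check rather than at an opaque combined verdict.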