Optimizing for Speed
For large-scale processing or time-sensitive applications, optimize your pipeline for speed:
🚀 Enable and Configure Concurrency: Process multiple extractions concurrently, and tune the async limiter to match your LLM API's rate limits.
📦 Use Smaller Models: Select smaller or distilled LLMs that respond faster. (See Choosing the Right LLM(s) for guidance.)
🔄 Use a Fallback LLM: Configure a fallback LLM to retry extractions that failed due to rate limits.
⚙️ Use Default Parameters: With the default parameters, all extractions are processed in as few LLM calls as possible.
📉 Enable Justifications Only When Necessary: Skip justifications for simple aspects and concepts; this reduces the number of tokens generated (see the first sketch after the example below).
⚠️ Use Sentence-Level Reference Depth Sparingly: Only use sentence-level reference depth for aspects or concepts when absolutely necessary, as it requires loading a SaT model and running sentence segmentation on the text, which can be slow for long documents (see the second sketch after the example below).
# Example of optimizing extraction for speed

import os

from aiolimiter import AsyncLimiter

from contextgem import Document, DocumentLLM

# Define document
document = Document(
    raw_text="document_text",
    # aspects=[Aspect(...), ...],
    # concepts=[Concept(...), ...],
)

# Define a smaller/faster primary LLM with a fallback model
llm = DocumentLLM(
    model="openai/gpt-4o-mini",
    api_key=os.environ.get("CONTEXTGEM_OPENAI_API_KEY"),
    fallback_llm=DocumentLLM(
        model="openai/gpt-3.5-turbo",
        api_key=os.environ.get("CONTEXTGEM_OPENAI_API_KEY"),
        is_fallback=True,
    ),
)

# Configure rate limits for the primary and fallback LLMs
llm.async_limiter = AsyncLimiter(
    10, 5
)  # e.g. 10 acquisitions per 5-second period; adjust to your LLM API setup
llm.fallback_llm.async_limiter = AsyncLimiter(  # type: ignore
    20, 5
)  # e.g. 20 acquisitions per 5-second period; adjust to your LLM API setup

# Use the LLM for extraction with concurrency enabled
llm.extract_all(document, use_concurrency=True)

# ... use the extracted data ...
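Beyond concurrency and fallback configuration, justifications can be left off wherever they add no value, since each justification adds generated tokens per extracted item. Below is a minimal sketch of enabling justifications only for a concept that genuinely needs them; the concept names, descriptions, and the justification_depth setting are illustrative assumptions, not part of the example above.

# Sketch: enable justifications only where necessary (illustrative definitions)

from contextgem import StringConcept

# Simple concept: justifications stay off (the default), so no extra
# justification tokens are generated for each extracted item
title_concept = StringConcept(
    name="Document title",
    description="The title of the document",
)

# Complex concept: justifications are worth the extra tokens here
anomaly_concept = StringConcept(
    name="Anomalies",
    description="Unusual or inconsistent statements in the document",
    add_justifications=True,
    justification_depth="brief",  # illustrative: keep justifications short to limit tokens
)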
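Similarly, paragraph-level reference depth (the default) avoids loading a SaT model and segmenting the document into sentences. A minimal sketch, with an illustrative aspect definition:

# Sketch: prefer paragraph-level over sentence-level reference depth
# (illustrative definitions)

from contextgem import Aspect

# Paragraph-level references (the default) do not require sentence segmentation
payment_aspect = Aspect(
    name="Payment terms",
    description="Terms and conditions of payment",
    reference_depth="paragraphs",
)

# Sentence-level references trigger SaT model loading and sentence segmentation;
# use them only when paragraph-level granularity is not precise enough:
# payment_aspect = Aspect(
#     name="Payment terms",
#     description="Terms and conditions of payment",
#     reference_depth="sentences",
# )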