Optimizing for Speed
For large-scale processing or time-sensitive applications, optimize your pipeline for speed:
🚀 Enable and Configure Concurrency: Process multiple extractions concurrently, and tune the async limiter to match your LLM API's rate limits.
📦 Use Smaller Models: Select smaller or distilled LLMs that respond faster. (See Choosing the Right LLM(s) for guidance.)
🔄 Use a Fallback LLM: Configure a fallback LLM to retry extractions that failed due to rate limits.
⚙️ Use Default Parameters: With the default parameters, all extractions are processed in as few LLM calls as possible.
📉 Enable Justifications Only When Necessary: Skip justifications for simple aspects and concepts; this reduces the number of tokens generated (see the first sketch after the example below).
⚠️ Use Sentence-Level Reference Depth Sparingly: Only use sentence-level reference depth for aspects or concepts when absolutely necessary, as it requires loading a SaT model and running sentence segmentation on the text, which can be slow for long documents (see the second sketch after the example below).
# Example of optimizing extraction for speed

import os

from aiolimiter import AsyncLimiter

from contextgem import Document, DocumentLLM

# Define document
document = Document(
    raw_text="document_text",
    # aspects=[Aspect(...), ...],
    # concepts=[Concept(...), ...],
)

# Define a smaller/faster primary LLM with a fallback model
llm = DocumentLLM(
    model="openai/gpt-4o-mini",
    api_key=os.environ.get("CONTEXTGEM_OPENAI_API_KEY"),
    fallback_llm=DocumentLLM(
        model="openai/gpt-3.5-turbo",
        api_key=os.environ.get("CONTEXTGEM_OPENAI_API_KEY"),
        is_fallback=True,
    ),
)

# Configure rate limits for the primary and fallback LLMs
llm.async_limiter = AsyncLimiter(
    10, 5
)  # e.g. 10 acquisitions per 5-second period; adjust to your LLM API setup
llm.fallback_llm.async_limiter = AsyncLimiter(  # type: ignore
    20, 5
)  # e.g. 20 acquisitions per 5-second period; adjust to your LLM API setup

# Use the LLM for extraction with concurrency enabled
llm.extract_all(document, use_concurrency=True)

# ... use the extracted data ...
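Beyond concurrency and fallback configuration, justifications can be left off wherever they add no value, since each justification adds generated tokens per extracted item. Below is a minimal sketch of enabling justifications only for a concept that genuinely needs them; the concept names, descriptions, and the justification_depth setting are illustrative assumptions, not part of the example above.

# Sketch: enable justifications only where necessary (illustrative definitions)

from contextgem import StringConcept

# Simple concept: justifications stay off (the default), so no extra
# justification tokens are generated for each extracted item
title_concept = StringConcept(
    name="Document title",
    description="The title of the document",
)

# Complex concept: justifications are worth the extra tokens here
anomaly_concept = StringConcept(
    name="Anomalies",
    description="Unusual or inconsistent statements in the document",
    add_justifications=True,
    justification_depth="brief",  # illustrative: keep justifications short to limit tokens
)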
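Similarly, paragraph-level reference depth (the default) avoids loading a SaT model and segmenting the document into sentences. A minimal sketch, with an illustrative aspect definition:

# Sketch: prefer paragraph-level over sentence-level reference depth
# (illustrative definitions)

from contextgem import Aspect

# Paragraph-level references (the default) do not require sentence segmentation
payment_aspect = Aspect(
    name="Payment terms",
    description="Terms and conditions of payment",
    reference_depth="paragraphs",
)

# Sentence-level references trigger SaT model loading and sentence segmentation;
# use them only when paragraph-level granularity is not precise enough:
# payment_aspect = Aspect(
#     name="Payment terms",
#     description="Terms and conditions of payment",
#     reference_depth="sentences",
# )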