Optimizing for Speed
For large-scale processing or time-sensitive applications, optimize your pipeline for speed:
🚀 Enable and Configure Concurrency: Process multiple extractions concurrently. Adjust the async limiter to match your LLM API's rate limits.
📦 Use Smaller Models: Select smaller or distilled LLMs that respond faster. (See Choosing the Right LLM(s) for model selection guidance.)
🔄 Use a Fallback LLM: Configure a fallback LLM to retry extractions that fail, for example due to rate limits.
⚙️ Use Default Parameters: Keep extraction parameters at their defaults so that all extractions are processed in as few LLM calls as possible.
📉 Enable Justifications Only When Necessary: Skip justifications for simple aspects and concepts; this reduces the number of generated tokens (see the sketch after this list).
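Justifications are disabled by default and can be enabled per aspect or concept where the extra reasoning is worth the additional output tokens. A minimal sketch of both configurations (the concept names and descriptions below are hypothetical):

from contextgem import StringConcept

# Fast path: justifications disabled (the default), so fewer output tokens
price = StringConcept(
    name="Price",  # hypothetical concept name
    description="Total contract price",
)

# Slower path: enable justifications only where the LLM's reasoning matters
liability = StringConcept(
    name="Liability cap",  # hypothetical concept name
    description="Maximum liability amount",
    add_justifications=True,  # generates extra reasoning text per extracted item
)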
Example of optimizing extraction for speed
# Example of optimizing extraction for speed

import os

from aiolimiter import AsyncLimiter

from contextgem import Document, DocumentLLM

# Define document
document = Document(
    raw_text="document_text",
    # aspects=[Aspect(...), ...],
    # concepts=[Concept(...), ...],
)

# Define LLM with a fallback model
llm = DocumentLLM(
    model="openai/gpt-4o-mini",
    api_key=os.environ.get("CONTEXTGEM_OPENAI_API_KEY"),
    async_limiter=AsyncLimiter(
        10, 5
    ),  # e.g. 10 acquisitions per 5-second period; adjust to your LLM API setup
    fallback_llm=DocumentLLM(
        model="openai/gpt-3.5-turbo",
        api_key=os.environ.get("CONTEXTGEM_OPENAI_API_KEY"),
        is_fallback=True,
        async_limiter=AsyncLimiter(
            20, 5
        ),  # e.g. 20 acquisitions per 5-second period; adjust to your LLM API setup
    ),
)

# Use the LLM for extraction with concurrency enabled
llm.extract_all(document, use_concurrency=True)

# ... use the extracted data ...
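After extract_all() returns, the results are attached to the document's aspects and concepts. A minimal sketch of reading them back, assuming aspects and concepts were actually attached to the document above (they are commented out in this example):

# Iterate over extracted items attached to the document
for aspect in document.aspects:
    for item in aspect.extracted_items:
        print(aspect.name, item.value)
for concept in document.concepts:
    for item in concept.extracted_items:
        print(concept.name, item.value)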