contextf
intelligent context builder. fewer tokens, same quality.
// overview
contextf builds relevant context from document collections using search patterns and token-aware processing. instead of stuffing entire documents into a prompt, it extracts only the sections that matter.
two approaches:
- llm-generated patterns — give it a natural language query, it generates search patterns via openai
- manual patterns — specify your own search terms for precise control
// installation
pip install contextf
pip install "contextf[pdf]"  # with pdf parsing support (quoted so shells don't expand the brackets)
// quick start
from contextf import ContextBuilder

# with llm-generated patterns
builder = ContextBuilder(
    docs_path="./documents",
    max_context_tokens=200000,
    openai_api_key="your-key",
)

result = builder.build_context(
    query="what are the key findings on hallucination detection?"
)

print(f"tokens used: {result['context_tokens']}")
print(f"files matched: {len(result['files_used'])}")
print(result['context'])

# with manual patterns
result = builder.build_context(
    patterns=["hallucination", "detection method", "semantic entropy"],
    file_patterns=["*.md"],
)
// configuration
configure via json file or direct parameters:
{
  "search": {
    "docs_path": "./documents",
    "file_patterns": ["*.md", "*.txt"],
    "max_patterns_per_query": 3,
    "max_matches_per_file": 5,
    "case_sensitive": false
  },
  "tokens": {
    "max_context_tokens": 200000,
    "context_window_tokens": 8000,
    "max_file_tokens": 50000,
    "encoding": "cl100k_base"
  },
  "llm": {
    "enabled": true,
    "model": "gpt-4.1-mini",
    "temperature": 0.7
  }
}
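one way to feed a json config file into the constructor is to flatten the nested sections into keyword arguments. a minimal sketch, assuming the constructor accepts the same flat parameter names shown in quick start (docs_path, max_context_tokens, etc.):

```python
import json

# config matching the structure above; normally json.load(open("config.json"))
raw = """
{
  "search": {"docs_path": "./documents", "file_patterns": ["*.md", "*.txt"]},
  "tokens": {"max_context_tokens": 200000}
}
"""
config = json.loads(raw)

# flatten the nested sections into a single kwargs dict
kwargs = {k: v for section in config.values() for k, v in section.items()}

# builder = ContextBuilder(**kwargs)  # assumption: flat kwargs match the constructor
```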
// api reference
ContextBuilder
build_context(query, patterns, docs_path, file_patterns)
pass query for llm-generated patterns or patterns for manual control.
returns a dictionary with three keys: context (the assembled context string), context_tokens (its token count), and files_used (the list of files that contributed matches).
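a small sketch of consuming the result, e.g. checking the built context against a model budget before sending it. the dict below is a mocked example with the keys used in quick start, not live output:

```python
# mocked result with the keys seen in quick start
result = {
    "context": "## findings\n...",
    "context_tokens": 16701,
    "files_used": ["paper1.md", "paper2.md"],
}

def fits_budget(result: dict, max_tokens: int = 200_000) -> bool:
    """true if the assembled context fits within the model's token budget."""
    return result["context_tokens"] <= max_tokens

fits_budget(result)  # True with the default 200k budget
```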
// eval results
evaluated against naive full-document context using 7 research papers and 10 queries. scored by gpt-4.1 as judge across accuracy, completeness, relevance, and clarity (1-10 each).
efficiency
- token usage: 112,715 (full) vs 16,701 (contextf) — 85.2% reduction
- processing time: 46.9s vs 22.0s — 2.1x faster
quality
- contextf average: 38.0/40 (95%)
- full context average: 37.7/40 (94.3%)
- 99.2% quality retention at a fraction of the tokens
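the efficiency figures follow directly from the raw numbers above; a quick arithmetic check (values copied from this eval, not recomputed):

```python
full_tokens, contextf_tokens = 112_715, 16_701
full_seconds, contextf_seconds = 46.9, 22.0

reduction = 1 - contextf_tokens / full_tokens  # fraction of tokens saved
speedup = full_seconds / contextf_seconds      # wall-clock ratio

print(f"{reduction:.1%} token reduction")  # 85.2% token reduction
print(f"{speedup:.1f}x faster")            # 2.1x faster
```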
when to use what
contextf works best for: focused queries, latency-sensitive apps, cost optimization, targeted deep dives.
full context works best for: comprehensive literature synthesis, cross-document comparison, exhaustive coverage.
full evaluation code and results: contextf-eval
// utilities
PDFParser
from contextf.utils import PDFParser
# single pdf
PDFParser.convert_pdf_to_markdown("paper.pdf", "paper.md")
# batch conversion
PDFParser.convert_pdfs_to_markdown("./pdfs/", "./markdown/")
TokenCounter
from contextf.utils import TokenCounter
count = TokenCounter.count_tokens_in_file("document.md")
summary = TokenCounter.get_directory_summary("./documents/")
TokenCounter.print_directory_report("./documents/")