contextf

intelligent context builder. fewer tokens, same quality.


// overview

contextf builds relevant context from document collections using search patterns and token-aware processing. instead of stuffing entire documents into a prompt, it extracts only the sections that matter.

two approaches: generate search patterns from your query with an llm, or supply patterns manually.

// installation

pip install contextf
pip install contextf[pdf]    # with pdf parsing support

// quick start

from contextf import ContextBuilder

# with llm-generated patterns
builder = ContextBuilder(
    docs_path="./documents",
    max_context_tokens=200000,
    openai_api_key="your-key"
)

result = builder.build_context(
    query="what are the key findings on hallucination detection?"
)

print(f"tokens used: {result['context_tokens']}")
print(f"files matched: {len(result['files_used'])}")
print(result['context'])

# with manual patterns
result = builder.build_context(
    patterns=["hallucination", "detection method", "semantic entropy"],
    file_patterns=["*.md"]
)

// configuration

configure via a json file or direct parameters:

{
  "search": {
    "docs_path": "./documents",
    "file_patterns": ["*.md", "*.txt"],
    "max_patterns_per_query": 3,
    "max_matches_per_file": 5,
    "case_sensitive": false
  },
  "tokens": {
    "max_context_tokens": 200000,
    "context_window_tokens": 8000,
    "max_file_tokens": 50000,
    "encoding": "cl100k_base"
  },
  "llm": {
    "enabled": true,
    "model": "gpt-4.1-mini",
    "temperature": 0.7
  }
}
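a minimal sketch of turning the config file into constructor kwargs. this assumes each key under "search" and "tokens" maps one-to-one onto a ContextBuilder parameter name (a hypothetical helper; the library may ship its own loader):

```python
import json

def load_builder_kwargs(path):
    """Flatten the nested config json into a flat kwargs dict.

    Assumes keys under "search" and "tokens" match ContextBuilder
    parameter names one-to-one (hypothetical mapping).
    """
    with open(path) as f:
        cfg = json.load(f)
    kwargs = {}
    kwargs.update(cfg.get("search", {}))
    kwargs.update(cfg.get("tokens", {}))
    return kwargs
```

the flattened dict can then be splatted straight into the constructor: `ContextBuilder(**load_builder_kwargs("config.json"), openai_api_key="your-key")`.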

// api reference

ContextBuilder

docs_path str — directory containing documents (default: "./documents")
max_context_tokens int — maximum total context tokens (default: 200000)
context_window_tokens int — window size around matches (default: 8000)
max_patterns_per_query int — max search patterns to generate (default: 5)
max_matches_per_file int — max matches per file (default: 5)
case_sensitive bool — case-sensitive search (default: false)
encoding str — tokenizer encoding (default: "cl100k_base")
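context_window_tokens controls how much text around each match survives. a simplified, character-based sketch of the idea (the real library counts tokens via the configured encoding, not characters; `window_around` is a hypothetical name):

```python
def window_around(text, match_start, match_end, window=80):
    """Return a slice of text centered on a match.

    Character-based stand-in for token-based windowing; `window`
    plays the role of context_window_tokens.
    """
    half = window // 2
    start = max(0, match_start - half)   # clamp at the file start
    end = min(len(text), match_end + half)  # clamp at the file end
    return text[start:end]
```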

build_context(query, patterns, docs_path, file_patterns)

pass either query (patterns are then generated by the llm) or patterns directly; docs_path and file_patterns override the builder defaults.

returns a dictionary:

context str — merged context text from all matches
context_tokens int — total token count
files_used list — file details and contribution metadata
matches list — all pattern matches with locations
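since `context` is the merged text of all match windows, windows from nearby matches presumably collapse into a single span. a sketch of that merge step over (start, end) character offsets (a hypothetical helper, not the library's actual code):

```python
def merge_spans(spans):
    """Merge overlapping or touching (start, end) spans so that
    overlapping match windows aren't duplicated in the context."""
    merged = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            # overlaps or touches the previous span: extend it
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```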

// eval results

evaluated against naive full-document context using 7 research papers and 10 queries. scored by gpt-4.1 as judge across accuracy, completeness, relevance, and clarity (1-10 each).

efficiency and quality charts: see the eval repo for plots and raw numbers.
when to use what

contextf works best for: focused queries, latency-sensitive apps, cost optimization, targeted deep dives.

full context works best for: comprehensive literature synthesis, cross-document comparison, exhaustive coverage.

full evaluation code and results: contextf-eval

// utilities

PDFParser

from contextf.utils import PDFParser

# single pdf
PDFParser.convert_pdf_to_markdown("paper.pdf", "paper.md")

# batch conversion
PDFParser.convert_pdfs_to_markdown("./pdfs/", "./markdown/")

TokenCounter

from contextf.utils import TokenCounter

count = TokenCounter.count_tokens_in_file("document.md")
summary = TokenCounter.get_directory_summary("./documents/")
TokenCounter.print_directory_report("./documents/")
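a rough stand-in for get_directory_summary using a whitespace word count instead of a real tokenizer (the library counts tokens with the cl100k_base encoding, so real numbers will differ; `approx_directory_summary` is a hypothetical name):

```python
from pathlib import Path

def approx_directory_summary(root, patterns=("*.md", "*.txt")):
    """Approximate per-file token counts with a whitespace split.

    Simplified stand-in for TokenCounter.get_directory_summary.
    """
    summary = {}
    for pattern in patterns:
        for path in Path(root).rglob(pattern):
            text = path.read_text(encoding="utf-8", errors="ignore")
            summary[str(path)] = len(text.split())
    return summary
```

this is handy for a quick sanity check that a docs directory fits under max_context_tokens before wiring up the builder.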