catch-cap
hallucination detection for llms. four methods, one api call.
// overview
catch-cap detects when llms generate confident but factually incorrect responses before they reach end users. it runs four independent detection methods and returns a 0-1 confidence score with automated corrections when hallucinations are found.
the four detection methods:
- semantic entropy — generates multiple responses to the same query, measures consistency via embeddings. high entropy = low confidence.
- log probabilities — identifies uncertain tokens during generation. flags outputs where too many tokens have low log-probs.
- web grounding — validates claims against real-world information via web search (tavily or searxng).
- llm judge — uses a separate model to verify accuracy and consistency of the response.
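to make the first two methods concrete, here is a rough, self-contained sketch of the scoring ideas — illustrative only, not catch-cap's internal implementation, and the thresholds are assumptions:

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def semantic_entropy(embeddings):
    """mean pairwise cosine distance across embeddings of sampled
    responses: near 0.0 = consistent answers, higher = disagreement."""
    pairs = [(i, j) for i in range(len(embeddings)) for j in range(i + 1, len(embeddings))]
    if not pairs:
        return 0.0
    return sum(1 - cosine_similarity(embeddings[i], embeddings[j]) for i, j in pairs) / len(pairs)

def too_many_uncertain_tokens(token_logprobs, low=-2.5, max_fraction=0.2):
    """log-prob check: flag an output when more than max_fraction of
    its tokens fall below the `low` log-prob threshold
    (both cutoffs here are illustrative assumptions)."""
    uncertain = sum(1 for lp in token_logprobs if lp < low)
    return uncertain / len(token_logprobs) > max_fraction

# three consistent responses vs. two consistent plus one contradictory outlier
consistent = [[1.0, 0.0], [0.99, 0.01], [0.98, 0.02]]
mixed = [[1.0, 0.0], [0.99, 0.01], [0.0, 1.0]]
assert semantic_entropy(consistent) < semantic_entropy(mixed)
```

in practice the embeddings would come from a provider model (e.g. text-embedding-3-small) and the token log-probs from the generation api, where the provider supports them.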
// installation
pip install catch-cap
set your api keys via environment variables or a .env file:
OPENAI_API_KEY=your-openai-key
GEMINI_API_KEY=your-gemini-key
GROQ_API_KEY=your-groq-key
TAVILY_API_KEY=your-tavily-key
only set the keys for providers you plan to use. at minimum, you need one model provider key.
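a small preflight check can fail fast before constructing the detector — this helper is a convenience sketch, not part of catch-cap:

```python
import os

def missing_keys(required):
    """return the required env vars that are unset or empty."""
    return [k for k in required if not os.getenv(k)]

# e.g. using openai for generation and tavily for web grounding:
for key in missing_keys(["OPENAI_API_KEY", "TAVILY_API_KEY"]):
    print(f"missing: {key}")
```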
// quick start
import asyncio
from catch_cap import CatchCap, CatchCapConfig, ModelConfig

async def main():
    config = CatchCapConfig(
        generator=ModelConfig(provider="openai", name="gpt-4.1-mini")
    )
    detector = CatchCap(config)

    result = await detector.run("How many r's are there in strawberry?")

    print(f"confabulation detected: {result.confabulation_detected}")
    print(f"entropy score: {result.semantic_entropy.entropy_score}")
    if result.corrected_answer:
        print(f"corrected: {result.corrected_answer}")

asyncio.run(main())
// configuration
CatchCapConfig
ModelConfig
SemanticEntropyConfig
LogProbConfig
WebSearchConfig
JudgeConfig
// api reference
CatchCap.run(query: str) -> Result
the main async method. processes a query through the full detection pipeline.
result object
the returned result exposes the fields used in the quick start: confabulation_detected, semantic_entropy.entropy_score, and corrected_answer (set only when a correction was generated).
// providers
openai (recommended)
full support. all non-thinking models. embeddings via text-embedding-3-large/small. log-probs supported.
gemini
fast, cost-effective. all non-thinking models. embeddings via text-embedding-004. no log-prob support.
groq
extremely fast inference. all non-thinking models. use openai/gemini for embeddings. limited log-prob support.
mixed provider configurations are supported — you can use different providers for generation, embeddings, and judging.
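a mixed setup might look like the following — the embedder and judge field names here are illustrative assumptions based on the config classes listed above, not verified parameters; check the configuration section for the exact api:

```python
from catch_cap import CatchCap, CatchCapConfig, ModelConfig

# field names `embedder` and `judge` are assumptions for illustration
config = CatchCapConfig(
    generator=ModelConfig(provider="groq", name="llama-3.3-70b-versatile"),  # fast generation
    embedder=ModelConfig(provider="openai", name="text-embedding-3-small"),  # groq has no embeddings
    judge=ModelConfig(provider="openai", name="gpt-4.1-mini"),               # separate judge model
)
detector = CatchCap(config)
```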
// error handling
from catch_cap.exceptions import (
    CatchCapError,
    ProviderNotAvailableError,
)

try:
    result = await detector.run(query)
except ProviderNotAvailableError:
    print("model provider unavailable")
except CatchCapError as e:
    print(f"detection error: {e}")
catch-cap uses graceful degradation — if one detection method fails, the others continue. rate limiting prevents api overages. automatic retries handle transient errors with exponential backoff.
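the retry behavior can be pictured roughly like this — a generic sketch of exponential backoff with jitter, not catch-cap's internal code:

```python
import random
import time

def retry_with_backoff(fn, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """call fn(), retrying transient failures with exponentially
    growing, jittered delays; re-raise after the final attempt."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            # 0.5s, 1s, 2s, ... scaled by random jitter in [1, 2)
            sleep(base_delay * (2 ** attempt) * (1 + random.random()))

# flaky function that fails twice, then succeeds
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient")
    return "ok"

print(retry_with_backoff(flaky, sleep=lambda d: None))  # → ok
```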