detra

llm observability framework. trace, evaluate, secure, alert.

// overview

detra provides end-to-end llm observability for vertical ai applications with datadog integration. it automatically traces llm function calls, evaluates outputs against defined behaviors, detects security issues, and alerts on problems.

// installation

pip install detra              # core
pip install detra[server]      # + fastapi/uvicorn
pip install detra[all]         # everything

environment variables:

DD_API_KEY=your-datadog-api-key
DD_APP_KEY=your-datadog-app-key
DD_SITE=datadoghq.com
GOOGLE_API_KEY=your-gemini-key

# optional
SLACK_WEBHOOK_URL=...
PAGERDUTY_INTEGRATION_KEY=...

// quick start

import detra

# initialize with yaml config
vg = detra.init("detra.yaml")

# trace any function with a decorator
@vg.trace("extract_entities")
async def extract_entities(document: str):
    result = await llm.complete(f"extract entities from: {document}")
    return result

# function call triggers tracing + evaluation + security scan
result = await extract_entities("Contract text...")

# set up datadog monitoring in one call
setup = await vg.setup_all(slack_channel="#llm-alerts")
print(f"dashboard: {setup['dashboard']['url']}")

// decorators

five decorator types for different tracing contexts:

@vg.trace("name")      # generic (default: workflow)
@vg.workflow("name")   # workflow tracing
@vg.llm("name")        # llm call tracing
@vg.task("name")       # task tracing
@vg.agent("name")      # agent tracing
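
conceptually, a decorator of this shape wraps the call, times it, and records a span. a minimal standalone sketch of that pattern (the internals here are illustrative, not detra's actual implementation):

```python
import asyncio
import functools
import time

spans: list[dict] = []  # stand-in for a real exporter that ships spans to datadog

def trace(name: str, capture_input: bool = True, capture_output: bool = True):
    """illustrative tracing decorator: time the call, record a span, return the result."""
    def decorator(fn):
        @functools.wraps(fn)
        async def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = await fn(*args, **kwargs)
            spans.append({
                "node": name,
                "latency_ms": (time.perf_counter() - start) * 1000,
                "input": args if capture_input else None,
                "output": result if capture_output else None,
            })
            return result
        return wrapper
    return decorator

@trace("echo")
async def echo(text: str) -> str:
    return text.upper()

print(asyncio.run(echo("hi")))  # HI
```

the real decorators additionally run evaluation and security scans on the captured output; this sketch only shows the wrap-time-record shape they share.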

decorator options

@vg.trace(
    "node_name",
    capture_input=True,
    capture_output=True,
    evaluate=True,
    input_extractor=custom_fn,
    output_extractor=custom_fn
)
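
the extractor hooks control what gets captured on the span. assuming an extractor receives the raw argument or return value and returns what should be stored (an assumption about the hook signature), they might look like:

```python
def redact_document(document: str) -> str:
    """input_extractor sketch: keep only a preview so full contracts aren't stored."""
    return document[:200]

def entities_only(result: dict) -> list:
    """output_extractor sketch: capture just the entity list, not the whole payload."""
    return result.get("entities", [])
```

you would then pass them as input_extractor=redact_document, output_extractor=entities_only.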

module-level decorators

import detra
vg = detra.init("detra.yaml")

@detra.trace("summarize")
async def summarize(text: str):
    return await llm.complete(f"summarize: {text}")

// yaml config

app_name: my-llm-app
version: "1.0.0"
environment: production

datadog:
  api_key: ${DD_API_KEY}
  app_key: ${DD_APP_KEY}
  site: ${DD_SITE}
  service: my-service

gemini:
  api_key: ${GOOGLE_API_KEY}
  model: gemini-2.5-flash
  temperature: 0.1

nodes:
  extract_entities:
    description: "extract entities from documents"
    expected_behaviors:
      - "must return valid json"
      - "must extract party names accurately"
    unexpected_behaviors:
      - "hallucinated party names"
      - "fabricated dates"
    adherence_threshold: 0.85
    latency_warning_ms: 2000
    latency_critical_ms: 5000
    security_checks:
      - pii_detection
      - prompt_injection

security:
  pii_detection_enabled: true
  pii_patterns:
    - email
    - phone
    - ssn
    - credit_card
  prompt_injection_detection: true

integrations:
  slack:
    enabled: true
    webhook_url: ${SLACK_WEBHOOK_URL}
    channel: "#llm-alerts"
    notify_on:
      - flag_raised
      - incident_created
      - security_issue

create_dashboard: true
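
the ${VAR} placeholders are filled from the environment at load time. a minimal sketch of that interpolation (assuming plain ${NAME} syntax with no default-value support):

```python
import os
import re

def expand_env(value: str) -> str:
    """replace ${NAME} with os.environ[NAME]; unset variables become empty strings."""
    return re.sub(r"\$\{(\w+)\}", lambda m: os.environ.get(m.group(1), ""), value)

os.environ["DD_SITE"] = "datadoghq.com"
print(expand_env("site: ${DD_SITE}"))  # site: datadoghq.com
```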

// evaluation

the evaluation pipeline runs in order: rule-based checks, security scans, llm evaluation via gemini, flagging, then alerting.
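
that ordering can be pictured as a short-circuiting chain of stages. an illustrative sketch (stage names mirror the list above; these are not detra internals, and the llm, flagging, and alerting stages are omitted):

```python
def rule_checks(output: str) -> dict:
    # rule-based check sketch, e.g. "must return valid json"
    return {"rules_ok": output.strip().startswith("{"), "blocked": False}

def security_scan(output: str) -> dict:
    # security scan sketch; a real scan would set blocked on pii / injection hits
    return {"security_issues": [], "blocked": False}

def run_pipeline(output: str) -> dict:
    """run stages in order; a stage that sets blocked=True short-circuits the rest."""
    report: dict = {"blocked": False}
    for stage in (rule_checks, security_scan):
        report.update(stage(output))
        if report["blocked"]:
            break
    return report
```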

manual evaluation

eval_result = await vg.evaluate(
    node_name="extract_entities",
    input_data="document text",
    output_data={"entities": [...]}
)

print(f"score: {eval_result.score}")
print(f"flagged: {eval_result.flagged}")
print(f"failed: {eval_result.checks_failed}")

result object

score float — adherence score, 0.0-1.0
flagged bool — whether the output was flagged
flag_category str — type of flag (hallucination, format_error, etc.)
flag_reason str — reason for flagging
checks_passed list[str] — passed behavior checks
checks_failed list[str] — failed behavior checks
security_issues list[str] — detected security issues
latency_ms float — evaluation latency
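
in application code these fields typically drive routing decisions. a sketch of triage logic over a result shaped like the above (the dataclass is a stand-in for illustration, not detra's actual type):

```python
from dataclasses import dataclass, field

@dataclass
class EvalResult:  # stand-in mirroring the fields listed above
    score: float
    flagged: bool
    flag_category: str = ""
    flag_reason: str = ""
    checks_passed: list = field(default_factory=list)
    checks_failed: list = field(default_factory=list)
    security_issues: list = field(default_factory=list)
    latency_ms: float = 0.0

def triage(result: EvalResult, threshold: float = 0.85) -> str:
    """map an evaluation result to an alert severity."""
    if result.security_issues:
        return "critical"
    if result.flagged or result.score < threshold:
        return "warning"
    return "ok"

print(triage(EvalResult(score=0.92, flagged=False)))  # ok
```

the threshold mirrors adherence_threshold from the yaml node config.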

// security

automatic detection built into every traced call:

pii_detection        # email, phone, ssn, credit_card patterns
prompt_injection     # injection attempt detection

configure per-node in yaml under security_checks. set block_on_detection: true to block flagged outputs.
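
a minimal sketch of what pattern-based pii detection like this can look like (patterns deliberately simplified; these are not detra's actual regexes):

```python
import re

PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d{4}[- ]?){3}\d{4}\b"),
}

def detect_pii(text: str) -> list[str]:
    """return the names of the pii patterns that match the text."""
    return [name for name, pattern in PII_PATTERNS.items() if pattern.search(text)]

print(detect_pii("reach me at jane@example.com or 555-867-5309"))
```

with block_on_detection enabled, a non-empty result like this would stop the output from being returned.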