Observability

Metrics, tracing, and logging across InferaDB services.

Prometheus Metrics

Each service exposes a /metrics endpoint in Prometheus exposition format.
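The exposition format is line-oriented text, so a scraped payload can be inspected without a Prometheus server. The following is a minimal sketch of a parser for simple sample lines; the sample payload and its values are illustrative, not captured from a real InferaDB service, and the parser keeps labels as an opaque string rather than decoding them.

```python
import re

# A sample line is: metric name, optional {labels}, then a numeric value.
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{.*\})?\s+(\S+)')


def parse_metrics(text):
    """Return {(name, labels_string): float} for each sample line."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and HELP/TYPE comments
            continue
        match = _SAMPLE.match(line)
        if match:
            name, labels, value = match.groups()
            samples[(name, labels or "")] = float(value)
    return samples


# Illustrative payload (values are made up for this example).
sample = """\
# HELP inferadb_checks_total Total authorization checks performed
# TYPE inferadb_checks_total counter
inferadb_checks_total 1027
inferadb_cache_hits_total 900
inferadb_cache_misses_total 100
"""

metrics = parse_metrics(sample)
print(metrics[("inferadb_checks_total", "")])
```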

Engine Metrics

Authorization

Metric                            Type        Description
inferadb_checks_total             Counter     Total authorization checks performed
inferadb_check_duration_seconds   Histogram   Authorization check latency
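A histogram such as inferadb_check_duration_seconds is exported as cumulative buckets, and latency percentiles are estimated from those buckets rather than read directly. The sketch below approximates that estimate with linear interpolation inside the matching bucket, in the spirit of Prometheus's histogram_quantile; the bucket boundaries and counts are hypothetical, since the actual bucket layout is not documented here.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets."""
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound) or count == prev_count:
                # Open-ended or empty bucket: fall back to the last finite bound.
                return prev_bound
            # Assume observations are spread evenly within the bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Hypothetical cumulative buckets: (upper bound in seconds, count).
buckets = [(0.001, 50), (0.005, 90), (0.01, 99), (math.inf, 100)]
p50 = histogram_quantile(0.50, buckets)
p95 = histogram_quantile(0.95, buckets)
print(p50, p95)
```

The estimate can never be more precise than the bucket boundaries, so percentiles that fall in a wide bucket should be read as approximate.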

Cache

Metric                        Type      Description
inferadb_cache_hits_total     Counter   Cache hits
inferadb_cache_misses_total   Counter   Cache misses
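The two cache counters are most useful combined into a hit ratio. As a small sketch, the ratio can be computed over the process lifetime or over the window between two scrapes; the counter values below are illustrative only.

```python
def cache_hit_ratio(hits, misses):
    """Fraction of lookups served from cache; 0.0 when there were no lookups."""
    lookups = hits + misses
    return hits / lookups if lookups else 0.0


# Counter values from two scrapes (made-up numbers for illustration).
earlier = {"hits": 9_000, "misses": 1_000}
later = {"hits": 9_900, "misses": 1_020}

# Ratio since the process started.
lifetime_ratio = cache_hit_ratio(later["hits"], later["misses"])

# Ratio for the interval between the two scrapes only.
window_ratio = cache_hit_ratio(
    later["hits"] - earlier["hits"],
    later["misses"] - earlier["misses"],
)
print(round(lifetime_ratio, 4), round(window_ratio, 4))
```

The windowed ratio reacts faster to a recent change in cache behavior, because the lifetime ratio is dominated by older traffic.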

Storage

Metric                                    Type        Description
inferadb_storage_read_duration_seconds    Histogram   Storage read latency
inferadb_storage_write_duration_seconds   Histogram   Storage write latency
inferadb_replication_lag_seconds          Gauge       Replication lag from leader

API

Metric                        Type      Description
inferadb_api_requests_total   Counter   Total API requests by method and path
inferadb_api_errors_total     Counter   Total API errors by status code
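Counters like these only increase while a process runs and start again from zero after a restart, so they are normally read as rates rather than raw totals. The following sketch computes an average per-second rate from evenly spaced samples and tolerates counter resets; the sample values and the 15-second interval are assumptions for illustration.

```python
def counter_increase(samples):
    """Total increase across ordered counter samples, tolerating resets."""
    increase = 0.0
    for prev, curr in zip(samples, samples[1:]):
        # A drop means the process restarted and the counter began again at zero,
        # so the whole current value counts as new increase.
        increase += (curr - prev) if curr >= prev else curr
    return increase


def per_second_rate(samples, interval_seconds):
    """Average per-second rate for samples taken interval_seconds apart."""
    if len(samples) < 2:
        return 0.0
    elapsed = interval_seconds * (len(samples) - 1)
    return counter_increase(samples) / elapsed


# Made-up scrapes of a request counter, 15 seconds apart, with one restart.
requests = [100, 160, 220, 30, 90]
rate = per_second_rate(requests, 15)
print(rate)
```

Dividing the rate of inferadb_api_errors_total by the rate of inferadb_api_requests_total over the same window gives an error ratio.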

Auth Metrics

Metric                                 Type        Description
inferadb_auth_attempts_total           Counter     Authentication attempts
inferadb_auth_success_total            Counter     Successful authentications
inferadb_auth_failure_total            Counter     Failed authentications
inferadb_auth_duration_seconds         Histogram   Authentication processing time
inferadb_jwks_cache_hits_total         Counter     JWKS cache hits
inferadb_jwks_cache_misses_total       Counter     JWKS cache misses
inferadb_jwt_validation_errors_total   Counter     JWT validation errors by reason

Scrape Configuration

Prometheus scrape config for a Docker deployment:

scrape_configs:
  - job_name: inferadb-engine
    static_configs:
      - targets: ["engine:8080"]
  - job_name: inferadb-control
    static_configs:
      - targets: ["control:9090"]
  - job_name: inferadb-ledger
    static_configs:
      - targets: ["ledger:50051"]

For Kubernetes, enable the ServiceMonitor in the Helm chart:

engine:
  serviceMonitor:
    enabled: true
    interval: 15s

OpenTelemetry Tracing

Traces are exported via OTLP, spanning the full request lifecycle (API ingestion through evaluation and response).

Configuration

Standard OpenTelemetry environment variables:

Variable                      Default                    Description
OTEL_EXPORTER_OTLP_ENDPOINT   (none)                     OTLP collector endpoint (e.g., http://otel-collector:4317)
OTEL_SERVICE_NAME             inferadb-engine            Service name in traces
OTEL_TRACES_SAMPLER           parentbased_traceidratio   Sampling strategy
OTEL_TRACES_SAMPLER_ARG       1.0                        Sampling rate (0.0 to 1.0)
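To make the sampler settings concrete, here is a simplified model of how a parent-based, trace-ID-ratio sampler decides whether to keep a span. It is a sketch of the general idea, not the code of any particular OpenTelemetry SDK, and the function name and parameters are hypothetical.

```python
def should_sample(trace_id, ratio, parent_sampled=None):
    """Simplified parent-based, ratio sampling decision.

    parent_sampled is None for a root span; otherwise it is the sampled
    flag carried by the parent span's context.
    """
    if parent_sampled is not None:
        # Child spans follow the parent so a trace is kept or dropped as a whole.
        return parent_sampled
    # Root spans: keep the trace if the low 64 bits of its ID fall in the
    # lowest `ratio` share of the 64-bit space. The decision is deterministic
    # for a given trace ID.
    threshold = int(ratio * (1 << 64))
    return (trace_id & ((1 << 64) - 1)) < threshold


print(should_sample(trace_id=1, ratio=0.25))        # root span, low ID
print(should_sample(trace_id=1 << 63, ratio=0.25))  # root span, high ID
print(should_sample(trace_id=1 << 63, ratio=0.25, parent_sampled=True))
```

With the default argument of 1.0 every root trace is kept; lowering OTEL_TRACES_SAMPLER_ARG reduces the share of new traces that are exported, while spans that arrive with a sampled parent are still recorded.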

Example

docker run -p 8080:8080 -p 8081:8081 \
  -e OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317 \
  -e OTEL_SERVICE_NAME=inferadb-engine \
  inferadb/inferadb-engine:latest

Traces are compatible with any OTLP-capable backend, such as Jaeger, Tempo, Honeycomb, or Datadog.

Structured Logging

Log levels are controlled per-module via RUST_LOG:

# Set global level to info, with debug for the evaluator
RUST_LOG=info,inferadb_core::evaluator=debug

# Trace-level logging for auth (verbose)
RUST_LOG=info,inferadb_auth=trace
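The sketch below models how a directive string like the ones above resolves to an effective level for a given module. It assumes the common Rust logging convention that the most specific matching module prefix wins and that the level defaults to error when no bare level is given; the real filter may differ in details, and the module path inferadb_core::evaluator::graph is hypothetical.

```python
LEVELS = ["error", "warn", "info", "debug", "trace"]


def parse_directives(spec):
    """Split a RUST_LOG-style string into (default_level, {target: level})."""
    default, targets = "error", {}
    for part in (p.strip() for p in spec.split(",")):
        if not part:
            continue
        if "=" in part:
            target, level = part.split("=", 1)
            targets[target] = level.lower()
        elif part.lower() in LEVELS:
            default = part.lower()   # bare level sets the global default
        else:
            targets[part] = "trace"  # bare target enables everything for it
    return default, targets


def effective_level(spec, target):
    """Level that applies to `target`: the longest matching prefix wins."""
    level, targets = parse_directives(spec)
    best = ""
    for prefix, prefix_level in targets.items():
        matches = target == prefix or target.startswith(prefix + "::")
        if matches and len(prefix) > len(best):
            best, level = prefix, prefix_level
    return level


def is_enabled(spec, target, level):
    """True if a message at `level` from `target` would be emitted."""
    return LEVELS.index(level) <= LEVELS.index(effective_level(spec, target))


spec = "info,inferadb_core::evaluator=debug"
print(effective_level(spec, "inferadb_core::evaluator::graph"))
print(effective_level(spec, "inferadb_api::handler"))
```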

Log Format

Each log line is a JSON object:

{
  "timestamp": "2026-03-24T10:15:30.123Z",
  "level": "INFO",
  "target": "inferadb_api::handler",
  "message": "check completed",
  "vault_id": "v_abc123",
  "duration_ms": 1.8,
  "result": "ALLOW",
  "span_id": "a1b2c3d4e5f6"
}
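Because each record is a JSON object, logs can be filtered with a few lines of code. The sketch below assumes one compact JSON object per line on the log stream (the example above is pretty-printed for readability) and picks out records whose duration_ms exceeds a threshold; the second sample record is invented for illustration.

```python
import json


def slow_records(lines, threshold_ms):
    """Yield parsed log records whose duration_ms exceeds threshold_ms."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore any non-JSON output mixed into the stream
        if not isinstance(record, dict):
            continue
        if record.get("duration_ms", 0) > threshold_ms:
            yield record


log_lines = [
    '{"timestamp": "2026-03-24T10:15:30.123Z", "level": "INFO", '
    '"target": "inferadb_api::handler", "message": "check completed", '
    '"vault_id": "v_abc123", "duration_ms": 1.8, "result": "ALLOW"}',
    # Hypothetical slower request, added for the example.
    '{"timestamp": "2026-03-24T10:15:31.456Z", "level": "INFO", '
    '"target": "inferadb_api::handler", "message": "check completed", '
    '"vault_id": "v_abc123", "duration_ms": 12.4, "result": "ALLOW"}',
    "not json",
]

slow = list(slow_records(log_lines, threshold_ms=5.0))
print(len(slow), slow[0]["duration_ms"])
```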

Audit Logging

Security events are logged and persisted to the Ledger:

Event                      Description
AuthenticationSuccess      Successful token validation
AuthenticationFailure      Failed authentication attempt
ScopeViolation             Request exceeded the token’s granted scopes
TenantIsolationViolation   Attempt to access data outside the token’s vault

These events are always logged at WARN or ERROR level regardless of the configured log level.

Grafana Dashboards

Pre-built Grafana dashboards:

  • Engine Overview — Check rate, latency percentiles, cache hit ratio, error rate
  • Ledger Health — Raft leader status, write latency, replication lag, snapshot status
  • Authentication — Auth success/failure rate, JWKS cache performance, JWT error breakdown
  • Tenant Activity — Per-vault check volume, write rate, and cache efficiency

The dashboards are available as JSON files in the repository. Import them directly through the Grafana UI, or load them with Grafana dashboard provisioning.