Observability

Name: InferaDB
Author: InferaDB

Metrics, tracing, and logging across InferaDB services.

Prometheus Metrics

Each service exposes a /metrics endpoint in Prometheus exposition format.

Engine Metrics

Authorization

Metric	Type	Description
`inferadb_checks_total`	Counter	Total authorization checks performed
`inferadb_check_duration_seconds`	Histogram	Authorization check latency

Cache

Metric	Type	Description
`inferadb_cache_hits_total`	Counter	Cache hits
`inferadb_cache_misses_total`	Counter	Cache misses

Storage

Metric	Type	Description
`inferadb_storage_read_duration_seconds`	Histogram	Storage read latency
`inferadb_storage_write_duration_seconds`	Histogram	Storage write latency
`inferadb_replication_lag_seconds`	Gauge	Replication lag from leader

API

Metric	Type	Description
`inferadb_api_requests_total`	Counter	Total API requests by method and path
`inferadb_api_errors_total`	Counter	Total API errors by status code

Auth Metrics

Metric	Type	Description
`inferadb_auth_attempts_total`	Counter	Authentication attempts
`inferadb_auth_success_total`	Counter	Successful authentications
`inferadb_auth_failure_total`	Counter	Failed authentications
`inferadb_auth_duration_seconds`	Histogram	Authentication processing time
`inferadb_jwks_cache_hits_total`	Counter	JWKS cache hits
`inferadb_jwks_cache_misses_total`	Counter	JWKS cache misses
`inferadb_jwt_validation_errors_total`	Counter	JWT validation errors by reason

Scrape Configuration

Prometheus scrape config for a Docker deployment:

scrape_configs:
  - job_name: inferadb-engine
    static_configs:
      - targets: ["engine:8080"]
  - job_name: inferadb-control
    static_configs:
      - targets: ["control:9090"]
  - job_name: inferadb-ledger
    static_configs:
      - targets: ["ledger:50051"]

For Kubernetes, enable the ServiceMonitor in the Helm chart:

engine:
  serviceMonitor:
    enabled: true
    interval: 15s

OpenTelemetry Tracing

Traces are exported via OTLP, spanning the full request lifecycle (API ingestion through evaluation and response).

Configuration

Standard OpenTelemetry environment variables:

Variable	Default	Description
`OTEL_EXPORTER_OTLP_ENDPOINT`	—	OTLP collector endpoint (e.g., `http://otel-collector:4317`)
`OTEL_SERVICE_NAME`	`inferadb-engine`	Service name in traces
`OTEL_TRACES_SAMPLER`	`parentbased_traceidratio`	Sampling strategy
`OTEL_TRACES_SAMPLER_ARG`	`1.0`	Sampling rate (0.0 to 1.0)

Example

docker run -p 8080:8080 -p 8081:8081 \
  -e OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317 \
  -e OTEL_SERVICE_NAME=inferadb-engine \
  inferadb/inferadb-engine:latest

Traces are compatible with any OTLP-capable backend — Jaeger, Tempo, Honeycomb, Datadog, etc.

Structured Logging

Log levels are controlled per-module via RUST_LOG:

# Set global level to info, with debug for the evaluator
RUST_LOG=info,inferadb_core::evaluator=debug

# Trace-level logging for auth (verbose)
RUST_LOG=info,inferadb_auth=trace

Log Format

Each log line is a JSON object:

{
  "timestamp": "2026-03-24T10:15:30.123Z",
  "level": "INFO",
  "target": "inferadb_api::handler",
  "message": "check completed",
  "vault_id": "v_abc123",
  "duration_ms": 1.8,
  "result": "ALLOW",
  "span_id": "a1b2c3d4e5f6"
}

Audit Logging

Security events are logged and persisted to the Ledger:

Event	Description
`AuthenticationSuccess`	Successful token validation
`AuthenticationFailure`	Failed authentication attempt
`ScopeViolation`	Request exceeded the token’s granted scopes
`TenantIsolationViolation`	Attempt to access data outside the token’s vault

These events are always logged at WARN or ERROR level regardless of the configured log level.

Grafana Dashboards

Pre-built Grafana dashboards:

Engine Overview — Check rate, latency percentiles, cache hit ratio, error rate
Ledger Health — Raft leader status, write latency, replication lag, snapshot status
Authentication — Auth success/failure rate, JWKS cache performance, JWT error breakdown
Tenant Activity — Per-vault check volume, write rate, and cache efficiency

Available as JSON files in the repository. Import directly or use Grafana dashboard provisioning.