Observability
Scani is observability-friendly without being observability-heavy. Structured logs and HTTP healthchecks come out of the box; everything else is opt-in.
Structured logs
Every service uses @scani/logging (pino). Logs go to stdout as
single-line JSON in production and are pretty-printed in dev.
| Env var | Effect |
|---|---|
| LOG_LEVEL | One of debug, info, warn, error. Default info. |
| LOG_PRETTY | Pretty-print log output. Default false in production. |
| LOG_SQL_QUERIES | Log every Drizzle query. Default false. Useful for short debug sessions; very chatty otherwise. |
| LOG_ID_PEPPER | Required in production; at least 16 characters. Pepper used to one-way hash user, tenant and account IDs before they appear in logs. A missing pepper is a hard boot failure in production. |
| SERVICE_NAME | Set automatically by compose (api, worker, data-provider). |
The pepper is what makes the logs safe to ship to a centralised log aggregator. Without it, raw UUIDs would appear in plaintext and correlation across hosts could re-identify users.
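A minimal sketch of a peppered one-way ID hash plus the boot-time check, assuming HMAC-SHA256 from `node:crypto`. The construction @scani/logging actually uses is not stated here, and the names `assertPepper` and `hashLogId` and the non-production fallback pepper are hypothetical.

```typescript
import { createHmac } from "node:crypto";

// Hypothetical boot-time guard for the documented rule: LOG_ID_PEPPER is
// required in production and must be at least 16 characters long.
export function assertPepper(pepper: string | undefined, isProduction: boolean): string {
  if (pepper !== undefined && pepper.length >= 16) return pepper;
  if (isProduction) {
    // Hard boot failure: refuse to start rather than log raw IDs.
    throw new Error("LOG_ID_PEPPER must be set (16+ characters) in production");
  }
  // Assumption: outside production a fixed, non-secret pepper is acceptable.
  return "dev-only-pepper-not-secret";
}

// Keyed one-way hash (HMAC-SHA256). The same ID always yields the same digest,
// so log lines still correlate, but without the pepper nobody can recompute
// the digest from a known UUID and match it against the logs.
export function hashLogId(id: string, pepper: string): string {
  return createHmac("sha256", pepper).update(id).digest("hex");
}
```

The keyed hash is the point: a plain SHA-256 of a UUID can be recomputed by anyone who already holds that UUID, whereas an HMAC digest is only reproducible by whoever holds the pepper.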
Collecting logs
Any container-aware log shipper works. Common setups:
- Loki + Promtail / Grafana Alloy — Promtail (or Alloy) collects container logs from Docker and forwards them to Loki; Grafana queries Loki.
- Datadog / New Relic / Honeycomb — install their agent on the host, point it at the Docker socket.
- docker compose logs — fine for a one-box deploy.
Sample fields you’ll see:
{ "level": "info", "time": "2026-05-24T11:00:00.000Z", "service": "worker", "component": "service:PriceGraphService", "msg": "convert", "fromToken": "<hashed>", "toToken": "<hashed>", "path": "one-hop-USD", "stale": false }
Healthchecks
| Service | Endpoint | Response |
|---|---|---|
| api | GET /health | 200 {"status":"ok"} when DB and Redis are reachable. |
| data-provider | GET /health | Same as api. |
| frontend-app | GET /healthz | Static 200. |
| worker | none | The worker has no HTTP surface. Liveness is the container running; readiness is “has processed at least one heartbeat job” (visible in BullMQ dashboards). |
The compose file wires these as Docker healthchecks already; your reverse proxy or load balancer can probe them too.
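If you probe from outside Docker (an uptime checker, a cron job), the check amounts to "status 200, and for the JSON endpoints a body of `{"status":"ok"}`". A minimal sketch, assuming Node 18+ for global `fetch` and `AbortSignal.timeout`; `isHealthy`, `probe`, the 2-second timeout and any URLs you pass are hypothetical.

```typescript
export interface ProbeResult {
  url: string;
  ok: boolean;
  status: number | null; // null when the request never completed
}

// api and data-provider return 200 {"status":"ok"}; frontend-app returns a static 200.
export function isHealthy(status: number, body: unknown, expectJson: boolean): boolean {
  if (status !== 200) return false;
  if (!expectJson) return true;
  return typeof body === "object" && body !== null && (body as { status?: unknown }).status === "ok";
}

export async function probe(url: string, expectJson: boolean, timeoutMs = 2000): Promise<ProbeResult> {
  try {
    const res = await fetch(url, { signal: AbortSignal.timeout(timeoutMs) });
    // A body that is not valid JSON is treated as unhealthy for JSON endpoints.
    const body: unknown = expectJson ? await res.json().catch(() => null) : null;
    return { url, ok: isHealthy(res.status, body, expectJson), status: res.status };
  } catch {
    // Connection refused, DNS failure or timeout all count as unhealthy.
    return { url, ok: false, status: null };
  }
}
```

For example, `await probe("http://<api-host>/health", true)` for the API, or `probe(".../healthz", false)` for the frontend, where only the status code matters.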
Sentry (optional)
Server-side:
| Variable | Effect |
|---|---|
| SENTRY_DSN | If unset, the SDK is a no-op and nothing leaves the process. |
| SENTRY_ENVIRONMENT | Environment tag attached to events (e.g. production, staging). |
| SENTRY_RELEASE | Release identifier; useful for correlating events with the deployed image tag. |
Browser-side:
| Variable | Effect |
|---|---|
| VITE_SENTRY_DSN | Baked into the SPA bundle at build time. |
| VITE_SENTRY_ENABLED | Set to true to enable browser-side reporting. |
Payloads pass through packages/business/shared/src/utils/sentry-scrubber.ts,
which strips known credential shapes (apiKey, secret, token,
session cookies, integration credentials) before sending. Verify the
scrubbing yourself before pointing at a shared Sentry project.
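To see what "strips known credential shapes" means in practice, here is a sketch of a recursive, key-based redaction pass. It is not the contents of sentry-scrubber.ts; the key pattern, the `[Filtered]` placeholder, the depth limit and the name `scrubValue` are assumptions. In the Sentry JavaScript SDKs, a function like this would typically be called from the `beforeSend` option of `Sentry.init`.

```typescript
// Hypothetical key pattern covering the credential shapes listed above.
const SENSITIVE_KEY = /api[-_]?key|secret|token|password|cookie|credential|authorization/i;

// Returns a copy of the payload with values under sensitive keys replaced.
export function scrubValue(value: unknown, depth = 0): unknown {
  if (depth > 10) return "[Truncated]"; // guards against very deep or cyclic payloads
  if (Array.isArray(value)) return value.map((v) => scrubValue(v, depth + 1));
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, v] of Object.entries(value)) {
      out[key] = SENSITIVE_KEY.test(key) ? "[Filtered]" : scrubValue(v, depth + 1);
    }
    return out;
  }
  return value; // primitives pass through unchanged
}
```

Key-based matching only catches secrets stored under recognisable field names; a token embedded in a free-text message or URL slips through, which is one reason to verify the real scrubber against your own payloads.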
Metrics (bring your own)
There is no built-in Prometheus exporter — yet. If you need metrics:
- Postgres — install postgres_exporter or use your managed provider’s metrics surface.
- Redis — install redis_exporter.
- BullMQ queue depth + DLQ depth — the dlq-depth-probe and job-heartbeat-probe scheduled jobs emit warn-level logs when they cross thresholds. Convert these to metrics via log-based extraction if your stack supports it (Loki supports LogQL metric queries; Honeycomb has BubbleUp).
- API request latencies — instrument your reverse proxy (Caddy and nginx both emit access logs with timing).
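If your stack has no native log-to-metric feature, the extraction is a short pass over the JSON log stream. This sketch counts warn-level lines per probe; it assumes the probe name appears in the `component` field, which is a guess, so check real log lines first. `countProbeWarnings` is a hypothetical helper.

```typescript
interface LogLine {
  level?: string | number;
  component?: string;
  msg?: string;
  [field: string]: unknown;
}

// Count warn-level lines per probe, e.g. to feed a gauge or an alert rule.
export function countProbeWarnings(rawLines: string[], probes: string[]): Map<string, number> {
  const counts = new Map<string, number>(probes.map((p) => [p, 0]));
  for (const raw of rawLines) {
    let parsed: unknown;
    try {
      parsed = JSON.parse(raw);
    } catch {
      continue; // skip non-JSON lines (e.g. pretty-printed dev output)
    }
    if (parsed === null || typeof parsed !== "object") continue;
    const entry = parsed as LogLine;
    // The sample log line uses level labels; stock pino emits 40 for warn.
    const isWarn = entry.level === "warn" || entry.level === 40;
    if (!isWarn || typeof entry.component !== "string") continue;
    const component = entry.component;
    for (const probe of probes) {
      if (component.includes(probe)) counts.set(probe, (counts.get(probe) ?? 0) + 1);
    }
  }
  return counts;
}
```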
If you’d like a built-in Prometheus endpoint, open an issue on GitHub — it’s been discussed and there’s no strong reason not to ship one.
Tracing
Not built in. The codebase doesn’t carry OTLP exporters or
context propagation. The closest thing is correlation IDs in logs
(every request has a requestId, every job has a jobId).
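Without tracing, you can still follow a request or job by grouping log lines on its correlation ID and ordering them by time. This sketch assumes the IDs appear as top-level `requestId` / `jobId` fields in each JSON line; `groupByCorrelationId` and the `request:` / `job:` key prefixes are hypothetical.

```typescript
interface CorrelatedLine {
  time: string; // ISO-8601, as in the sample log line
  service?: string;
  msg?: string;
  requestId?: string;
  jobId?: string;
}

export function groupByCorrelationId(lines: CorrelatedLine[]): Map<string, CorrelatedLine[]> {
  const groups = new Map<string, CorrelatedLine[]>();
  for (const line of lines) {
    // Prefix keys so a requestId can never collide with a jobId.
    const key =
      line.requestId !== undefined ? `request:${line.requestId}`
      : line.jobId !== undefined ? `job:${line.jobId}`
      : undefined;
    if (key === undefined) continue; // uncorrelated line (e.g. boot messages)
    const bucket = groups.get(key) ?? [];
    bucket.push(line);
    groups.set(key, bucket);
  }
  // Order each group chronologically, across services.
  for (const bucket of groups.values()) {
    bucket.sort((a, b) => Date.parse(a.time) - Date.parse(b.time));
  }
  return groups;
}
```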
What to alert on
Production-grade alerts to consider:
- API /health returning non-200 for > 1 minute → service unhealthy.
- DLQ depth > 100 → jobs failing systematically.
- Postgres connection saturation → exhausted pool (often a POSTGRES_POOL_MAX vs. pooler mismatch).
- Disk usage on the postgres-data and minio-data volumes.
- Sustained job heartbeat misses (the job-heartbeat-probe reports these).
- unpriceableUntil cooldowns piling up on many tokens at once → an upstream pricing provider has changed shape.
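Most alerting stacks express these as declarative rules, but the “non-200 for more than a minute” condition is easy to get subtly wrong: a single healthy probe should reset the clock. A minimal sketch of that logic, with the 60-second window taken from the list above and the sample shape and `unhealthyFor` name hypothetical:

```typescript
export interface HealthSample {
  at: number; // epoch milliseconds
  status: number | null; // null when the probe could not connect
}

// True when the latest run of consecutive non-200 samples started more than
// windowMs before `now`. Any 200 sample resets the run.
export function unhealthyFor(samples: HealthSample[], now: number, windowMs = 60_000): boolean {
  let streakStart: number | null = null;
  for (const s of [...samples].sort((a, b) => a.at - b.at)) {
    if (s.status === 200) streakStart = null;
    else if (streakStart === null) streakStart = s.at;
  }
  return streakStart !== null && now - streakStart > windowMs;
}
```

The DLQ rule needs no such state: it is a plain `depth > 100` comparison on the latest reading.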