Observability

Sprintsail's observability stack is built on open standards and open-source tools: OpenTelemetry for instrumentation, Prometheus for metrics, Loki for logs, Tempo for traces, and Grafana for visualization.

Stack overview

Your App  →  OpenTelemetry Collector  →  Prometheus (metrics)
                                      →  Loki (logs)
                                      →  Tempo (traces)
                                              ↓
                                           Grafana
                                     (dashboards, alerts)

All components run within the management cluster. Observability data is namespace-scoped: each organization sees only its own metrics, logs, and traces.

Prometheus

Prometheus scrapes and stores time-series metrics. Sprintsail collects platform metrics automatically and supports custom application metrics.

Platform metrics

Collected for every app without any configuration:

Metric                                  Type       Description
http_requests_total                     Counter    Total requests by method, path, status code
http_request_duration_seconds           Histogram  Request latency (p50, p95, p99)
container_cpu_usage_seconds_total       Counter    Cumulative CPU time consumed
container_memory_working_set_bytes      Gauge      Current memory usage
container_network_receive_bytes_total   Counter    Inbound network bytes
container_network_transmit_bytes_total  Counter    Outbound network bytes
container_restarts_total                Counter    Container restart count
kube_pod_status_phase                   Gauge      Pod lifecycle phase
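
The percentiles listed for http_request_duration_seconds are not stored as such. A Prometheus histogram exports cumulative bucket counts, and quantiles are estimated from those buckets at query time. The sketch below mirrors the linear interpolation that PromQL's histogram_quantile() applies in the common case. The bucket boundaries and counts are hypothetical; the buckets Sprintsail actually configures are not documented here.

```javascript
// Estimate a quantile from cumulative histogram buckets, following the
// linear-interpolation approach of PromQL's histogram_quantile().
// `buckets` is sorted by upper bound `le`, ends with le = Infinity, and each
// `count` is cumulative (number of observations <= le).
function histogramQuantile(q, buckets) {
  const total = buckets[buckets.length - 1].count;
  if (total === 0) return NaN;

  const rank = q * total;
  // First bucket whose cumulative count reaches the target rank.
  const i = buckets.findIndex((b) => b.count >= rank);

  // If the rank falls in the +Inf bucket, report the highest finite bound.
  if (buckets[i].le === Infinity) {
    return buckets.length > 1 ? buckets[buckets.length - 2].le : NaN;
  }

  const lower = i === 0 ? 0 : buckets[i - 1].le;
  const below = i === 0 ? 0 : buckets[i - 1].count;
  const inBucket = buckets[i].count - below;

  // Assume observations are spread evenly inside the bucket.
  return lower + (buckets[i].le - lower) * ((rank - below) / inBucket);
}

// Hypothetical bucket layout and counts, for illustration only.
const latencyBuckets = [
  { le: 0.1, count: 50 },
  { le: 0.5, count: 90 },
  { le: 1, count: 100 },
  { le: Infinity, count: 100 },
];

console.log(histogramQuantile(0.95, latencyBuckets)); // 0.75
```

Because the estimate interpolates inside a bucket, its accuracy depends on how finely the buckets divide the latency range you care about.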

Custom metrics

If your app exposes a Prometheus-compatible /metrics endpoint, Sprintsail can scrape it. Enable scraping by telling the platform the endpoint's path and port:

ss app update my-app --metrics-path /metrics --metrics-port 9090

Example using the Prometheus client library (Node.js):

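// Assumes an Express app; adapt the route to your own HTTP framework.
const express = require('express');
const client = require('prom-client');

const app = express();

// Collect default Node.js metrics (event loop lag, heap usage, GC, ...)
client.collectDefaultMetrics();

// Custom counter, labeled by order outcome
const ordersProcessed = new client.Counter({
  name: 'orders_processed_total',
  help: 'Total orders processed',
  labelNames: ['status'],
});

// Increment wherever an order completes, for example:
// ordersProcessed.inc({ status: 'success' });

// Expose all registered metrics in the Prometheus text format
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', client.register.contentType);
  res.end(await client.register.metrics());
});

// Listen on the port passed to --metrics-port
app.listen(9090);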

Custom metrics are available in Grafana alongside platform metrics.
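
For reference, a /metrics endpoint returns plain text in the Prometheus exposition format. The dependency-free sketch below renders a labeled counter by hand, only to show the shape of what the scraper reads; in a real app a client library such as prom-client generates this for you. The renderCounter helper and the sample values are hypothetical.

```javascript
// Render one counter family in the Prometheus text exposition format.
// `samples` is a list of { labels, value } pairs.
function renderCounter(name, help, samples) {
  const lines = [`# HELP ${name} ${help}`, `# TYPE ${name} counter`];
  for (const { labels, value } of samples) {
    const pairs = Object.entries(labels)
      .map(([key, val]) => `${key}="${escapeLabelValue(String(val))}"`)
      .join(',');
    lines.push(pairs ? `${name}{${pairs}} ${value}` : `${name} ${value}`);
  }
  return lines.join('\n') + '\n';
}

// Label values escape backslash, double quote, and newline.
function escapeLabelValue(value) {
  return value.replace(/\\/g, '\\\\').replace(/"/g, '\\"').replace(/\n/g, '\\n');
}

// Sample values are made up for illustration.
const output = renderCounter('orders_processed_total', 'Total orders processed', [
  { labels: { status: 'success' }, value: 42 },
  { labels: { status: 'failed' }, value: 3 },
]);
console.log(output);
```

Each distinct label combination becomes its own time series, so it is usually wise to keep label values to a small, bounded set such as an order status.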

Retention

Plan        Metric retention  Resolution
Starter     7 days            15s scrape interval
Growth      90 days           15s scrape interval
Enterprise  1 year            15s scrape interval, long-term downsampled

Loki

Loki aggregates and indexes logs from all app instances. Logs are collected from container stdout/stderr automatically.

Log collection

No agent installation is needed. Promtail runs as a DaemonSet on cluster nodes, tails container logs, and ships them to Loki. All output to stdout and stderr is captured.
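
Because anything written to stdout or stderr is collected, logging needs no client library. As a minimal sketch, the helper below prints one JSON object per line. The function names and the ts, level, and msg field names are conventions chosen for this example, not something the platform requires.

```javascript
// Build one JSON log line. Extra fields are merged in next to the standard ones.
function formatLogLine(level, msg, fields = {}) {
  return JSON.stringify({ ts: new Date().toISOString(), level, msg, ...fields });
}

// Write errors to stderr and everything else to stdout; both streams are collected.
function log(level, msg, fields) {
  const stream = level === 'error' ? process.stderr : process.stdout;
  stream.write(formatLogLine(level, msg, fields) + '\n');
}

log('info', 'order accepted', { order_id: 'A-1001' });
log('error', 'database connection failed', { host: 'db.example.com', retry: 3 });
```

Keeping each entry on a single line matters: a multi-line stack trace written as raw text would typically arrive as several separate log entries.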

Structured logging

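Loki indexes labels such as app rather than the full log text, but JSON-formatted log lines can be parsed at query time with the json stage, which makes structured logs much easier to filter. If your app outputs structured logs: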

{"level":"error","msg":"database connection failed","host":"db.example.com","retry":3,"duration_ms":1200}

You can query by any field in Grafana:

{app="my-app"} | json | level="error" | duration_ms > 1000
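
To make the pipeline semantics concrete, here is a rough model of what that query does to each log line: the json stage extracts fields, and every following filter drops lines that do not match. This is an illustration in plain JavaScript under simplifying assumptions, not a description of how Loki is implemented.

```javascript
// Rough model of: {app="my-app"} | json | level="error" | duration_ms > 1000
// Returns true when a raw log line would survive every stage.
function matchesQuery(rawLine) {
  let entry;
  try {
    entry = JSON.parse(rawLine); // the `| json` stage extracts fields
  } catch {
    return false; // in this model, unparseable lines never match a field filter
  }
  if (entry === null || typeof entry !== 'object') return false;
  return entry.level === 'error' && Number(entry.duration_ms) > 1000;
}

const lines = [
  '{"level":"error","msg":"database connection failed","host":"db.example.com","retry":3,"duration_ms":1200}',
  '{"level":"error","msg":"cache miss storm","duration_ms":300}',
  '{"level":"info","msg":"request served","duration_ms":1500}',
  'plain text line that is not JSON',
];

const matched = lines.filter(matchesQuery);
console.log(matched.length); // 1
```

Only the first line passes: the second is too fast, the third has the wrong level, and the fourth has no fields to filter on.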

LogQL queries

Loki uses LogQL for queries. Common patterns:

# All logs for an app
{app="my-app"}

# Lines containing the string "error" (case-sensitive)
{app="my-app"} |= "error"

# JSON-parsed with field filter
{app="my-app"} | json | status >= 500

# Rate of errors over time
rate({app="my-app"} |= "error" [5m])
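
The rate query is the least intuitive of these patterns: over a log range, rate() returns the number of matching entries per second within the trailing window. A small model of that arithmetic, using made-up entries and a hypothetical errorRate helper:

```javascript
// Model of rate({app="my-app"} |= "error" [5m]) evaluated at time `nowMs`:
// count matching lines in the trailing window, then divide by the window in seconds.
function errorRate(entries, nowMs, windowMs = 5 * 60 * 1000) {
  const matching = entries.filter(
    (e) =>
      e.timestampMs > nowMs - windowMs &&
      e.timestampMs <= nowMs &&
      e.line.includes('error')
  );
  return matching.length / (windowMs / 1000);
}

// Synthetic data: 30 error lines spaced 10 seconds apart, plus two lines that
// should not count (one without "error", one outside the window).
const now = 10 * 60 * 1000;
const entries = [];
for (let i = 0; i < 30; i++) {
  entries.push({ timestampMs: now - i * 10000, line: 'error: upstream timeout' });
}
entries.push({ timestampMs: now - 1000, line: 'request served' });
entries.push({ timestampMs: now - 400000, line: 'error: too old' });

console.log(errorRate(entries, now)); // 0.1
```

Thirty matching lines in a 300-second window give 0.1 errors per second, which is the value a Grafana panel would plot for that point in time.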

Retention

Plan        Log retention
Starter     24 hours
Growth      30 days
Enterprise  1 year

Tempo (distributed tracing)

Grafana Tempo stores distributed traces collected via OpenTelemetry. Tracing is available on Growth and Enterprise plans.

Enabling tracing

Configure your app to send traces to the OpenTelemetry Collector endpoint:

ss env set my-app OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector.sprintsail-system:4317
ss env set my-app OTEL_SERVICE_NAME=my-app

Or use the platform-provided auto-instrumentation (no code changes):

ss app update my-app --tracing auto

Auto-instrumentation is available for Node.js, Python, Java, and .NET.
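
If you take the manual route, a startup check can catch a missing or malformed endpoint before the app silently drops traces. The sketch below only reads and validates the two environment variables described in this section; readOtelConfig is a hypothetical helper, not part of any OpenTelemetry SDK.

```javascript
// Read the OTLP settings and fail fast if they are missing or malformed.
function readOtelConfig(env = process.env) {
  const endpoint = env.OTEL_EXPORTER_OTLP_ENDPOINT;
  const serviceName = env.OTEL_SERVICE_NAME;
  if (!endpoint) throw new Error('OTEL_EXPORTER_OTLP_ENDPOINT is not set');
  if (!serviceName) throw new Error('OTEL_SERVICE_NAME is not set');

  const url = new URL(endpoint); // throws on a malformed URL
  return {
    serviceName,
    protocol: url.protocol,
    host: url.hostname,
    port: Number(url.port),
  };
}

// Passing the values explicitly here; in an app, call readOtelConfig() with no arguments.
const config = readOtelConfig({
  OTEL_EXPORTER_OTLP_ENDPOINT: 'http://otel-collector.sprintsail-system:4317',
  OTEL_SERVICE_NAME: 'my-app',
});
console.log(config);
```

Port 4317 is the conventional OTLP/gRPC port, so an exporter configured for OTLP over HTTP would normally target 4318 instead. Whether the Sprintsail collector exposes that port is not stated here, so treat it as something to verify.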

Viewing traces

Traces appear in Grafana under the Tempo data source. You can:

  • Search by trace ID, service name, or duration
  • View the full request waterfall across services
  • Correlate traces with logs and metrics (click from a trace span to see matching logs)

OpenTelemetry Collector

The OpenTelemetry Collector runs in the management cluster and acts as the central telemetry pipeline:

  • Receives traces, metrics, and logs from apps via OTLP (gRPC and HTTP)
  • Processes data (batching, sampling, attribute enrichment)
  • Exports to Prometheus, Loki, and Tempo

The collector is pre-configured. Apps only need to set the OTLP endpoint to start sending telemetry.
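
Batching, one of the processing steps listed above, trades a little latency for far fewer export calls. The sketch below shows the idea with a size-based buffer. It is a conceptual illustration only: the Batcher class is hypothetical, and the real Collector also flushes on a timeout and is configured rather than coded.

```javascript
// Minimal size-based batcher: buffers items and hands them to `exportFn`
// once `maxBatchSize` is reached.
class Batcher {
  constructor(maxBatchSize, exportFn) {
    this.maxBatchSize = maxBatchSize;
    this.exportFn = exportFn;
    this.buffer = [];
  }

  add(item) {
    this.buffer.push(item);
    if (this.buffer.length >= this.maxBatchSize) this.flush();
  }

  // Export whatever is buffered, if anything, and start a fresh buffer.
  flush() {
    if (this.buffer.length === 0) return;
    const batch = this.buffer;
    this.buffer = [];
    this.exportFn(batch);
  }
}

const exported = [];
const batcher = new Batcher(3, (batch) => exported.push(batch));
for (let i = 1; i <= 7; i++) batcher.add({ spanId: i });
batcher.flush(); // send the remainder
console.log(exported.map((b) => b.length)); // [ 3, 3, 1 ]
```

Seven spans result in three export calls instead of seven, which is the saving batching is meant to provide.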

Grafana

Each organization gets a Grafana instance at:

https://grafana.{org}.sprintsail.com

Log in with your Sprintsail credentials (SSO via Dex).

Pre-built dashboards

Every organization starts with these dashboards:

Dashboard       Contents
App Overview    Request rate, error rate, latency percentiles, instance count
Resource Usage  CPU and memory per app and instance, trending
Services        Database connections, query latency, cache hit rate
Deployments     Deploy timeline, build duration, rollback events
Logs Explorer   Full-text log search with filters

Custom dashboards

Create custom dashboards using any combination of Prometheus, Loki, and Tempo data sources. Dashboards are saved per-organization and are accessible to all org members.

Alerting via Grafana

In addition to CLI-based alerts (ss alerts create), you can create alert rules directly in Grafana with full PromQL/LogQL conditions, notification channels, and silence windows.

Next steps

  • Monitoring guide: practical log streaming and alerting setup
  • Architecture: where the observability stack runs
  • Scaling: autoscaling uses the same metrics