Observability

Sprintsail's observability stack is built on open standards and open-source tools: OpenTelemetry for instrumentation, Prometheus for metrics, Loki for logs, Tempo for traces, and Grafana for visualization.

Stack overview

Your App  →  OpenTelemetry Collector  →  Prometheus (metrics)
                                      →  Loki (logs)
                                      →  Tempo (traces)
                                              ↓
                                          Grafana
                                       (dashboards, alerts)

All components run within the management cluster. Observability data is namespace-scoped: each organization sees only its own metrics, logs, and traces.

Prometheus

Prometheus scrapes and stores time-series metrics. Sprintsail collects platform metrics automatically and supports custom application metrics.

Platform metrics

Collected for every app without any configuration:

Metric	Type	Description
`http_requests_total`	Counter	Total requests by method, path, status code
`http_request_duration_seconds`	Histogram	Request latency buckets, from which percentiles (p50, p95, p99) are derived
`container_cpu_usage_seconds_total`	Counter	Cumulative CPU time consumed
`container_memory_working_set_bytes`	Gauge	Current memory usage
`container_network_receive_bytes_total`	Counter	Inbound network bytes
`container_network_transmit_bytes_total`	Counter	Outbound network bytes
`container_restarts_total`	Counter	Container restart count
`kube_pod_status_phase`	Gauge	Pod lifecycle phase
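
Percentiles such as p95 are not stored directly. They are estimated at query time from the cumulative buckets of a histogram like `http_request_duration_seconds`, typically with PromQL's `histogram_quantile()`. The sketch below shows the linear-interpolation idea behind that estimate in plain JavaScript. The bucket boundaries and counts are made-up illustration data, and `estimateQuantile` is a hypothetical helper, not a Sprintsail or Prometheus API.

```javascript
// Estimate a quantile from cumulative histogram buckets, in the spirit of
// PromQL's histogram_quantile(). Simplified: assumes the buckets are sorted by
// upper bound, the lowest bucket starts at 0, and there is no +Inf bucket.
function estimateQuantile(q, buckets) {
  const total = buckets[buckets.length - 1].count;
  if (total === 0) return NaN;
  const rank = q * total; // position of the requested observation

  let lowerBound = 0;
  let lowerCount = 0;
  for (const { le, count } of buckets) {
    if (count >= rank) {
      // Interpolate linearly inside the bucket that contains the rank.
      const inBucket = count - lowerCount;
      if (inBucket === 0) return le;
      return lowerBound + (le - lowerBound) * ((rank - lowerCount) / inBucket);
    }
    lowerBound = le;
    lowerCount = count;
  }
  return lowerBound;
}

// Hypothetical latency buckets: `le` is the upper bound in seconds,
// `count` is the cumulative number of requests at or below that bound.
const latencyBuckets = [
  { le: 0.1, count: 50 },
  { le: 0.5, count: 90 },
  { le: 1.0, count: 100 },
];

const p50 = estimateQuantile(0.5, latencyBuckets);
const p95 = estimateQuantile(0.95, latencyBuckets);
console.log({ p50, p95 });
```

With these sample buckets, the p95 estimate lands about halfway through the 0.5 to 1.0 second bucket, at roughly 0.75 seconds. The estimate is only as precise as the bucket boundaries allow.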

Custom metrics

If your app exposes a Prometheus-compatible /metrics endpoint, Sprintsail can scrape it alongside the platform metrics. Scraping is off until you tell Sprintsail where the endpoint is:

ss app update my-app --metrics-path /metrics --metrics-port 9090

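Example using the prom-client library (Node.js). Express is assumed for the HTTP server:

const express = require('express');
const client = require('prom-client');

const app = express();

// Collect default Node.js metrics (event loop lag, heap usage, and so on)
client.collectDefaultMetrics();

// Custom counter, labeled by order outcome
const ordersProcessed = new client.Counter({
  name: 'orders_processed_total',
  help: 'Total orders processed',
  labelNames: ['status'],
});

// Increment wherever an order completes, for example:
// ordersProcessed.inc({ status: 'success' });

// Expose all registered metrics for scraping
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', client.register.contentType);
  res.end(await client.register.metrics());
});

// Listen on the port passed to --metrics-port
app.listen(9090);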

Custom metrics are available in Grafana alongside platform metrics.
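
For reference, a /metrics endpoint returns plain text in the Prometheus exposition format. The sketch below renders a labeled counter in that format without any dependencies, to show roughly what a scraper would read for `orders_processed_total`. `renderCounter` and the sample values are hypothetical, and label-value escaping is left out for brevity. In a real app a client library generates this output for you.

```javascript
// Render one counter family in the Prometheus text exposition format.
// Hypothetical helper for illustration only; it does not escape label values.
function renderCounter(name, help, samples) {
  const lines = [`# HELP ${name} ${help}`, `# TYPE ${name} counter`];
  for (const { labels, value } of samples) {
    const labelText = Object.entries(labels)
      .map(([key, val]) => `${key}="${val}"`)
      .join(',');
    lines.push(`${name}{${labelText}} ${value}`);
  }
  return lines.join('\n') + '\n';
}

// Made-up sample values, one time series per `status` label value.
const body = renderCounter('orders_processed_total', 'Total orders processed', [
  { labels: { status: 'success' }, value: 42 },
  { labels: { status: 'failed' }, value: 3 },
]);
process.stdout.write(body);
```

Each distinct label combination becomes its own time series, so keeping label values to a small fixed set (such as order status) avoids creating an unbounded number of series.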

Retention

Plan	Metric retention	Resolution
Starter	7 days	15s scrape interval
Growth	90 days	15s scrape interval
Enterprise	1 year	15s scrape interval, long-term downsampled

Loki

Loki aggregates logs from all app instances and indexes them by label. Logs are collected from container stdout/stderr automatically.

Log collection

No agent installation is needed. Promtail runs as a DaemonSet on cluster nodes, tails container logs, and ships them to Loki. All output to stdout and stderr is captured.

Structured logging

Loki can parse JSON-formatted log lines at query time, which makes structured logs easy to filter by field. If your app outputs structured logs:

{"level":"error","msg":"database connection failed","host":"db.example.com","retry":3,"duration_ms":1200}

You can query by any field in Grafana:

{app="my-app"} | json | level="error" | duration_ms > 1000
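
A log line like the one above can be produced with a few lines of code. This is a minimal sketch of a structured logger that writes one JSON object per line to stdout, where the platform picks it up. `formatLogLine` and `logJson` are hypothetical helper names; an established logging library would work equally well.

```javascript
// Minimal structured logger: one JSON object per line on stdout.
// Hypothetical helpers for illustration only.
function formatLogLine(level, msg, fields = {}) {
  // Keys are emitted in insertion order: level, msg, then the extra fields.
  return JSON.stringify({ level, msg, ...fields });
}

function logJson(level, msg, fields) {
  process.stdout.write(formatLogLine(level, msg, fields) + '\n');
}

const errorFields = { host: 'db.example.com', retry: 3, duration_ms: 1200 };
const line = formatLogLine('error', 'database connection failed', errorFields);
logJson('error', 'database connection failed', errorFields);
```

Emitting numeric fields such as `duration_ms` as numbers rather than strings keeps comparisons like `duration_ms > 1000` straightforward.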

LogQL queries

Loki uses LogQL for queries. Common patterns:

# All logs for an app
{app="my-app"}

# Error logs only
{app="my-app"} |= "error"

# JSON-parsed with field filter
{app="my-app"} | json | status >= 500

# Rate of errors over time
rate({app="my-app"} |= "error" [5m])
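
The difference between a line filter (`|= "error"`) and a parsed field filter (`| json | status >= 500`) is easy to miss. The sketch below emulates both locally in JavaScript on a few made-up log lines. It only illustrates the matching semantics and is not how Loki evaluates queries internally.

```javascript
// Local emulation of two LogQL filters, for illustration only.
// The log lines are made-up sample data.
const sampleLines = [
  '{"level":"error","msg":"upstream timeout","status":504}',
  '{"level":"info","msg":"no error found","status":200}',
  '{"level":"warn","msg":"slow response","status":500}',
];

// In the spirit of: {app="my-app"} |= "error"
// A line filter is a substring match on the raw text of each line.
const lineFilter = (rawLines, needle) =>
  rawLines.filter((raw) => raw.includes(needle));

// In the spirit of: {app="my-app"} | json | status >= 500
// The line is parsed first, then the extracted field is compared.
const statusFilter = (rawLines, minStatus) =>
  rawLines
    .map((raw) => {
      try {
        return JSON.parse(raw);
      } catch {
        return null; // lines that are not valid JSON are skipped here
      }
    })
    .filter((entry) => entry !== null && entry.status >= minStatus);

const textMatches = lineFilter(sampleLines, 'error');
const statusMatches = statusFilter(sampleLines, 500);
console.log(textMatches.length, statusMatches.length);
```

The substring filter also matches the info-level line, because its message contains the word "error". Filtering on a parsed field avoids that kind of false positive.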

Retention

Plan	Log retention
Starter	24 hours
Growth	30 days
Enterprise	1 year

Tempo (distributed tracing)

Grafana Tempo stores distributed traces collected via OpenTelemetry. Tracing is available on Growth and Enterprise plans.

Enabling tracing

Configure your app to send traces to the OpenTelemetry Collector endpoint:

ss env set my-app OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector.sprintsail-system:4317
ss env set my-app OTEL_SERVICE_NAME=my-app

Or use the platform-provided auto-instrumentation (no code changes):

ss app update my-app --tracing auto

Auto-instrumentation is available for Node.js, Python, Java, and .NET.
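
When instrumenting manually, it can help to confirm at startup that the variables set above actually reached the process. The following is a small sketch of such a check. `readTracingConfig` is a hypothetical helper and the fallback service name is only a placeholder; OpenTelemetry SDKs read these variables themselves, so this is purely a diagnostic aid.

```javascript
// Read the tracing-related environment variables set with `ss env set`.
// Hypothetical startup check for illustration only.
function readTracingConfig(env = process.env) {
  const endpoint = env.OTEL_EXPORTER_OTLP_ENDPOINT || null;
  const serviceName = env.OTEL_SERVICE_NAME || 'unknown_service'; // placeholder

  let enabled = false;
  if (endpoint) {
    try {
      // Accept only well-formed http(s) URLs.
      const { protocol } = new URL(endpoint);
      enabled = protocol === 'http:' || protocol === 'https:';
    } catch {
      enabled = false;
    }
  }
  return { endpoint, serviceName, enabled };
}

const tracingConfig = readTracingConfig({
  OTEL_EXPORTER_OTLP_ENDPOINT: 'http://otel-collector.sprintsail-system:4317',
  OTEL_SERVICE_NAME: 'my-app',
});
console.log(tracingConfig);
```

Logging the result once at boot makes a missing or mistyped endpoint visible before you go looking for traces that never arrived.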

Viewing traces

Traces appear in Grafana under the Tempo data source. You can:

Search by trace ID, service name, or duration
View the full request waterfall across services
Correlate traces with logs and metrics (click from a trace span to see matching logs)

OpenTelemetry Collector

The OpenTelemetry Collector runs in the management cluster and acts as the central telemetry pipeline:

Receives traces, metrics, and logs from apps via OTLP (gRPC and HTTP)
Processes data (batching, sampling, attribute enrichment)
Exports to Prometheus, Loki, and Tempo

The collector is pre-configured. Apps only need to set the OTLP endpoint to start sending telemetry.
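
To make the processing stage more concrete, here is a conceptual sketch of sampling, attribute enrichment, and batching applied to in-memory spans. It is not the Collector's implementation or configuration. The function names, the sampling rule, the `org` attribute, and the sample spans are all assumptions chosen for illustration.

```javascript
// Conceptual pipeline: sample -> enrich -> batch. Illustration only.

// Deterministic sampling: map the last 8 hex digits of the trace ID to [0, 1)
// so every span of the same trace gets the same keep/drop decision.
function keepTrace(traceId, ratio) {
  const bucket = parseInt(traceId.slice(-8), 16) / 0x100000000;
  return bucket < ratio;
}

// Attribute enrichment: add platform-level attributes to each span.
function enrich(span, attributes) {
  return { ...span, attributes: { ...span.attributes, ...attributes } };
}

// Batching: group items so they can be exported in fewer requests.
function toBatches(items, batchSize) {
  const batches = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

function processSpans(spans, { ratio, batchSize, attributes }) {
  const kept = spans
    .filter((span) => keepTrace(span.traceId, ratio))
    .map((span) => enrich(span, attributes));
  return toBatches(kept, batchSize);
}

// Made-up spans with 32-character hex trace IDs.
const sampleSpans = [
  { traceId: 'a'.padStart(32, '0'), name: 'GET /orders', attributes: {} },
  { traceId: 'ffffffff'.padStart(32, '0'), name: 'GET /health', attributes: {} },
  { traceId: '10'.padStart(32, '0'), name: 'POST /orders', attributes: {} },
];

const batches = processSpans(sampleSpans, {
  ratio: 0.5,
  batchSize: 2,
  attributes: { org: 'example-org' }, // hypothetical attribute
});
console.log(JSON.stringify(batches, null, 2));
```

Deciding from the trace ID rather than at random keeps traces whole: either every span of a request is kept or none is.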

Grafana

Each organization gets a Grafana instance at:

https://grafana.{org}.sprintsail.com

Pre-built dashboards

Every organization starts with these dashboards:

Dashboard	Contents
App Overview	Request rate, error rate, latency percentiles, instance count
Resource Usage	CPU and memory per app and instance, trending
Services	Database connections, query latency, cache hit rate
Deployments	Deploy timeline, build duration, rollback events
Logs Explorer	Full-text log search with filters

Custom dashboards

Create custom dashboards using any combination of the Prometheus, Loki, and Tempo data sources. Dashboards are saved per organization and are accessible to all org members.

Alerting via Grafana

In addition to CLI-based alerts (ss alerts create), you can create alert rules directly in Grafana with full PromQL/LogQL conditions, notification channels, and silence windows.
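
As an illustration of the kind of condition such a rule evaluates, the sketch below computes an error ratio from two readings of a request counter, which is roughly what a PromQL expression like `sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.05` does. The `status` label name, the 5% threshold, and the sample numbers are assumptions, and `counterRate` and `errorRatio` are hypothetical helpers.

```javascript
// Per-second rate between two counter readings. A counter only goes up, so a
// lower current value is treated as a reset (for example after a restart).
function counterRate(previous, current, seconds) {
  const increase = current >= previous ? current - previous : current;
  return increase / seconds;
}

// Fraction of requests that failed over the window.
function errorRatio(window) {
  const total = counterRate(window.totalBefore, window.totalAfter, window.seconds);
  const errors = counterRate(window.errorsBefore, window.errorsAfter, window.seconds);
  return total === 0 ? 0 : errors / total;
}

// Made-up counter readings taken 5 minutes (300 seconds) apart.
const ratio = errorRatio({
  totalBefore: 1000,
  totalAfter: 1600,
  errorsBefore: 10,
  errorsAfter: 46,
  seconds: 300,
});
const firing = ratio > 0.05; // hypothetical 5% threshold
console.log({ ratio, firing });
```

Alerting on a ratio rather than a raw error count keeps the rule meaningful as traffic grows or shrinks.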

Next steps

Monitoring guide: practical log streaming and alerting setup
Architecture: where the observability stack runs
Scaling: autoscaling uses the same metrics
