Production deploy

Helm, Kustomize, SSE pass-through, admin endpoints, and OTLP co-export.

This page collects the patterns we recommend for running Spanly in production environments.

Deployment shapes

| Shape | When to use |
| --- | --- |
| spanly run as a child wrapper | The MCP server is yours and starts as part of your container/process. Most common. |
| spanly proxy as a sidecar | The MCP server is its own process or container, and you sit Spanly next to it. |
| spanly proxy as a standalone | Front a third-party MCP service from your own ingress. |
| Docker container | Compose-style deployments or simple Kubernetes Pods. See Docker. |
| Helm chart | Standalone Pod + Service in front of an internal MCP. |
| Kustomize component | Co-locate Spanly as a sidecar in your existing Pod. |

Helm

A maintained chart is at charts/spanly.

# Register the Spanly chart repository, then install the chart.
helm repo add spanly https://charts.spanly.com
helm install spanly spanly/spanly \
  --set apiKey="$SPANLY_API_KEY" \
  --set upstream=mcp.default.svc.cluster.local:3000

The chart creates a Pod running spanly proxy, a Service in front of it, optional Ingress, and a ConfigMap for the flag set.

Kustomize sidecar

A maintained component is at kustomize/spanly-sidecar.

# kustomization.yaml
components:
  - https://github.com/spanlyhq/spanly/tree/main/kustomize/spanly-sidecar

It injects the Spanly container into the existing Pod, exposes a new port (default 3001), and reads the API key from a Secret.

Putting Spanly behind nginx / Caddy / Envoy

If you front Spanly with another reverse proxy, SSE responses can stall in the front proxy's response buffer. Disable buffering on the relevant routes.

nginx

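location /mcp {
    proxy_pass http://spanly:3001;
    # Pass SSE events to the client as they arrive instead of buffering them.
    proxy_buffering off;
    proxy_cache off;
    proxy_set_header X-Accel-Buffering no;
    # Allow up to 1h between reads from the upstream so idle streams stay open.
    proxy_read_timeout 1h;
}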

Caddy

reverse_proxy spanly:3001 {
    flush_interval -1
}

Envoy

  • Disable response buffering on the relevant route.
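  • Set auto_host_rewrite: true on the route if the Spanly cluster is resolved by DNS name (strict_dns or logical_dns), so the Host header is rewritten to the upstream hostname.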

Admin endpoints

For health checks and Prometheus scraping, enable the admin listener:

spanly proxy --admin-addr=:9090 mcp:3000 0.0.0.0:3001

  • GET /healthz – 200 if the listener is up.
  • GET /readyz – 200 if the upstream is reachable (result cached for 1 s).
  • GET /metrics – Prometheus text format.

Wire /readyz into your orchestrator's readiness probe to avoid sending traffic to a Spanly proxy whose upstream is down.
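As a rough illustration of how the two probes combine, here is a minimal Python sketch of an external check. The state names ("down", "upstream-unreachable", "ready") and the helper names are hypothetical, and the base URL assumes the admin listener from the command above; only the paths and the meaning of a 200 response come from this page.

```python
import urllib.error
import urllib.request


def probe(base_url: str, path: str, timeout: float = 2.0) -> bool:
    """Return True only if GET base_url + path answers with HTTP 200."""
    try:
        with urllib.request.urlopen(base_url + path, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # 4xx/5xx responses raise HTTPError (a URLError subclass); refused
        # connections and timeouts also end up here.
        return False


def classify(base_url: str, check=probe) -> str:
    """Map /healthz and /readyz onto a coarse, hypothetical state name."""
    if not check(base_url, "/healthz"):
        return "down"  # the admin listener itself is not answering
    if not check(base_url, "/readyz"):
        return "upstream-unreachable"  # Spanly is up, the MCP upstream is not
    return "ready"
```

Calling classify("http://localhost:9090") against a running proxy would return one of the three states. The check parameter exists only so the logic can be exercised without a network.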

OTLP co-export

Spanly can ship the same telemetry to any OTLP/HTTP backend (Datadog, Honeycomb, Grafana Tempo, an OTel Collector, …) in parallel with shipping to Spanly. Spans follow the official MCP semantic conventions: each JSON-RPC request/response pair becomes one span named {method} {target} (e.g. tools/call get-weather).
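To make the naming rule concrete, the following Python sketch derives a span name from a JSON-RPC request. It is an illustration, not Spanly's implementation: it assumes the target is taken from params.name for tools/call and prompts/get, and that other methods use the bare method name.

```python
import json

# Assumption: methods whose target is read from params["name"].
NAMED_TARGET_METHODS = {"tools/call", "prompts/get"}


def span_name(request: dict) -> str:
    """Build a '{method} {target}' span name from a JSON-RPC request."""
    method = request["method"]
    params = request.get("params") or {}
    target = params.get("name") if method in NAMED_TARGET_METHODS else None
    # Fall back to the bare method name when there is no target.
    return f"{method} {target}" if target else method


request = json.loads(
    '{"jsonrpc": "2.0", "id": 1, "method": "tools/call",'
    ' "params": {"name": "get-weather", "arguments": {"city": "Oslo"}}}'
)
print(span_name(request))  # tools/call get-weather
```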

Trace context propagation:

  • HTTP mode – W3C traceparent headers on incoming requests are extracted automatically, so MCP spans become children of upstream traces.
  • stdio mode – traceparent is read from JSON-RPC params._meta per the MCP semconv. If absent, a new root trace is started.
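The two extraction paths above can be sketched in Python as follows. This is a simplified illustration rather than Spanly's code: it accepts only the version-00 traceparent layout from the W3C Trace Context spec, the function names are hypothetical, and starting a new root when the HTTP header is missing is an assumption (this page states that fallback only for stdio mode).

```python
import re
import secrets
from typing import Mapping, Optional, Tuple

# W3C Trace Context, version 00: version-traceid-parentid-flags in lowercase hex.
_TRACEPARENT = re.compile(r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")

Context = Tuple[str, Optional[str]]  # (trace_id, parent_span_id or None for a root)


def parse_traceparent(value: object) -> Optional[Context]:
    """Return (trace_id, parent_span_id), or None if the value is unusable."""
    if not isinstance(value, str):
        return None
    match = _TRACEPARENT.fullmatch(value.strip())
    if match is None:
        return None
    version, trace_id, parent_id, _flags = match.groups()
    # Version ff and all-zero ids are invalid per the W3C spec.
    if version == "ff" or set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None
    return trace_id, parent_id


def new_root() -> Context:
    """Start a new root trace with a random 128-bit trace id."""
    return secrets.token_hex(16), None


def from_http(headers: Mapping[str, str]) -> Context:
    """HTTP mode: read the traceparent request header (names are case-insensitive)."""
    lowered = {name.lower(): value for name, value in headers.items()}
    return parse_traceparent(lowered.get("traceparent")) or new_root()


def from_stdio(message: dict) -> Context:
    """stdio mode: read traceparent from JSON-RPC params._meta."""
    params = message.get("params")
    meta = params.get("_meta") if isinstance(params, dict) else None
    value = meta.get("traceparent") if isinstance(meta, dict) else None
    return parse_traceparent(value) or new_root()
```

A span created from the returned pair would reuse the trace id and record the parent span id, which is what makes MCP spans show up as children of the caller's trace.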

Enable it:

export OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4318
spanly proxy mcp:3000 0.0.0.0:3001

Only http/protobuf is supported. For gRPC backends, run an OTel Collector hop.
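A small preflight check can catch the most common misconfiguration, pointing the endpoint at a gRPC port. The Python sketch below is a hypothetical helper, not part of Spanly. It assumes the base endpoint is extended with /v1/traces, as the OpenTelemetry exporter specification describes for OTLP/HTTP, and it treats port 4317 as gRPC purely by convention.

```python
from urllib.parse import urlsplit


def traces_url(endpoint: str) -> str:
    """Derive the OTLP/HTTP traces URL from a base endpoint, or raise ValueError."""
    parts = urlsplit(endpoint)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"not an http(s) URL: {endpoint!r}")
    if parts.port == 4317:
        # Heuristic: 4317 is the conventional OTLP/gRPC port, 4318 the HTTP one.
        raise ValueError("port 4317 is usually gRPC; route through an OTel Collector")
    # Per the OTel exporter spec, the signal path is appended to the base URL.
    return endpoint.rstrip("/") + "/v1/traces"


print(traces_url("http://otel-collector:4318"))  # http://otel-collector:4318/v1/traces
```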

Capacity & sizing

The CLI is a single, stateless binary. Typical figures for a sidecar:

  • ~15 MB resident memory at idle.
  • ~25–40 MB at sustained 1k packets/s.
  • CPU-bound on a single core, mainly in TLS, at high throughput.

For load above ~5k packets/s on a single instance, consider sharding clients across multiple Spanly proxies (DNS round-robin or service mesh).
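The sharding guidance reduces to simple arithmetic. The Python sketch below estimates an instance count from a peak rate. The ~5k packets/s figure comes from the paragraph above; the 30% headroom default and the function name are assumptions chosen for illustration.

```python
import math


def proxies_needed(
    peak_packets_per_s: float,
    per_instance_limit: float = 5000.0,  # ~5k packets/s guidance from this page
    headroom: float = 0.3,  # assumption: keep 30% spare capacity per instance
) -> int:
    """Estimate how many Spanly proxies to shard clients across."""
    usable = per_instance_limit * (1.0 - headroom)
    return max(1, math.ceil(peak_packets_per_s / usable))


print(proxies_needed(12_000))  # 4
```

With the defaults, each instance is budgeted for 3,500 packets/s, so a 12k packets/s peak rounds up to four proxies.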

What's not yet supported

  • Windows – best-effort, not regression-tested. Use WSL.
  • WebSockets – only HTTP, SSE, and stdio.
  • TLS termination on the bind side – front Spanly with your own reverse proxy.
