Production deploy

This page collects the patterns we recommend for running Spanly in production environments.

Deployment shapes

Shape	When to use
`spanly run` as a child wrapper	The MCP server is yours and starts as part of your container/process. Most common.
`spanly proxy` as a sidecar	The MCP server is its own process or container, and you run Spanly next to it.
`spanly proxy` as a standalone	Front a third-party MCP service from your own ingress.
Docker container	Compose-style deployments or simple Kubernetes Pods. See Docker.
Helm chart	Standalone Pod + Service in front of an internal MCP.
Kustomize component	Co-locate Spanly as a sidecar in your existing Pod.

Helm

A maintained chart is at charts/spanly. Install it from a clone of the repo. The chart reads the API key from a Kubernetes Secret, never from a chart value:

git clone https://github.com/spanlyhq/spanly.git
kubectl create secret generic spanly --from-literal=api-key="$SPANLY_API_KEY"

helm install spanly ./spanly/charts/spanly \
  --set proxy.upstream=http://mcp.default.svc.cluster.local:3000

The chart creates a Deployment running spanly proxy, a Service in front of it (port 3001 by default), and an optional Prometheus ServiceMonitor. See the chart's values reference for context headers, admin endpoints, and resource overrides.
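If you prefer a values file to --set flags, the same setting can be written as YAML. This is a minimal sketch: only proxy.upstream is taken from the command above, and no other chart keys are assumed here.

```yaml
# values.yaml
# Equivalent of: --set proxy.upstream=http://mcp.default.svc.cluster.local:3000
proxy:
  upstream: http://mcp.default.svc.cluster.local:3000
```

Pass it with helm install spanly ./spanly/charts/spanly -f values.yaml. The API key still comes from the Secret, not from this file.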

Kustomize sidecar

A maintained component is at kustomize/spanly-sidecar.

# kustomization.yaml
resources:
  - my-deployment.yaml
components:
  - https://github.com/spanlyhq/spanly//kustomize/spanly-sidecar?ref=main

It injects the Spanly container into Deployments labelled spanly-sidecar=true, exposes a new port (default 3001), and reads the API key from a Secret.
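For illustration, a Deployment that the component would pick up might look like the sketch below. The Deployment name, image, and port are hypothetical placeholders, and the sketch assumes the label is matched on the Deployment's own metadata; check the component itself for the exact selector.

```yaml
# my-deployment.yaml (hypothetical example)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-mcp                        # placeholder name
  labels:
    spanly-sidecar: "true"            # label the sidecar component selects on
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-mcp
  template:
    metadata:
      labels:
        app: my-mcp
    spec:
      containers:
        - name: mcp
          image: example.com/my-mcp:1.0   # placeholder image
          ports:
            - containerPort: 3000         # assumed MCP server port
```

Render the result with kubectl kustomize . to confirm that the Spanly container was injected before you apply it.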

Putting Spanly behind nginx / Caddy / Envoy

If you front Spanly with another reverse proxy, SSE responses can stall in the front proxy's response buffer. Disable buffering on the relevant routes.

nginx

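location /mcp {
    proxy_pass http://spanly:3001;
    # Stream SSE events to the client instead of buffering the response.
    proxy_buffering off;
    proxy_cache off;
    # X-Accel-Buffering is a response header that nginx reads from the upstream;
    # sending it as a request header has no effect, so it is omitted here.
    # Allow long-lived, mostly idle SSE streams.
    proxy_read_timeout 1h;
}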

Caddy

reverse_proxy spanly:3001 {
    flush_interval -1
}

Envoy

Disable response buffering on the relevant route.
Set auto_host_rewrite: true on the route if Envoy reaches Spanly by DNS name.
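As a rough sketch of where auto_host_rewrite sits, the fragment below shows an Envoy v3 route for the /mcp prefix. The cluster name spanly is hypothetical, and the timeout line is an assumption added for long-lived SSE streams rather than something this page prescribes. Buffering settings depend on the HTTP filters in your chain and are not shown.

```yaml
# Fragment of an Envoy v3 RouteConfiguration (not a complete bootstrap).
virtual_hosts:
  - name: spanly
    domains: ["*"]
    routes:
      - match:
          prefix: /mcp
        route:
          cluster: spanly            # hypothetical cluster; must be STRICT_DNS or LOGICAL_DNS
          auto_host_rewrite: true    # send the upstream's DNS name as the Host header
          timeout: 0s                # assumption: lift the default 15s route timeout for SSE
```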

Admin endpoints

For health checks and Prometheus scraping, enable the admin listener:

spanly proxy --admin-addr=:9090 mcp:3000 0.0.0.0:3001

GET /healthz: 200 if the listener is up.
GET /readyz: 200 if the upstream is reachable (result cached for 1s).
GET /metrics: Prometheus text format.
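If you scrape with a plain Prometheus configuration rather than the chart's ServiceMonitor, a static scrape job could look like this sketch. The target hostname spanly is an assumption; the port matches the --admin-addr=:9090 example above.

```yaml
# prometheus.yml fragment
scrape_configs:
  - job_name: spanly
    metrics_path: /metrics            # admin endpoint listed above
    static_configs:
      - targets: ["spanly:9090"]      # hypothetical host; admin listener port
```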

Wire /readyz into your orchestrator's readiness probe to avoid sending traffic to a Spanly proxy whose upstream is down.
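In Kubernetes, that wiring might look like the container fragment below. It is a sketch under stated assumptions: the image name is a placeholder, the args assume the image's entrypoint is the spanly binary, and the probe intervals are illustrative values rather than recommendations from this page.

```yaml
# Container spec fragment for a Pod running the admin listener on :9090.
containers:
  - name: spanly
    image: example.com/spanly:latest   # placeholder; use the image you actually deploy
    args: ["proxy", "--admin-addr=:9090", "mcp:3000", "0.0.0.0:3001"]
    ports:
      - name: proxy
        containerPort: 3001
      - name: admin
        containerPort: 9090
    livenessProbe:
      httpGet:
        path: /healthz                 # 200 if the listener is up
        port: admin
      periodSeconds: 10
    readinessProbe:
      httpGet:
        path: /readyz                  # 200 only if the upstream is reachable
        port: admin
      periodSeconds: 5
      failureThreshold: 3
```

Keeping /healthz on the liveness probe and /readyz on the readiness probe means an upstream outage removes the Pod from Service endpoints without restarting Spanly itself.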

OpenTelemetry

The CLI does not export OTel spans. It ships telemetry to Spanly only. The inbound traceparent header (when present) is preserved verbatim on each captured packet. Pick your APM provider in the Spanly dashboard (Settings, Integrations) and every request with trace context links straight to the matching trace in Datadog, Sentry or New Relic. Nothing to configure on the CLI side.

Capacity & sizing

The CLI is a single, stateless binary. Indicative figures for a typical sidecar, measured on our own deployments:

Around 15 MB resident memory at idle.
25 to 40 MB at sustained 1k packets/s.
At high throughput, the bottleneck is TLS, which is CPU-bound on a single core.

For load above a few thousand packets/s on a single instance, consider sharding clients across multiple Spanly proxies (DNS round-robin or service mesh).
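One way to get DNS round-robin inside Kubernetes is a headless Service in front of several Spanly replicas, sketched below. The app: spanly selector is a hypothetical label; the labels your Pods actually carry (for example, those set by the Helm chart) may differ.

```yaml
# Headless Service: DNS returns the IP of every ready Spanly Pod,
# so clients spread their connections across the replicas.
apiVersion: v1
kind: Service
metadata:
  name: spanly
spec:
  clusterIP: None                      # headless, no virtual IP
  selector:
    app: spanly                        # hypothetical Pod label
  ports:
    - name: proxy
      port: 3001
      targetPort: 3001
```

Because SSE connections are long-lived, this spreads load per connection rather than per request, so distribution evens out only as clients reconnect.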

What's not yet supported

Windows: best-effort, not regression-tested. Use WSL.
WebSockets: not supported. Only the HTTP, SSE, and stdio transports are.
TLS termination on the bind side: front Spanly with your own reverse proxy.
