Production deploy
Helm, Kustomize, SSE pass-through, admin endpoints, and OTLP co-export.
This page collects the patterns we recommend for running Spanly in production environments.
Deployment shapes
| Shape | When to use |
|---|---|
| spanly run as a child wrapper | The MCP server is yours and starts as part of your container/process. Most common. |
| spanly proxy as a sidecar | The MCP server is its own process or container, and you sit Spanly next to it. |
| spanly proxy as a standalone | Front a third-party MCP service from your own ingress. |
| Docker container | Compose-style deployments or simple Kubernetes Pods. See Docker. |
| Helm chart | Standalone Pod + Service in front of an internal MCP. |
| Kustomize component | Co-locate Spanly as a sidecar in your existing Pod. |
Helm
A maintained chart is at charts/spanly.
helm repo add spanly https://charts.spanly.com
helm install spanly spanly/spanly \
  --set apiKey=$SPANLY_API_KEY \
  --set upstream=mcp.default.svc.cluster.local:3000

The chart creates a Pod running spanly proxy, a Service in front of
it, optional Ingress, and a ConfigMap for the flag set.
Kustomize sidecar
A maintained component is at kustomize/spanly-sidecar.
# kustomization.yaml
components:
  - https://github.com/spanlyhq/spanly/tree/main/kustomize/spanly-sidecar

It injects the Spanly container into the existing Pod, exposes a new
port (default 3001), and reads the API key from a Secret.
Putting Spanly behind nginx / Caddy / Envoy
If you front Spanly with another reverse proxy, SSE responses can stall in the front proxy's response buffer. Disable buffering on the relevant routes.
nginx
location /mcp {
    proxy_pass http://spanly:3001;
    proxy_buffering off;                   # stream SSE events as they arrive
    proxy_cache off;
    proxy_set_header X-Accel-Buffering no;
    proxy_read_timeout 1h;                 # keep long-lived SSE connections open
}

Caddy
reverse_proxy spanly:3001 {
    flush_interval -1
}

Envoy
- Disable response buffering on the relevant route.
- Set auto_host_rewrite: true if Spanly is selected by name.
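As a rough illustration of the two Envoy points above, a route entry might look like the following sketch. It assumes the v3 API, that any buffering comes from the envoy.filters.http.buffer filter being enabled on the listener, and that Spanly is reached through a DNS-resolved cluster. The virtual host name, the cluster name `spanly`, and the `timeout: 0s` setting are assumptions added for illustration, not part of the Spanly docs.

```yaml
# Fragment of an Envoy route configuration (v3 API); names are hypothetical.
virtual_hosts:
  - name: mcp
    domains: ["*"]
    routes:
      - match:
          prefix: /mcp
        route:
          cluster: spanly          # hypothetical cluster resolving to spanly:3001
          auto_host_rewrite: true  # rewrite Host to the upstream hostname
          timeout: 0s              # assumption: no route timeout for long-lived SSE streams
        typed_per_filter_config:
          # Only relevant if the buffer filter is present in the HTTP filter chain.
          envoy.filters.http.buffer:
            "@type": type.googleapis.com/envoy.extensions.filters.http.buffer.v3.BufferPerRoute
            disabled: true
```

Note that auto_host_rewrite only takes effect for strict_dns or logical_dns clusters.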
Admin endpoints
For health checks and Prometheus scraping, enable the admin listener:
spanly proxy --admin-addr=:9090 mcp:3000 0.0.0.0:3001

- GET /healthz – 200 if the listener is up.
- GET /readyz – 200 if the upstream is reachable (1s cache).
- GET /metrics – Prometheus text format.
Wire /readyz into your orchestrator's readiness probe to avoid sending
traffic to a Spanly proxy whose upstream is down.
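On Kubernetes, the wiring could look like the sketch below. It reuses the admin port 9090 and the proxy arguments from the command above; the container name, the probe intervals, and the assumption that the image's entrypoint is the spanly binary are illustrative only.

```yaml
# Fragment of a Pod spec; container name and probe timings are hypothetical.
containers:
  - name: spanly
    # Assumes the image entrypoint is the spanly binary.
    args: ["proxy", "--admin-addr=:9090", "mcp:3000", "0.0.0.0:3001"]
    ports:
      - name: admin
        containerPort: 9090
    livenessProbe:
      httpGet:
        path: /healthz   # 200 while the listener is up
        port: admin
      periodSeconds: 10
    readinessProbe:
      httpGet:
        path: /readyz    # 200 only while the upstream is reachable
        port: admin
      periodSeconds: 5
      failureThreshold: 2
```

Keeping liveness on /healthz and readiness on /readyz means an upstream outage removes the Pod from Service endpoints without restarting the Spanly container.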
OTLP co-export
Spanly can ship the same telemetry to any OTLP/HTTP backend (Datadog,
Honeycomb, Grafana Tempo, an OTel Collector, …) in parallel with shipping
to Spanly. Spans follow the official
MCP semantic conventions:
each JSON-RPC request/response pair becomes one span named
{method} {target} (e.g. tools/call get-weather).
Trace context propagation:
- HTTP mode – W3C traceparent headers on incoming requests are extracted automatically, so MCP spans become children of upstream traces.
- stdio mode – traceparent is read from JSON-RPC params._meta per the MCP semconv. If absent, a new root trace is started.
Enable it:
export OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4318
spanly proxy mcp:3000 0.0.0.0:3001

Only http/protobuf is supported. For gRPC backends, run an OTel
Collector hop.
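A minimal Collector configuration for such a hop might look like this sketch: it receives OTLP over HTTP on port 4318 (matching the endpoint above) and forwards traces over gRPC. The backend address is a placeholder, and a real deployment would likely add TLS settings, authentication headers, and a batch processor.

```yaml
# OpenTelemetry Collector config: OTLP/HTTP in, OTLP/gRPC out.
receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318

exporters:
  otlp:                                   # the "otlp" exporter speaks gRPC
    endpoint: backend.example.com:4317    # hypothetical gRPC backend address

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp]
```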
Capacity & sizing
The CLI is single-binary and stateless. A typical sidecar:
- ~15 MB resident memory at idle.
- ~25–40 MB at sustained 1k packets/s.
- At high throughput, CPU-bound on a single core, mostly in TLS.
For load above ~5k packets/s on a single instance, consider sharding clients across multiple Spanly proxies (DNS round-robin or service mesh).
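For the DNS round-robin option on Kubernetes, one possible sketch is a headless Service in front of several standalone Spanly replicas, so that a DNS lookup returns every Pod IP and clients spread across them when they connect. The Service name and the `app: spanly` label are hypothetical, and the sketch assumes a Deployment with multiple replicas carrying that label already exists.

```yaml
# Headless Service: DNS returns the IPs of all matching Spanly Pods.
apiVersion: v1
kind: Service
metadata:
  name: spanly          # hypothetical name
spec:
  clusterIP: None       # headless: no virtual IP, one A record per ready Pod
  selector:
    app: spanly         # hypothetical label on the Spanly proxy Pods
  ports:
    - name: mcp
      port: 3001
      targetPort: 3001
```

Because SSE connections are long-lived, this balances clients at connect time rather than per request.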
What's not yet supported
- Windows – best-effort, not regression-tested. Use WSL.
- WebSockets – only HTTP, SSE, and stdio.
- TLS termination on the bind side – front Spanly with your own reverse proxy.