Tim Quinteiro

Spanly architecture

The architecture of Spanly: mindset and reasoning behind our decisions, and the stack we ended up with.

  • engineering
  • architecture
  • mcp

What is Spanly?

Spanly tells you what’s happening inside your MCP server: which clients are connecting, which tools are being called, where errors happen, and how long things take. Drop in our SDK or wrap your server with our CLI, and the dashboard does the rest.

MCP observability is new enough that there isn’t a canonical architecture to point at. Most of what we ended up with came from trial and error.

Mindset and reasoning

Our initial goal is to capture the topology of our customers’ MCP servers. Things like:

  • server name and version
  • client name and version
  • tool, resource, or prompt calls
  • sessions
  • notifications
  • all metadata

MCP evolves fast. For instance, MCP applications were recently added to the spec (we are working on supporting them). What stays constant is that MCP is built on JSON-RPC.

So our intuition is that the best way to capture the topology of an MCP server is to capture the JSON-RPC packets that flow through it, then process them to build the topology.
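To make the idea concrete, here is a minimal sketch of how a captured packet could be classified using only the fields JSON-RPC 2.0 defines. The type and function names are illustrative assumptions, not the actual Spanly code.

```typescript
// JSON-RPC 2.0 message fields. All names in this sketch are hypothetical.
interface JsonRpcMessage {
  jsonrpc: "2.0";
  id?: string | number;
  method?: string;
  params?: unknown;
  result?: unknown;
  error?: { code: number; message: string; data?: unknown };
}

type PacketKind = "request" | "notification" | "response" | "error-response" | "invalid";

// Classify a packet from its shape alone, without knowing the MCP method.
function classifyPacket(msg: JsonRpcMessage): PacketKind {
  const hasId = msg.id !== undefined;
  if (typeof msg.method === "string") {
    // A method with an id expects a response; without an id it is a notification.
    return hasId ? "request" : "notification";
  }
  if (msg.error !== undefined) return "error-response";
  if ("result" in msg) return "response";
  return "invalid";
}
```

Because the classification depends only on the envelope, new MCP methods flow through it unchanged.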

Architecture

Ingestion

First, customers install the @spanly/sdk in their MCP server. They can also use the @spanly/spanly CLI to wrap their MCP server, which is useful for servers that are outside their control, such as third-party services or K8s sidecar containers.

From there, all MCP traffic is captured and sent to our ingest service. API keys carry a region prefix, such as spanly_us_... or spanly_eu_..., so the SDK/CLI can route traffic to the correct region on its own.
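A minimal sketch of prefix-based routing on the client side. The endpoint URLs and function names are placeholders invented for illustration; only the spanly_us_ / spanly_eu_ prefixes come from the key format above.

```typescript
type Region = "us" | "eu";

// Hypothetical ingest endpoints; the real hostnames are not part of this post.
const INGEST_URLS: Record<Region, string> = {
  us: "https://ingest.us.example.com",
  eu: "https://ingest.eu.example.com",
};

// Extract the region from a key shaped like "spanly_us_..." or "spanly_eu_...".
function regionFromApiKey(apiKey: string): Region {
  const match = /^spanly_(us|eu)_/.exec(apiKey);
  if (match === null) throw new Error("Unrecognized API key prefix");
  return match[1] as Region;
}

function ingestUrlFor(apiKey: string): string {
  return INGEST_URLS[regionFromApiKey(apiKey)];
}
```

Resolving the region from the key means no lookup call is needed before the first packet is sent.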

The ingest service writes the raw packet to Cloudflare R2 (immutable, every event, no exceptions), then looks up a Redis key of the form event:{monitorId}:{mcpRequestId}. If the matching half hasn’t arrived yet, the current event is stashed there. If it has, the two are merged and one row lands in ClickHouse.
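The reconciliation step can be sketched as follows. This is an illustration under assumptions: an in-memory Map stands in for Redis, the R2 write is omitted, and the event shape and the durationMs field are hypothetical.

```typescript
interface McpEvent {
  monitorId: string;
  mcpRequestId: string;
  kind: "request" | "response";
  receivedAt: number; // epoch milliseconds
  payload: unknown;
}

interface MergedRequest {
  monitorId: string;
  mcpRequestId: string;
  request: unknown;
  response: unknown;
  durationMs: number;
}

// Stand-in for Redis: key -> the half of the pair that arrived first.
const pending = new Map<string, McpEvent>();

const eventKey = (e: McpEvent): string => `event:${e.monitorId}:${e.mcpRequestId}`;

// Returns a merged row once both halves are present; otherwise stashes the event.
function reconcile(event: McpEvent): MergedRequest | null {
  const key = eventKey(event);
  const other = pending.get(key);
  if (other === undefined || other.kind === event.kind) {
    pending.set(key, event);
    return null;
  }
  pending.delete(key);
  // Either half may arrive first, so order them explicitly.
  const req = event.kind === "request" ? event : other;
  const res = event.kind === "request" ? other : event;
  return {
    monitorId: event.monitorId,
    mcpRequestId: event.mcpRequestId,
    request: req.payload,
    response: res.payload,
    durationMs: res.receivedAt - req.receivedAt,
  };
}
```

With a real Redis, the get-then-set would need to be atomic (for example in a single script or transaction), otherwise two halves arriving at the same time could both be stashed.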

A sweeper walks the event:* keys every few minutes and flushes anything older than five minutes as an “incomplete” MCP Request, so a dropped response doesn’t leak into next week.
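A minimal sketch of the sweep itself, again with a Map standing in for Redis. The StashedEvent shape and the function name are assumptions made for illustration.

```typescript
interface StashedEvent {
  key: string;        // e.g. "event:{monitorId}:{mcpRequestId}"
  receivedAt: number; // epoch milliseconds
}

const MAX_AGE_MS = 5 * 60 * 1000; // five minutes

// Removes stale entries from the store and returns them, so the caller
// can write each one out as an "incomplete" request.
function sweep(store: Map<string, StashedEvent>, now: number): StashedEvent[] {
  const incomplete: StashedEvent[] = [];
  store.forEach((event, key) => {
    if (now - event.receivedAt > MAX_AGE_MS) {
      store.delete(key); // deleting during Map iteration is safe in JavaScript
      incomplete.push(event);
    }
  });
  return incomplete;
}
```

Passing `now` in as a parameter keeps the function deterministic and easy to unit test.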

This part will likely be revisited as we scale. Redis might have trouble keeping up with the number of events. It may be a better bet to store each event in ClickHouse and reconstruct the request/response pairs on the fly.

Dashboard

The dashboard is a static Vite build using React and ShadCN, served from Render. It talks to the global API in Ohio and the regional API in the customer’s region. Next.js was ruled out because server-side rendering was not needed and we wanted to keep the stack simple.

Auth is handled by better-auth, and PostHog is used for analytics.

When viewing an environment, the dashboard reads the environment ID from the URL and uses it to decide which regional API to call. An environment ID looks like us_8f2a... or eu_d31c....

Account-level things (sign-in, billing, the project list) always go to the global API in Ohio. Telemetry queries go to the regional API that matches the environment prefix.
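The split between the global and regional APIs can be sketched like this. The base URLs and the ApiCall type are hypothetical; only the us_ / eu_ environment ID prefixes come from the description above.

```typescript
type Region = "us" | "eu";

// Hypothetical base URLs, for illustration only.
const GLOBAL_API = "https://api.example.com";
const REGIONAL_APIS: Record<Region, string> = {
  us: "https://us.api.example.com",
  eu: "https://eu.api.example.com",
};

type ApiCall =
  | { scope: "account" }                           // sign-in, billing, project list
  | { scope: "telemetry"; environmentId: string }; // anything read from a region

// Account calls always go global; telemetry follows the environment ID prefix.
function apiBaseFor(call: ApiCall): string {
  if (call.scope === "account") return GLOBAL_API;
  const prefix = call.environmentId.split("_")[0];
  if (prefix === "us" || prefix === "eu") return REGIONAL_APIS[prefix];
  throw new Error(`Unknown region prefix in environment ID: ${call.environmentId}`);
}
```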

Usage-based billing

A cron job runs every hour and reports usage to Stripe. Usage is tracked by the number of requests. Since requests live in regions, we need to fetch the usage from each region and sum it up. There are also some subtleties to make sure requests are not double counted.
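One way to approach the aggregation, as a sketch only. It assumes fixed hourly windows in UTC, usage summed per organisation, and a deterministic key per organisation and window so that a retried cron run can be recognised as a duplicate. None of these details are confirmed above, and all names are hypothetical.

```typescript
interface RegionalUsage {
  region: string;
  organisationId: string;
  requestCount: number; // requests received in [windowStart, windowStart + 1h)
}

interface UsageReport {
  organisationId: string;
  windowStart: string; // ISO timestamp
  quantity: number;
  idempotencyKey: string;
}

const HOUR_MS = 60 * 60 * 1000;

// Start of the most recent fully elapsed UTC hour.
function previousHourStart(now: Date): Date {
  const currentHourStart = Math.floor(now.getTime() / HOUR_MS) * HOUR_MS;
  return new Date(currentHourStart - HOUR_MS);
}

// Sum per-region counts into one report per organisation for the window.
function buildUsageReports(rows: RegionalUsage[], windowStart: Date): UsageReport[] {
  const totals = new Map<string, number>();
  for (const row of rows) {
    totals.set(row.organisationId, (totals.get(row.organisationId) ?? 0) + row.requestCount);
  }
  const iso = windowStart.toISOString();
  const reports: UsageReport[] = [];
  totals.forEach((quantity, organisationId) => {
    reports.push({ organisationId, windowStart: iso, quantity, idempotencyKey: `usage:${organisationId}:${iso}` });
  });
  return reports;
}
```

Only reporting hours that have fully elapsed means a request is never counted in two different windows.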

Alerting

Another cron job runs every minute and evaluates alerts. Alerts are configured by the user and are based on the telemetry data. For instance, an alert can be triggered if the error rate is greater than 10% over the last 5 minutes. It’s very standard alerting logic.
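The error-rate example can be sketched as below. This is an in-memory illustration with hypothetical names; in practice the rate would more likely be computed by a query over the telemetry store than by filtering samples in application code.

```typescript
interface RequestSample {
  timestamp: number; // epoch milliseconds
  isError: boolean;
}

interface ErrorRateAlert {
  threshold: number; // e.g. 0.1 for 10%
  windowMs: number;  // e.g. 5 * 60 * 1000 for five minutes
}

// Error rate over the window ending at `now`, or null when there was no traffic.
function errorRate(samples: RequestSample[], now: number, windowMs: number): number | null {
  const recent = samples.filter((s) => s.timestamp > now - windowMs && s.timestamp <= now);
  if (recent.length === 0) return null;
  const errors = recent.filter((s) => s.isError).length;
  return errors / recent.length;
}

// Fires only when there is traffic and the rate is strictly above the threshold.
function shouldFire(alert: ErrorRateAlert, samples: RequestSample[], now: number): boolean {
  const rate = errorRate(samples, now, alert.windowMs);
  return rate !== null && rate > alert.threshold;
}
```

Returning null for an empty window keeps "no traffic" distinct from "no errors", which avoids firing on idle servers.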

Spanly MCP

We eat our own dog food. Spanly MCP helps your agent debug your MCP server. Of course, we use Spanly to monitor our own MCP server.

Live demo

Having a live demo is great marketing, but it’s also great for debugging. We host a private demo MCP server that is monitored by Spanly. An MCP client then queries this demo server to produce telemetry that is displayed in the live demo dashboard.

The need for a live demo is what drove the implementation of the “public dashboard” feature, which is available on paid plans. The live demo is just a public instance of the dashboard: no special case, so we kinda eat our own dog food again.

Engineering

Keep it simple, stupid. Ship fast. Watch out for agents introducing unnecessary complexity.

A single Nx monorepo so that agents have full visibility of the stack. This also helps with the development experience. We can run the entire stack locally and test (unit and e2e) everything in one go.

TypeScript everywhere. Each commit forces a lint, typecheck and full test run.

A very limited AGENT.md maintained by hand. It grows organically: every time a coding agent makes a recurring mistake, a line is added to correct it. Non-exhaustive list:

- Only comment if code is not self-documenting
- Make use of ShadCN components whenever possible, or add them using `npx shadcn@latest add <component>`
- Refer to `chart-colors.ts` for color definitions and usage

Misc tools: Linear, Slack, Cursor, Claude Code and Codex, Google Workspace.

The next section lists all the external services we use. If an MCP server exists for a service, we use it.

What we use

Render runs everything, except ClickHouse. The stack is deployed via a blueprint. Each push to main triggers a deployment.

PostgreSQL holds anything transactional and bounded. Users, organisations, projects, API keys, Stripe customer IDs. We use Prisma for the schema and the typed client.

ClickHouse holds telemetry. Every Request and every Notification is a row, with serverName, serverVersion, clientName, clientVersion stored as LowCardinality(String) columns directly on the table, and of course more data.

Redis is the reconciliation buffer and cache.

Cloudflare R2 stores raw event payloads.

Stripe is billing. A cron rolls usage up every hour and reports it to Stripe.

Sentry is used for error tracking.

Resend sends transactional email. Welcome messages on sign-up. Alert notifications when a customer’s MCP server starts misbehaving.

better-auth handles authentication.

BetterStack is used for status and health monitoring.

The things we got wrong

A few mistakes worth naming, because they’re the kind of thing that rarely makes it into a postmortem.

We started by calling the leaf entity a “trace”. That was wrong. A trace, in the OpenTelemetry sense, is a tree of spans. An MCP request/response pair is one node. This came with a mindset shift: drop the general telemetry angle and focus more on MCP observability.

Server and client names and versions can be any string. The initial implementation stored a hash in ClickHouse and a mapping table in PostgreSQL. However, this complex setup had tricky edge cases. We later replaced it with LowCardinality(String) columns directly in the requests table in ClickHouse.

We started on Clerk for auth. It got us moving fast, but seat-based pricing and limited control over the org/membership model didn’t fit where we were heading. We migrated to better-auth, which gave us the flexibility we needed and a cleaner ownership story over our own user data.

What’s next

Faster ingestion at scale. Deeper search.

We don’t just want to monitor; we want to help you refine your MCP server so it works better for AIs. Spanly wants to be the companion of your MCP server.

We’re working on analyzing a sample of packets to find inconsistencies in MCP server implementations, so we can come back to you with concrete improvements.

Tim