MCP observability vs APM: what still falls through the gap

If you already run Datadog, Sentry, or New Relic, you might reasonably ask why your MCP server needs anything else. You instrument everything else with your APM. Why not this?

The short answer: out of the box, your APM sees HTTP and infrastructure, not the MCP protocol. The vendors know it, and some now ship MCP instrumentation in their SDKs. That support is real and worth understanding. This post is about what it covers, what still falls through the gap, and why the answer is “both,” not “either.”

What your APM sees

A general-purpose APM is excellent at what it was built for. It sees:

HTTP requests and responses, status codes, and route-level latency.
Infrastructure: CPU, memory, container health, database queries.
Stack traces when your process throws.
Distributed traces across your services, stitched by trace context.

For an MCP server, that means your APM can tell you the POST /mcp endpoint returned 200 in 180ms. Without protocol-aware instrumentation, it cannot tell you what happened inside.

What it misses

An MCP exchange is a JSON-RPC request and its response: a tool call, a prompt fetch, a resource read. One HTTP request can carry a tool call that failed at the protocol level while the HTTP layer reports a clean 200. That is the core of the problem. Here is what your APM cannot answer from the HTTP layer alone:

Which tool was called, with what arguments. To the APM it is an opaque request body. To you, the tool name and arguments are the whole story.

Whether the tool call actually succeeded. MCP errors live inside the JSON-RPC response body, either as a JSON-RPC error object or as a tool result flagged isError: true. Both can arrive inside a 200 OK. Your APM counts that as a success. Your customer’s agent counts it as a failure.

Which client made the call. Claude Desktop, Cursor, Codex, Windsurf, or some agent you have never seen. Client identity is often the dimension that explains a regression, and it is buried where your APM does not break it out: in the clientInfo field of the session’s initialize request, or at best in a User-Agent header.

Session continuity. MCP work happens across a session. APM traces are per-request. Reconstructing “what was this agent trying to do” from per-request HTTP spans is painful.

Per-tool performance. Your APM gives you route-level latency. But POST /mcp is one route carrying twenty different tools with wildly different performance profiles. The average is a lie.
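To make the list above concrete, here is a minimal sketch of what "looking inside the 200" means. It assumes you already have the raw request and response bodies of one exchange as JSON strings. The function names classify_exchange and median_latency_by_tool, the sample payloads, and the latency figures are made up for illustration. This is not how any particular product implements it.

```python
import json
from collections import defaultdict
from statistics import median


def classify_exchange(request_body: str, response_body: str) -> dict:
    """Summarise one JSON-RPC exchange from the raw HTTP bodies."""
    req = json.loads(request_body)
    resp = json.loads(response_body)
    params = req.get("params") or {}
    is_tool_call = req.get("method") == "tools/call"
    result = resp.get("result")
    # Two failure shapes can hide behind an HTTP 200: a JSON-RPC error
    # object, or a tool result flagged with isError.
    tool_error = isinstance(result, dict) and bool(result.get("isError"))
    failed = "error" in resp or tool_error
    return {
        "method": req.get("method"),
        "tool": params.get("name") if is_tool_call else None,
        "arguments": params.get("arguments") if is_tool_call else None,
        "ok": not failed,
    }


def median_latency_by_tool(samples):
    """Group (tool_name, latency_ms) pairs and take the median per tool."""
    grouped = defaultdict(list)
    for tool, latency_ms in samples:
        grouped[tool].append(latency_ms)
    return {tool: median(values) for tool, values in grouped.items()}


# Illustrative payloads only: a tool call that fails inside a 200 OK.
request_body = json.dumps({
    "jsonrpc": "2.0", "id": 1, "method": "tools/call",
    "params": {"name": "search_orders", "arguments": {"query": "shoes"}},
})
response_body = json.dumps({
    "jsonrpc": "2.0", "id": 1,
    "error": {"code": -32602, "message": "could not parse date range"},
})

summary = classify_exchange(request_body, response_body)
latencies = median_latency_by_tool(
    [("get_order", 20), ("get_order", 30), ("search_orders", 900)]
)
print(summary)
print(latencies)
```

The point of the second function is the per-tool split: one slow tool disappears in a route-level average but stands out as soon as latency is keyed by tool name.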

A worked example

A customer reports that “the agent keeps failing.” You open your APM. The /mcp endpoint shows a 99.8% success rate and a healthy p95. Nothing looks wrong. You close the ticket as “cannot reproduce.”

What actually happened: one tool, search_orders, returns a JSON-RPC error for any query containing a date range, because of a parsing bug. That is a 200 OK at the HTTP layer every single time. The error is in the response body. To your APM it is invisible. To the agent calling it, every date-range search fails.

With protocol-level monitoring, this is a thirty-second find: filter to search_orders, narrow to errors, see that every failure carries a date-range argument. Same traffic, completely different debugging experience, because the unit of observation is the tool call, not the HTTP request.
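The "filter, then look at the arguments" step can be sketched in a few lines. This assumes captured tool calls are available as simple records. The record shape, the argument names (including date_range), and the helper arguments_common_to_failures are hypothetical, chosen to mirror the scenario above.

```python
from collections import Counter

# Hypothetical captured tool calls; "ok" is the protocol-level outcome,
# not the HTTP status.
calls = [
    {"tool": "search_orders", "arguments": {"query": "shoes"}, "ok": True},
    {"tool": "search_orders",
     "arguments": {"query": "shoes", "date_range": "2024-01-01..2024-01-31"},
     "ok": False},
    {"tool": "search_orders",
     "arguments": {"customer": "acme", "date_range": "2024-02-01..2024-02-07"},
     "ok": False},
    {"tool": "get_order", "arguments": {"id": "42"}, "ok": True},
]


def arguments_common_to_failures(calls, tool):
    """Return the argument names present in every failed call to `tool`."""
    failures = [c for c in calls if c["tool"] == tool and not c["ok"]]
    if not failures:
        return []
    counts = Counter(key for c in failures for key in c["arguments"])
    return sorted(key for key, n in counts.items() if n == len(failures))


suspects = arguments_common_to_failures(calls, "search_orders")
print(suspects)
```

With the sample records, the only argument shared by every failed search_orders call is the date range, which is the pattern the HTTP-level success rate hides.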

Why not just add custom spans?

You can. You can manually instrument your MCP handlers with custom spans and attributes in your APM. Teams do it. Two things tend to happen.

First, it is a lot of bespoke work, and it drifts. Every new tool needs new instrumentation, and the moment someone forgets, you have a blind spot exactly where you will eventually need to look.

Second, the MCP model does not map cleanly onto a span tree. An MCP request/response is one JSON-RPC exchange: one node, not a tree of spans. When you force it into a span hierarchy, the mismatch leaks into every query you write. We learned this building Spanly, and it is why the product models MCP natively instead of dressing it up as something it is not.

What about the APMs’ own MCP support?

The vendors have started closing the gap themselves, and to be fair, it works:

Sentry ships MCP server monitoring in its Node and Python SDKs. One line wraps the official MCP server, and tool calls, resource reads, and prompt fetches show up as spans, with dashboards broken down by tool, client, and transport.
New Relic added MCP support to its AI Monitoring, instrumenting the MCP invocation lifecycle with waterfall views. Python agent only, for now.
Datadog traces MCP in LLM Observability, but from the client side: it instruments the MCP Python client your agent uses. If you operate the server and your clients are other people’s Claude Desktop and Cursor installs, it does not see your traffic.

If your server is a Node or Python process you own, and span-level data inside your existing tracing quota is what you need, these are good options. Their structural limits are the reason Spanly exists:

Language coverage. The instrumentation lives in the vendor’s SDK. A Go, Rust, Java, or C# MCP server gets nothing until that vendor ships an agent for it.
In-process only. You have to add their SDK to the server’s code. A third-party server, a vendored binary, or a sidecar you do not own cannot be instrumented.
Spans, not messages. You get span attributes, subject to your tracing quota and attribute limits, not the full JSON-RPC request and response.

Spanly sits at the transport layer instead. The CLI wraps any MCP server process, stdio or HTTP, in any language, with zero code changes, and captures complete packets. A proxy mode covers servers you cannot wrap at all.

The answer is both

This is not a replacement pitch. Keep your APM. It owns HTTP, infrastructure, and your wider service, and it does that well. Add a layer that understands the MCP protocol on top.

Spanly is additive by design. Continue sending HTTP and infrastructure telemetry to Datadog, Sentry, or New Relic, and send MCP-shaped traffic to Spanly. Captured packets preserve the W3C trace context verbatim (the traceparent header on HTTP transports, params._meta.traceparent on stdio), so the same exchange can be correlated across both systems.
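As a rough sketch of how that correlation could work on the consuming side: pull the traceparent from whichever location the transport uses, then extract the trace ID to join against your APM. The helpers find_traceparent and trace_id are hypothetical names, the sample value is the illustrative one from the W3C Trace Context specification, and the regex accepts only the basic version-traceid-parentid-flags shape.

```python
import re

# Basic W3C traceparent shape: version-traceid-parentid-flags, lowercase hex.
TRACEPARENT = re.compile(r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def find_traceparent(headers, message):
    """Prefer the HTTP header; fall back to params._meta on stdio."""
    for name, value in (headers or {}).items():
        if name.lower() == "traceparent":
            return value
    meta = (message.get("params") or {}).get("_meta") or {}
    return meta.get("traceparent")


def trace_id(traceparent):
    """Return the 32-hex-digit trace ID, or None if the value is malformed."""
    match = TRACEPARENT.fullmatch(traceparent or "")
    return match.group(2) if match else None


sample = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"

# HTTP transport: trace context travels in a header.
http_id = trace_id(find_traceparent({"Traceparent": sample}, {"params": {}}))

# stdio transport: trace context travels inside the JSON-RPC message.
stdio_message = {
    "jsonrpc": "2.0", "id": 7, "method": "tools/call",
    "params": {"name": "search_orders", "_meta": {"traceparent": sample}},
}
stdio_id = trace_id(find_traceparent(None, stdio_message))
print(http_id, stdio_id)
```

Either way you end up with the same trace ID, which is the key you would search for in the APM’s trace view.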

The two run side by side. The APM answers “is the service healthy.” Spanly answers “is the protocol healthy,” and those are genuinely different questions.

Keep reading

MCP OpenTelemetry tracing vs Spanly: the same comparison against the MCP SDK’s own SEP-414 spans.
How to monitor your MCP server in production: the practical setup guide.
What is MCP observability?: the category, defined.
Live demo dashboard: see the protocol-level view on real data.

Tim