airoweb post

Treat MCP servers like production APIs

A practical operating model for teams moving MCP servers from local experiments into governed production infrastructure.

Audience: Platform teams, Engineering leaders, Security reviewers
Level: intermediate
Risk: medium
Updated: July 2, 2026

The first production MCP review should look less like a plugin demo and more like an API readiness review.

Production question	What the reviewer needs to see
Owner	Team responsible for code, scopes, incidents, and deprecation
Contract	Named resources, tools, prompts, inputs, outputs, and error behavior
Access	User identity, service identity, scopes, and downstream authorization
Runtime	Deployment boundary, network egress, rate limits, logs, and alerts
Change process	Versioning, approval path, test evidence, and rollback
Retirement	Conditions for disabling, replacing, or narrowing the server

That table is not bureaucracy. It is the difference between a useful AI integration and an unowned production dependency.

The Model Context Protocol defines a standard way for hosts, clients, and servers to exchange context and capabilities. Servers can expose resources, prompts, and tools; clients can expose features such as roots, sampling, and elicitation; the base protocol includes utilities such as logging, cancellation, progress, and error reporting Model Context Protocol specification.

Those are infrastructure primitives. Once a server can read private data, call downstream systems, or run on behalf of more than one person, it should be managed like a production API. The model may be the visible part of the workflow, but the MCP server is where data access, tool execution, and operational accountability become concrete.

The server is a product surface

A production MCP server is not just a bundle of tools. It is a product surface consumed by AI applications.

That means it needs a contract. The contract should name what the server exposes, what each capability is for, what input schema is accepted, what output shape is returned, what errors mean, and which operations are read-only, draft-producing, or state-changing. A tool called update_account is not reviewable. A tool called draft_account_status_change with a typed account ID, proposed status, reason, affected systems, and reviewer requirement is reviewable.

The same applies to resources and prompts. A resource that exposes “customer data” is too broad. A resource that exposes a redacted support-case summary for the current user’s assigned accounts is a different surface. A prompt that embeds workflow policy should be versioned, reviewed, and tested like other production behavior.

This is for platform teams and engineering leaders who are moving MCP from local developer setup into shared infrastructure. It is also for security reviewers who need something better than “the agent uses MCP” as the object under review.

It is not the right starting point for a personal local assistant that reads only a developer’s sandbox files and cannot touch company systems. It is also not enough for regulated or safety-critical workflows. In those environments, the MCP server should sit inside the existing governance path for privacy, legal, security, domain review, and audit.

Ship the narrow contract first

Start with one workflow and one bounded server contract.

The tempting design is a broad internal server that exposes everything a helpful assistant might need: documents, tickets, pull requests, deployments, CRM records, messages, billing, and browser automation. That shape is convenient for a demo and difficult to govern. It also creates a quiet failure mode: every new tool changes what the assistant might do, even if no one changed the prompt.

The first production server should be narrow enough that a reviewer can understand it in one sitting. For example:

Server contract	Avoid
Read merged pull requests and draft release notes	Full repository write access
Search approved policy documents and return citations	Raw document-store access with no classification
Create draft support replies for assigned tickets	Send customer email directly
Prepare an infrastructure change plan	Execute deployment commands

Narrow does not mean toy. A narrow server can still save meaningful time if it sits at a painful boundary: pulling evidence from the right system, formatting it consistently, or preparing a proposed change for human review. The constraint is that the server should do one job with a named owner and a known blast radius.

When the team wants to add a capability, treat that as an API change. Ask whether the new resource or tool expands data access, changes state, crosses a system boundary, or creates a new approval requirement. If the answer is yes, update the contract, tests, logs, and reviewer evidence before production rollout.

Runtime controls belong outside the prompt

A system prompt can describe policy. It should not be the only place policy is enforced.

MCP’s own security guidance covers implementation risks such as authorization mistakes, confused deputy problems, token passthrough, server-side request forgery, session hijacking, local server compromise, and scope minimization MCP security best practices. These are software and infrastructure risks. They need software and infrastructure controls.

At minimum, production servers need:

authentication and authorization tied to the requesting user or service
downstream tokens issued for the right audience, not blindly passed through
scope minimization for each workflow
egress restrictions for remote calls and OAuth discovery
input validation and output validation
durable audit logs for tool calls and reviewer approvals
rate limits, timeouts, cancellation, and failure behavior
alerts for unusual access patterns, tool errors, and policy denials

The NIST service-mesh guidance is not about MCP, but its infrastructure lesson transfers well: distributed service interactions need security, resiliency, throttling, continuous monitoring, and consistent control planes NIST SP 800-204A. MCP servers that mediate AI access to internal systems deserve the same operational posture.

For a small team, that does not require a large platform. It can start with a reverse proxy, explicit allowlists, structured logs, typed schemas, deployment ownership, and a weekly review of denied or failed tool calls. The point is to move critical controls out of model instructions and into the layer that actually handles requests.

What reviewers should reject

Reject production use when the server’s behavior cannot be inspected.

Common rejection signals:

Signal	Why it matters
No clear owner	Incidents and stale scopes will have no accountable team
Broad downstream credentials	The agent inherits more reach than the workflow needs
Dynamic tool discovery with no approval	The available action set can change under reviewers
Tool descriptions treated as trusted policy	A compromised or careless descriptor can steer behavior
No argument logging	Investigators cannot reconstruct what the assistant attempted
No human evidence for state changes	Approvers see intent but not the exact operation
Local install commands hidden or truncated	Users cannot understand what code will run on their machines
No disable path	A bad server remains connected while teams debate ownership

The NSA’s MCP security design report recommends supported projects where possible, intentional trust boundaries between components, caution around dynamic tool discovery, and filtering or monitoring of output pipelines and chained execution NSA MCP security design considerations. That is the right tone for review: assume every boundary needs to be named, then decide which controls are proportionate.

OWASP’s LLM risk categories add a second lens. Prompt injection, insecure output handling, sensitive information disclosure, insecure plugin design, and excessive agency all show up quickly when an assistant can read context and call tools OWASP Top 10 for LLM Applications. An MCP server does not create every one of those risks, but it can make them operationally meaningful because it connects model output to real systems.

Sometimes the simpler answer wins

Do not deploy MCP just because a workflow touches a system.

If the assistant only needs a static snapshot, export the data. If the task is deterministic, write a conventional service or queue worker. If the workflow is sensitive, let the AI draft and keep execution inside the existing approval system. If the team cannot name the owner, data class, approval mode, and disable path, keep the work in experiment status.

MCP is strongest when the assistant needs fresh governed context or a small set of explicit capabilities from a live system. It is weaker when the team is using it as a shortcut around API design, authorization design, or product decisions about what the assistant should actually do.

Cost should be part of the decision. A production MCP server adds maintenance, dependency review, monitoring, incident response, documentation, and user support. It also creates a new surface for access review. Those costs are justified when the server unlocks a repeated workflow that is hard to support with exports, scripts, or a normal internal tool. They are not justified by novelty.

Review again when the boundary moves

The first approval is not permanent.

Re-review the server when a new tool is added, an OAuth scope changes, a downstream system changes, a server moves from local to remote deployment, a new host application connects, a prompt starts relying on a new resource, or a workflow moves from draft suggestions to durable changes.

Also re-review when usage proves the original assumption wrong. If users keep asking the server for adjacent work, that may be evidence for a new contract. If denied tool calls are common, the server may be too narrow or the workflow may be poorly explained. If reviewers approve every action without reading the evidence, the approval step is not functioning.

The operating model is simple: one server, one owner, one contract, one runtime boundary, one review path, and one way to turn it off. Add complexity only when real usage earns it.

An MCP server is allowed to be small. In production, small is often the point.

Sources

Model Context Protocol specification, Model Context Protocol
Security Best Practices, Model Context Protocol
Model Context Protocol: Security Design Considerations, NSA
OWASP Top 10 for Large Language Model Applications, OWASP
NIST SP 800-204A: Building Secure Microservices-based Applications Using Service-Mesh Architecture, NIST