Loading…
Loading…
AI agent deployments come in two main flavors: cloud-hosted (the vendor runs the infrastructure) and self-hosted (you run it in your own environment). The choice has significant implications for data control, cost structure, security posture, and operational complexity. Most companies should start cloud-hosted and move portions self-hosted only when specific requirements demand it.
Written by Max Zeshut
Founder at Agentmelt
Cloud-hosted AI agent platforms (LangSmith, OpenAI Assistants, Anthropic Claude, vertical SaaS like Intercom Fin) handle infrastructure, scaling, model updates, and observability for you. Time to deploy: hours to days. Operational overhead: minimal. Cost: predictable monthly fees, often per-seat or per-task. This is the right starting point for 90% of teams—lower risk, faster ROI, smaller team burden.
Self-hosting (whether on your own servers or your private cloud) becomes essential when data sovereignty, regulatory compliance, or proprietary model access requires it. Industries that often self-host: defense and intelligence (classified data), healthcare (PHI residency requirements in some jurisdictions), financial services (regional regulators requiring data localization), and any business handling truly sensitive customer data where third-party processing is contractually disallowed.
Cloud-hosted is typically cheaper at low volume—you don't pay for infrastructure you don't use. At very high volume (millions of tasks per month), self-hosting can be cheaper due to amortized infrastructure costs. The crossover point depends on use case but typically lands around $50K-$200K/year of AI API spend. Below that, cloud wins; above that, self-hosted becomes worth evaluating. Hidden costs in self-hosted: dedicated ML/ops engineering, model update management, security patching, scaling work.
Cloud platforms now offer enterprise security: SOC 2, ISO 27001, HIPAA-eligible deployments, data residency options, no-training guarantees, and customer-managed encryption keys. For many regulated industries, modern cloud AI is acceptable. Self-hosting is still required when: contracts explicitly prohibit third-party AI processing, air-gapped environments are mandated, or regulators require specific control attestations cloud vendors can't provide.
A common emerging pattern: cloud-hosted for general workflows (most agents), self-hosted for the small subset handling regulated data. This keeps operational overhead manageable while addressing the specific compliance constraints. Tools like Bedrock, Azure OpenAI, and Vertex AI offer 'cloud with data residency' middle-ground options that satisfy many compliance requirements without full self-hosting overhead.
Yes. Llama, Mistral, Qwen, and DeepSeek models are production-capable for many agent tasks. Self-hosted deployments often use these models on infrastructure like vLLM, TGI, or Ollama for inference. Trade-off: open-source models lag the best commercial models by 6-18 months on hard tasks. For many production agent workloads (categorization, drafting, summarization), open-source is more than sufficient. For frontier reasoning tasks, commercial models still lead.
Minimum viable: 1-2 ML engineers familiar with inference infrastructure (vLLM, TGI), 1 SRE familiar with GPU operations, plus security and compliance support. Most companies that self-host successfully have a dedicated AI infrastructure team of 4-8 people. If you can't staff that, cloud-hosted is the right answer regardless of preference.