Can we use open-source models to self-host AI agents?

Yes. Llama, Mistral, Qwen, and DeepSeek models are production-capable for many agent tasks. Self-hosted deployments often use these models on infrastructure like vLLM, TGI, or Ollama for inference. Trade-off: open-source models lag the best commercial models by 6-18 months on hard tasks. For many production agent workloads (categorization, drafting, summarization), open-source is more than sufficient. For frontier reasoning tasks, commercial models still lead.

What's the operational team size for self-hosting?

Minimum viable: 1-2 ML engineers familiar with inference infrastructure (vLLM, TGI), 1 SRE familiar with GPU operations, plus security and compliance support. Most companies that self-host successfully have a dedicated AI infrastructure team of 4-8 people. If you can't staff that, cloud-hosted is the right answer regardless of preference.

Self-Hosted vs Cloud AI Agents: Which Deployment Model Is Right?

AI agent deployments come in two main flavors: cloud-hosted (the vendor runs the infrastructure) and self-hosted (you run it in your own environment). The choice has significant implications for data control, cost structure, security posture, and operational complexity. Most companies should start cloud-hosted and move portions self-hosted only when specific requirements demand it.

Written by Max Zeshut

Founder at Agentmelt

Cloud-hosted AI agents: the default

Cloud-hosted AI agent platforms (LangSmith, OpenAI Assistants, Anthropic Claude, vertical SaaS like Intercom Fin) handle infrastructure, scaling, model updates, and observability for you. Time to deploy: hours to days. Operational overhead: minimal. Cost: predictable monthly fees, often per-seat or per-task. This is the right starting point for 90% of teams—lower risk, faster ROI, smaller team burden.

Self-hosted AI agents: when it matters

Self-hosting (whether on your own servers or your private cloud) becomes essential when data sovereignty, regulatory compliance, or proprietary model access requires it. Industries that often self-host: defense and intelligence (classified data), healthcare (PHI residency requirements in some jurisdictions), financial services (regional regulators requiring data localization), and any business handling truly sensitive customer data where third-party processing is contractually disallowed.

Cost comparison

Cloud-hosted is typically cheaper at low volume—you don't pay for infrastructure you don't use. At very high volume (millions of tasks per month), self-hosting can be cheaper due to amortized infrastructure costs. The crossover point depends on use case but typically lands around $50K-$200K/year of AI API spend. Below that, cloud wins; above that, self-hosted becomes worth evaluating. Hidden costs in self-hosted: dedicated ML/ops engineering, model update management, security patching, scaling work.

Security and compliance considerations

Cloud platforms now offer enterprise security: SOC 2, ISO 27001, HIPAA-eligible deployments, data residency options, no-training guarantees, and customer-managed encryption keys. For many regulated industries, modern cloud AI is acceptable. Self-hosting is still required when: contracts explicitly prohibit third-party AI processing, air-gapped environments are mandated, or regulators require specific control attestations cloud vendors can't provide.

The hybrid pattern

A common emerging pattern: cloud-hosted for general workflows (most agents), self-hosted for the small subset handling regulated data. This keeps operational overhead manageable while addressing the specific compliance constraints. Tools like Bedrock, Azure OpenAI, and Vertex AI offer 'cloud with data residency' middle-ground options that satisfy many compliance requirements without full self-hosting overhead.