Category: LLMs & AI Infra / LLM Serving · Status: stable

Ollama

Run LLMs locally with one command

110K stars · 500 contributors · Since 2023

Local LLM inference server that downloads and runs open-weight models with a single command, exposing an OpenAI-compatible REST API for integration.
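
As a sketch of that OpenAI-compatible surface (assuming the server is running on its default port 11434 and that an illustrative model named llama3 has already been pulled with ollama pull llama3), an existing OpenAI client can be pointed at the local server by swapping only the base URL:

  # Minimal sketch: calling Ollama through its OpenAI-compatible endpoint.
  # Assumes the server listens on the default port 11434 and that "llama3"
  # (an illustrative model name) was pulled beforehand: ollama pull llama3
  from openai import OpenAI

  client = OpenAI(
      base_url="http://localhost:11434/v1",  # local Ollama endpoint
      api_key="ollama",                      # required by the client, ignored by Ollama
  )

  reply = client.chat.completions.create(
      model="llama3",
      messages=[{"role": "user", "content": "Summarize what a local LLM server does."}],
  )
  print(reply.choices[0].message.content)

Because only the base URL changes, code written against the hosted endpoints listed under "Replaces / alternatives to" can be redirected to the local server during development.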

License
MIT
Min RAM
4 GB
Min CPUs
2 cores
Scaling
single node
Complexity
beginner
Performance
medium
Self-hostable
yes
K8s native
Offline
yes
Pricing
fully free
Docs quality
good
Vendor lock-in
none

Use cases

  • Run LLMs locally for private/offline AI
  • Development environment with local AI models
  • Code completion backend for Continue/Tabby
  • Chatbot prototype without API costs (see the sketch after this list)
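
A minimal sketch of the chatbot-prototype case, using the native streaming chat endpoint. The host localhost:11434 and the model name llama3 are assumptions for illustration, not fixed requirements:

  # Sketch of a small chatbot loop against Ollama's native REST API.
  # Assumes the server runs locally on the default port and "llama3" is pulled.
  import json
  import requests

  def chat(prompt: str, model: str = "llama3") -> str:
      reply = []
      with requests.post(
          "http://localhost:11434/api/chat",
          json={"model": model, "messages": [{"role": "user", "content": prompt}]},
          stream=True,
      ) as resp:
          resp.raise_for_status()
          for line in resp.iter_lines():
              if not line:
                  continue
              chunk = json.loads(line)  # one JSON object per streamed line
              reply.append(chunk.get("message", {}).get("content", ""))
              if chunk.get("done"):
                  break
      return "".join(reply)

  if __name__ == "__main__":
      print(chat("Suggest a name for a local-first chatbot."))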

Anti-patterns / when NOT to use

  • Not suited to high-throughput production serving
  • Optimized for single-user use, not multi-tenant workloads
  • No built-in request batching or queuing
  • Requires a capable GPU for larger models

Replaces / alternatives to

  • OpenAI API
  • Claude API
  • Cloud-hosted LLM endpoints

Technical specs

Language
Go, C++
API type
REST
Protocols
HTTP
Deployment
Binary, Docker
SDKs
Python, JavaScript, Go, Rust
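
For the SDKs listed above, a short sketch with the official Python package (pip install ollama); the llama3 model name is again an assumption and must already be pulled locally:

  # Sketch using the official Python SDK against a locally running server.
  import ollama

  result = ollama.generate(
      model="llama3",
      prompt="Write a one-line docstring for a function that reverses a string.",
  )
  print(result["response"])  # generated text is returned under the "response" key

Equivalent client libraries exist for JavaScript, Go, and Rust, per the SDK list above.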

Community

GitHub stars 110K
Contributors 500
Commit frequency daily
Plugin ecosystem medium
Backing Ollama Inc.
Funding VC-backed

Release

Latest version
Last release
Since 2023

Best fit

Team size
solo, small, medium
Industries
general, development, research

Tags

  • llm
  • local-inference
  • privacy
  • model-runner
  • openai-compatible
  • gpu