This guide shows a complete setup for running Open Web UI on your own server, with the ability to call local LLMs as well as commercial LLMs such as GPT-4, Anthropic Claude, Google Gemini and Groq Llama 3.
Overall Setup
- We will use Open Web UI as the user interface for talking to the models.
- We will use Ollama for running local models. NB: this is optional; the setup works with just third-party APIs.
- We will use LiteLLM for calling out to remote models and tracking usage/billing.
- LiteLLM uses PostgreSQL for storing data relating to keys, API calls, logs, etc.
- We use Caddy to provide a reverse proxy so that we can use the chat UI over the open internet. (All of these are wired together in the compose skeleton sketched below.)
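Each of these runs as a service in a single docker-compose.yml. As a rough outline only (the service names match the snippets in the rest of this guide; db is an assumed name for the PostgreSQL container backing LiteLLM):

services:
  ui:        # Open Web UI, the chat front end
  ollama:    # optional: local model runner
  litellm:   # gateway to commercial APIs, backed by PostgreSQL
  db:        # PostgreSQL for LiteLLM
  caddy:     # reverse proxy / TLS termination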
Docker Compose
Open Web UI
We will stand up Open Web UI and make sure it has somewhere to store its data and a TCP port so that we can access it over the network:
ui:
  image: ghcr.io/open-webui/open-webui:main
  restart: always
  ports:
    - 8080:8080
  volumes:
    - ./open-webui:/app/backend/data
  environment:
    - "ENABLE_SIGNUP=false"
    - "OLLAMA_BASE_URL=http://ollama:11434"
(Optional) Ollama
If we are using Ollama then we define the service and also provision any GPU resources (if available):
ollama:
  image: ollama/ollama
  restart: always
  environment:
    - OLLAMA_MAX_LOADED_MODELS=2
    - OLLAMA_FLASH_ATTENTION=true
    - OLLAMA_KEEP_ALIVE=-1
  ports:
    - 11434:11434
  volumes:
    - ./ollama:/root/.ollama
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            count: 1
            capabilities: [gpu]
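Note that the Ollama container starts with no models downloaded. One way to pull one once the stack is running (the model name here is just an example):

  # Pull a model into the ./ollama volume mounted above
  docker compose exec ollama ollama pull llama3
  # Confirm it is available
  docker compose exec ollama ollama list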
Inbound HTTP Traffic
Let’s imagine we want to run a Caddy instance inside the same docker-compose network as our UI and LiteLLM instances.
We have set up two subdomains:
- chat.example.com, which will serve Open Web UI requests
- api.example.com, which will serve requests to the AI APIs and also provide an admin interface.
caddy:
  image: caddy:2.7
  restart: unless-stopped
  ports:
    - "80:80"
    - "443:443"
    - "443:443/udp"
  volumes:
    - ./caddy/Caddyfile:/etc/caddy/Caddyfile
    - ./caddy/data:/data
    - ./caddy/config:/config
We also need to create our Caddy config file ./caddy/Caddyfile, which tells the reverse proxy where to send inbound traffic. Inside the compose network, the Open Web UI container listens on port 8080 and LiteLLM on port 4000.
chat.example.com {
    reverse_proxy ui:8080
}

api.example.com {
    reverse_proxy litellm:4000
}
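The api.example.com block above assumes a litellm service listening on port 4000 inside the same compose network. A minimal sketch of that service, assuming the official LiteLLM proxy image and a PostgreSQL container named db (the image tag, config path and credentials are placeholders to adapt):

litellm:
  image: ghcr.io/berriai/litellm:main-latest
  restart: always
  command: ["--config", "/app/config.yaml", "--port", "4000"]
  volumes:
    - ./litellm/config.yaml:/app/config.yaml
  environment:
    # Placeholder credentials; point DATABASE_URL at your PostgreSQL service
    - DATABASE_URL=postgresql://litellm:litellm@db:5432/litellm
    - LITELLM_MASTER_KEY=sk-REPLACE-ME
  # No ports entry needed: Caddy reaches it over the internal network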