This guide shows a complete setup for running Open Web UI on your own server, with the ability to call both local LLMs and commercial LLMs such as GPT-4, Anthropic Claude, Google Gemini, and Llama 3 via Groq.
Overall Setup
- We will use Open Web UI as the user interface for talking to the models.
- We will use Ollama for running local models. NB: this is optional; the setup works with just third-party APIs.
- We will use LiteLLM for calling out to remote models and tracking usage/billing.
- LiteLLM uses PostgreSQL for storing data relating to API keys, API calls, logs, etc.
- We will use Caddy as a reverse proxy so that we can use the chat UI over the open internet.
Docker Compose
Open Web UI
We will stand up Open Web UI, give it somewhere to store its data, and expose a TCP port so that we can access it over the network:
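A minimal sketch of that service is shown below. The service name `ui`, the volume name, and the `PORT=3000` setting are assumptions chosen here so the container lines up with the `reverse_proxy ui:3000` rule in the Caddyfile later in this guide; the image itself defaults to port 8080, so check the Open Web UI documentation if you prefer to keep that.

```yaml
# docker-compose.yml (excerpt)
services:
  ui:
    image: ghcr.io/open-webui/open-webui:main
    restart: unless-stopped
    environment:
      # The image serves on 8080 by default; PORT=3000 is set here (an assumption)
      # so the container port matches the `reverse_proxy ui:3000` rule used later.
      - PORT=3000
    ports:
      - "3000:3000"                         # reachable directly on the host for testing
    volumes:
      - open-webui-data:/app/backend/data   # users, chats and settings persist here

volumes:
  open-webui-data:
```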
(Optional) Ollama
If we are using Ollama, we define the service and also provision any available GPU resources:
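A sketch of that service, assuming an NVIDIA GPU and the NVIDIA Container Toolkit installed on the host (drop the `deploy` block to run on CPU only):

```yaml
# docker-compose.yml (excerpt – same services: block as above)
services:
  ollama:
    image: ollama/ollama:latest
    restart: unless-stopped
    volumes:
      - ollama-data:/root/.ollama   # downloaded model weights
    # GPU access requires the NVIDIA Container Toolkit on the host;
    # remove this deploy block for CPU-only machines.
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all            # or an explicit number of GPUs
              capabilities: [gpu]

volumes:
  ollama-data:
```

Over the compose network Ollama is then reachable at http://ollama:11434, which is the address to put in Open Web UI's OLLAMA_BASE_URL environment variable.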
Inbound HTTP Traffic
Let’s imagine we want to run a Caddy instance inside the same docker-compose network as our UI and LiteLLM instances.
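A sketch of what that service could look like; the `caddy:2` image, the published ports, and the volume names are choices made here, and the Caddyfile it mounts is created just below:

```yaml
# docker-compose.yml (excerpt – same services: block as above)
services:
  caddy:
    image: caddy:2
    restart: unless-stopped
    ports:
      - "80:80"      # plain HTTP, used for ACME challenges and redirects
      - "443:443"    # HTTPS for our two subdomains
    volumes:
      - ./caddy/Caddyfile:/etc/caddy/Caddyfile:ro   # config file created below
      - caddy-data:/data                            # issued TLS certificates
      - caddy-config:/config

volumes:
  caddy-data:
  caddy-config:
```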
We have set up two subdomains:
- chat.example.com, which will serve Open Web UI requests
- api.example.com, which will serve requests to the AI APIs and also provide an admin interface.
We also need to create our Caddy config file, ./caddy/Caddyfile, which tells the reverse proxy where to send inbound traffic:
```
chat.example.com {
    reverse_proxy ui:3000
}

api.example.com {
    reverse_proxy litellm:4000
}
```
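With the Caddyfile in place and DNS records for both subdomains pointing at the server, the whole stack can be started and checked:

```sh
docker compose up -d           # start (or refresh) every service in the background
docker compose logs -f caddy   # watch Caddy obtain TLS certificates on first start
```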