Running local LLMs has never been easier. Ollama downloads and runs Llama, Mistral, Qwen, and dozens of other models in one command. The problem is the interface: a terminal API isn't how most people want to interact with a language model. You want a clean chat UI, conversation history, document upload, and the ability to share access with your team — everything ChatGPT gives you, but running entirely on your own hardware.

Open WebUI is that interface. It started as "Ollama WebUI" — a simple frontend for Ollama — and has grown into a full self-hosted AI platform that connects to Ollama, every OpenAI-compatible API, Anthropic's Claude directly, and a dozen other backends. Same interface, complete data control, zero per-token cost for local models.

What Open WebUI is

Open WebUI is an extensible, feature-rich self-hosted AI platform designed to operate entirely offline. It supports Ollama and OpenAI-compatible APIs, making it provider-agnostic for both local and cloud-based models. 139,000+ GitHub stars make it one of the most-starred AI projects on GitHub. It's backed by Andreessen Horowitz (a16z), Mozilla Builders 2024, and GitHub Accelerator 2024 — unusual backing for an open source project, and a signal of serious long-term investment.

The core concept: one interface for every AI model you run. Connect Ollama for local models, OpenAI for GPT-4o/GPT-5, Anthropic for Claude, vLLM for high-throughput self-hosted inference, or any OpenAI-compatible endpoint. Multiple backends can be active simultaneously — route internal confidential data to a local Ollama model while using the Claude API for tasks that benefit from frontier model capability, all from the same UI.

The license is MIT — genuinely open source with no commercial restrictions. Installation options: Docker (most common), Python via uvx, or a desktop app. There's also an Enterprise plan for organizations needing custom branding, SLA support, and LTS versions.

The model layer — Ollama

Open WebUI is the chat layer; Ollama is the model layer. They talk over HTTP on port 11434. Understanding both is useful:

Ollama handles model downloading, quantization, GPU/CPU allocation, and exposes an OpenAI-compatible API. Install it with one command:

# Install Ollama (macOS/Linux)
curl -fsSL https://ollama.com/install.sh | sh

# Pull and run a model
ollama pull llama3.2:3b
ollama run llama3.2:3b

# For production, keep it running as a service
# Ollama listens on http://localhost:11434

Model recommendations to start with:

  • llama3.2:3b — fast on almost anything, 2GB VRAM, good for general chat
  • qwen2.5:14b — strong reasoning, good multilingual, 8GB+ VRAM
  • deepseek-r1:7b — strong at code and technical reasoning
  • mistral:7b — fast, good instruction following, 4GB VRAM

On Apple Silicon Macs, run Ollama natively (not in Docker) — this lets it use the Metal GPU. Docker doesn't have Metal GPU passthrough, so a Dockerized Ollama on Mac falls back to CPU and is significantly slower.

Self-hosting Open WebUI

The fastest path when Ollama is already running on your machine:

docker run -d \
  -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:main

Navigate to http://localhost:3000, create your admin account (the first user becomes admin automatically), and you're chatting. The --add-host=host.docker.internal:host-gateway flag allows the Docker container to reach Ollama running on the host machine.

For a self-contained Docker Compose setup with Ollama bundled:

services:
  ollama:
    image: ollama/ollama
    volumes:
      - ollama-data:/root/.ollama
    restart: unless-stopped
    # For GPU: add runtime: nvidia + NVIDIA_VISIBLE_DEVICES: all

  open-webui:
    image: ghcr.io/open-webui/open-webui:ollama
    depends_on:
      - ollama
    environment:
      OLLAMA_BASE_URL: http://ollama:11434
      WEBUI_SECRET_KEY: ${WEBUI_SECRET_KEY}
    volumes:
      - open-webui-data:/app/backend/data
    ports:
      - "3000:8080"
    restart: unless-stopped

volumes:
  ollama-data:
  open-webui-data:

Put Traefik in front for HTTPS. Minimum requirements: 4GB RAM for the Open WebUI container, plus whatever the model requires (3-8GB VRAM for typical 7B models). Open WebUI itself is lightweight — the model is where the resources go.

Core features

Multi-backend model management

Add API keys for OpenAI, Anthropic, or any OpenAI-compatible endpoint in Settings → Connections. All connected models appear in the same model selector dropdown. Switch between your local Llama 3.2 and Claude 3.5 Sonnet in the same conversation, or set a default model per workspace.

Built-in RAG

Upload PDFs, Word docs, HTML, CSV, or any text file to a conversation and the built-in RAG engine chunks, embeds, and retrieves relevant context automatically. No separate pipeline to configure. Open WebUI supports five document extraction engines including Tika and Docling (the same IBM Research parser used in OpenRAG). Create persistent knowledge bases that any conversation can reference — upload your company documentation once and query it from any chat.

Python function calling (Pipelines / Functions)

This is where Open WebUI goes beyond a simple chat interface. The native Python Function calling system (previously called Pipelines) lets you extend model capabilities with pure Python functions that run server-side. Add a function that queries your database, calls an internal API, runs a shell command, or does any arbitrary computation — and the model can call it as a tool. Bring Your Own Function (BYOF) by simply adding Python code through the UI's tools workspace.

Web search integration

Connect a search engine to give models access to real-time information. Supported: SearXNG (self-hosted, privacy-first), Google, Brave, DuckDuckGo, Kagi, Perplexity, and others. Configure once, and any conversation can trigger a web search when the model needs current information. The SearXNG combination gives you fully local search-augmented generation with no data leaving your network.

Multi-user with authentication

Create user accounts, assign roles (admin or user), and control model access per user. OAuth support for Google, GitHub, Microsoft, and OIDC providers — including Authentik, which we've covered on this blog. Role-based access control means you can expose Open WebUI to your team without giving everyone admin access or access to every model.

Voice and video

Speech-to-text (STT) via local Whisper, OpenAI, Deepgram, or Azure, and text-to-speech (TTS) via Azure, ElevenLabs, OpenAI, or local Transformers. Hands-free voice conversations with your local models, running entirely on-device when using the Whisper and Transformers options.

Model Arena (A/B testing)

Send the same prompt to multiple models simultaneously in a split-screen view and compare responses. Rate responses, build preference datasets, and understand how different models handle the same task. Useful for evaluating whether it's worth running a larger model for a specific use case.

Notes

A built-in note-taking workspace with Markdown, code blocks, and to-do lists — with AI integration for tone adjustment and style enhancement. Feed notes into conversations and query your LLM about them. A small but useful addition that reduces context switching.

Open WebUI vs the alternatives

vs LibreChat — LibreChat is the other major self-hosted chat platform. Open WebUI optimizes for Ollama-first, polished UX and local model workflows. LibreChat optimizes for multi-provider enterprise flexibility (deeper SSO, RBAC, audit logs, plugin marketplace). Open WebUI is prettier and easier to set up. LibreChat is more configurable for large organizations with complex requirements. For most teams, Open WebUI's feature set is sufficient and the setup is faster.

vs LM Studio / Jan — these are desktop-first, single-user tools. No server, no multi-user, no RAG pipeline. Jan is the simplest possible "run a model, talk to it" experience. Open WebUI is the right choice the moment you need multi-user, document upload, or team sharing. Not competing — they target different use cases.

vs AnythingLLM — AnythingLLM focuses heavily on document ingestion and workspace organization, with slightly better document management ergonomics. Open WebUI has a more polished chat interface, more model backends, and the Python Functions system for extensibility. Open WebUI is the better choice for teams that need a general-purpose AI platform; AnythingLLM is better if document Q&A is the primary use case (though OpenRAG or Dify may be even better for that).

vs Dify — Dify is a full LLM application platform (covered on this blog). Open WebUI is a chat interface. They serve different purposes and can coexist: Open WebUI for conversational use, Dify for building structured LLM workflows and applications. Dify's RAG is more production-grade; Open WebUI's chat experience is more polished.

Practical DevOps use cases

  • Private code assistant — connect a coding-focused model (deepseek-r1, qwen2.5-coder) and use it for code review, refactoring, and generation without sending proprietary code to external APIs
  • Internal documentation chat — upload runbooks and architecture docs to a knowledge base, query them conversationally
  • Team AI platform — give your engineering team a shared ChatGPT-equivalent that routes sensitive work to local Ollama and general work to frontier model APIs
  • Model evaluation — use the Arena feature to A/B test models for specific internal tasks and justify model selection decisions
  • Offline environments — air-gapped infrastructure where no data can leave the network; Open WebUI + Ollama is the answer

Who it's for

Good fit:

  • Teams who want a private ChatGPT-equivalent they control completely
  • Anyone running local models via Ollama who wants a proper chat interface
  • Organizations with compliance or data sovereignty requirements ruling out SaaS AI tools
  • Teams that want to run both local models and frontier model APIs from the same interface
  • DevOps engineers who want to extend the interface with custom Python functions calling internal APIs

Not the right fit:

  • Solo users who just want the simplest possible local model experience — Jan or LM Studio are simpler
  • Teams primarily building structured LLM applications rather than having conversations — use Dify
  • Organizations needing deep enterprise governance (RBAC per resource, full audit logs) — consider LibreChat or the Open WebUI Enterprise plan

My take

Open WebUI is the right answer to "my team needs a private AI assistant." The setup is fast — one Docker command if you already have Ollama running. The interface is genuinely polished, comparable to what you'd expect from a commercial product. The model-agnostic architecture means you're never locked in: run local models for sensitive work, route to Claude or GPT for tasks that need frontier capability, all without changing tools.

The Python Functions system is the feature that separates it from being just a chat UI. Being able to add custom tools that call your internal APIs, query your databases, or run arbitrary logic — and have the model call those tools automatically — turns Open WebUI into a platform rather than an interface. For DevOps teams who want a private AI assistant that can actually do things in their infrastructure, this is the capability that matters.

The 139,000+ stars and a16z backing reflect real community adoption. This is the most-used self-hosted AI platform in the space, which means the most community tooling, the most documented configurations, and the best chance the project stays maintained. For any team running self-hosted infrastructure who hasn't deployed Open WebUI yet, it's worth an afternoon.


PIPOLINE · DEVOPS CONSULTING

Need help setting up Open WebUI + Ollama?

Deploying Open WebUI with Ollama, configuring GPU passthrough correctly, setting up multi-user auth with Authentik OIDC, adding custom Python functions for your internal APIs, and putting it behind Traefik with HTTPS — takes experience to get right the first time. I can handle the full setup and configure it for your team's specific workflow. You get a production-ready private AI platform without spending a day on Docker networking and GPU configuration.

Get in touch at pipoline.com →