LM Studio · Ollama + Open WebUI · vLLM — June 23, 2026
Workshop stand — Open WebUI in the browser, no local setup needed:
Best for: people who don't want to deal with the terminal. Open the app → find a model → download → Chat.
openai/gpt-oss-20b, lmstudio-community/gemma-4-12B-it-GGUF, or bartowski/mistralai_Mistral-Small-3.2-24B-Instruct-2506-GGUF.Search examples: openai/gpt-oss-20b · lmstudio-community/gemma-4-12B-it-GGUF · bartowski/mistralai_Mistral-Small-3.2-24B-Instruct-2506-GGUF · bartowski/DeepSeek-R1-Distill-Qwen-32B-GGUF · unsloth/GLM-4.7-Flash-GGUF
LM Studio can also run a local OpenAI-compatible server — see the API / local server docs.
Best for: browser chat. Ollama runs models; Open WebUI is the interface.
gpt-oss:20b.Install via the official Quick Start. Docker is recommended.
Open in browser: http://localhost:3000
Without Docker:
Format: hf.co/owner/repo:quant. Public GGUF usually needs no token; gated/private models need a Hugging Face token.
Ollama supports an OpenAI-compatible API.
Best for: a fast OpenAI-compatible server for projects, tests, RAG, agents, or multiple clients. vLLM is the backend — default port localhost:8000.
uv or pip in a venv.HF_TOKEN only for gated/private models.http://localhost:8000/v1http://localhost:8000/v1 (on host). If Open WebUI is in Docker on Windows/macOS, try http://host.docker.internal:8000/v1local-key (or whatever you passed to --api-key)TPS figures are rough guides for RTX 4090, batch=1, 4k–8k context, 4-bit/FP8/MXFP4 where available — not guaranteed benchmarks. Long context, heavy prompts, and CPU offload can cut speed 2–5×.
Unsloth, LM Studio community, and bartowski entries are often GGUF packagers — check the original model provider separately.
openai/gpt-oss-20b — good default for all three stackslmstudio-community/gemma-4-12B-it-GGUF — 12B, start Q4_K_Mbartowski/mistralai_Mistral-Small-3.2-24B-Instruct-2506-GGUF — 24B, Q4 recommendedbartowski/DeepSeek-R1-Distill-Qwen-32B-GGUF — 32B, Q4 only on 24 GBunsloth/GLM-4.7-Flash-GGUF — try UD-Q4_K_XL