GitHub repo: a GGUF runtime server similar to Ollama and vLLM. Pass --model, or set the MODEL_NAME environment variable when running in Docker, to tell it which model to load. The value can also be the path to a GGUF file.
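As a rough illustration of the selection logic described above, here is a minimal Python sketch of how a launcher could resolve the model from either --model or MODEL_NAME. It is not the project's actual code: the function name resolve_model, the precedence (flag wins over environment variable), and the rule that a value ending in .gguf is treated as a file path are all assumptions made for the example.

```python
import argparse
import os


def resolve_model(argv=None, env=None):
    """Return (kind, value) describing the model to load.

    Hypothetical helper: the precedence and the ".gguf means path" rule
    are assumptions for illustration, not documented server behavior.
    """
    env = os.environ if env is None else env

    parser = argparse.ArgumentParser(description="model selection sketch")
    parser.add_argument("--model", default=None,
                        help="model name or path to a .gguf file")
    # Ignore any other flags a real server might accept.
    args, _unknown = parser.parse_known_args(argv)

    # Assumed precedence: the command-line flag overrides MODEL_NAME.
    value = args.model or env.get("MODEL_NAME")
    if not value:
        raise SystemExit("no model given: pass --model or set MODEL_NAME")

    # Assumed rule: anything ending in .gguf is a file path, else a model name.
    if value.lower().endswith(".gguf"):
        return ("path", os.path.expanduser(value))
    return ("name", value)
```

Calling resolve_model() with no arguments reads the real command line and environment, so the same function would cover both a direct invocation with --model and a Docker container started with MODEL_NAME set.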