LLM Utility
I’m a big fan of Simon Willison’s llm package. It works nicely with llama-cpp.
Installing llm
I didn’t get on well with pipx in this use case, so I used conda to create a virtual environment for LLM and then installed it in there.
Since I have an NVIDIA card, I pass in CMake flags so the build includes CUDA support:
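Something along these lines works; this is a sketch assuming the llm-llama-cpp plugin, and note that -DLLAMA_CUBLAS=on was the CUDA flag for older llama-cpp-python releases (newer ones use -DGGML_CUDA=on, so check the docs for your version):

```bash
# Create and activate a dedicated conda environment for llm
conda create -n llm python=3.11
conda activate llm

# Install llm and its llama-cpp plugin
pip install llm
llm install llm-llama-cpp

# Rebuild llama-cpp-python with CUDA support
# (-DLLAMA_CUBLAS=on for older releases; newer ones use -DGGML_CUDA=on)
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 llm install llama-cpp-python
```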
Installing a Model
The LLM utility can automatically download GGUF-formatted models from Hugging Face:
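For example, using the download-model command from the llm-llama-cpp plugin; the model URL and alias below are just illustrative:

```bash
# Download a GGUF model from Hugging Face and register it under an alias;
# --llama2-chat tells the plugin to use the Llama 2 chat prompt format
llm llama-cpp download-model \
  'https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q6_K.gguf' \
  --alias llama2-chat --llama2-chat
```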
Running a Model with GPU Offload
Make use of the n_gpu_layers option to offload all model layers to the GPU if you have enough VRAM; this should speed up generation significantly.
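A sketch of what that looks like, assuming your version of llm-llama-cpp exposes n_gpu_layers as a model option (model options are passed to llm with -o):

```bash
# Offload all layers of a 7B model to the GPU
# (35 covers every layer of Llama-2-7B; set it to at least the model's layer count)
llm -m llama2-chat -o n_gpu_layers 35 'Ten fun names for a pet pelican'
```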
LangChain
LangChain is a FOSS library for chaining together calls to promptable language models. I’ve been using it for building all sorts of cool stuff.
Structured Outputs
There are a number of ways to coerce LLMs into providing structured responses, some of which are described here. My favourites are:
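One widely used technique is LangChain's PydanticOutputParser, which injects format instructions into the prompt and then validates the model's reply against a Pydantic schema. A minimal sketch, with an illustrative Movie schema:

```python
from langchain.output_parsers import PydanticOutputParser
from langchain.prompts import PromptTemplate
from pydantic import BaseModel, Field

# Illustrative target schema: replies that don't validate raise an error
class Movie(BaseModel):
    title: str = Field(description="the film's title")
    year: int = Field(description="year of first release")

parser = PydanticOutputParser(pydantic_object=Movie)

prompt = PromptTemplate(
    template="Answer the user's question.\n{format_instructions}\n{question}\n",
    input_variables=["question"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

# prompt.format(question=...) is sent to the model, and parser.parse() turns
# the model's JSON reply into a validated Movie instance:
# movie = parser.parse(llm_reply)
```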