Low-Rank Adaptation (LoRA) can be used to modify the behaviour of an LLM or SLM by injecting small trainable weight matrices that adjust the model's outputs while the original pretrained weights stay frozen.
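As a minimal sketch of the idea (in plain NumPy, with made-up dimensions): a frozen weight matrix W is augmented with a low-rank product BA, so the adapter adds far fewer parameters than W itself and is a no-op at initialization.

```python
import numpy as np

rng = np.random.default_rng(0)

d, r = 8, 2                    # hidden size d, adapter rank r << d
W = rng.normal(size=(d, d))    # frozen pretrained weight

# LoRA adapter: two small trainable matrices whose product is low rank.
A = rng.normal(size=(r, d)) * 0.01
B = np.zeros((d, r))           # B starts at zero, so the adapter is a no-op at init
alpha = 16                     # LoRA scaling hyperparameter

def forward(x):
    # Base output plus the scaled low-rank update.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d)
# At initialization the adapted model matches the base model exactly.
assert np.allclose(forward(x), W @ x)
```

Note the adapter holds only 2·d·r parameters versus d² for W, which is why many adapters can be trained and stored cheaply against one base model.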
Dettmers et al. (2023) introduced QLoRA, a variant that quantizes the frozen base model to 4-bit precision while training the adapters in higher precision, substantially reducing memory requirements with little loss in quality.
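A toy illustration of the QLoRA split (simulated symmetric 4-bit quantization rather than the NF4 scheme the paper uses): the large base weight is stored in 4-bit integers and dequantized on the fly, while the small adapter matrices remain in full precision and are the only trainable parameters.

```python
import numpy as np

rng = np.random.default_rng(1)
d, r = 8, 2
W = rng.normal(size=(d, d))

# Simulate symmetric 4-bit quantization of the frozen base weight.
scale = np.abs(W).max() / 7                    # map values into the int4 range
W_q = np.clip(np.round(W / scale), -8, 7).astype(np.int8)

def dequant(Wq, s):
    # Recover an approximate float weight from the 4-bit integers.
    return Wq.astype(np.float32) * s

# Adapters stay in full precision and are the only trainable parameters.
A = rng.normal(size=(r, d)).astype(np.float32) * 0.01
B = np.zeros((d, r), dtype=np.float32)

def forward(x):
    # Dequantized frozen base plus full-precision low-rank adapter.
    return dequant(W_q, scale) @ x + B @ (A @ x)
```

The memory win comes from W_q occupying a fraction of the space of W, while gradients only ever flow through A and B.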
Serving LoRA Models
Tools such as vLLM and LoRAX make it feasible to serve many LoRA adapters on top of the same underlying SLM with low latency, since the base weights are loaded once and each adapter contributes only a small amount of additional memory.
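The multi-adapter serving pattern can be sketched as follows (a simplified NumPy model, not the actual vLLM or LoRAX API; the adapter names are hypothetical): one shared base weight is loaded once, and each request selects a small (A, B) pair by identifier.

```python
import numpy as np

rng = np.random.default_rng(2)
d, r = 8, 2
W = rng.normal(size=(d, d))    # one shared base model weight, loaded once

# Each tenant ships only its small (A, B) adapter pair.
adapters = {
    "customer_a": (rng.normal(size=(r, d)) * 0.01, rng.normal(size=(d, r)) * 0.01),
    "customer_b": (rng.normal(size=(r, d)) * 0.01, rng.normal(size=(d, r)) * 0.01),
}

def forward(x, adapter_id=None):
    # Shared base computation for every request.
    y = W @ x
    # Per-request low-rank correction selected by adapter id.
    if adapter_id is not None:
        A, B = adapters[adapter_id]
        y = y + B @ (A @ x)
    return y
```

Because the base forward pass dominates the compute and is shared, adding another adapter costs only the storage for its A and B matrices, which is what makes serving many fine-tuned variants on one model practical.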