Low-Rank Adaptation (LoRA) can be used to modify the behaviour of an LLM or SLM by injecting small trainable weight matrices that adjust the model's outputs while the original pretrained weights stay frozen.
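As a minimal sketch of the idea (in plain NumPy, with made-up dimensions): a frozen weight matrix W is augmented with a low-rank product BA, so the adapter adds far fewer parameters than W itself and is a no-op at initialization.

```python
import numpy as np

rng = np.random.default_rng(0)

d, r = 8, 2                    # hidden size d, adapter rank r << d
W = rng.normal(size=(d, d))    # frozen pretrained weight

# LoRA adapter: two small trainable matrices whose product is low rank.
A = rng.normal(size=(r, d)) * 0.01
B = np.zeros((d, r))           # B starts at zero, so the adapter is a no-op at init
alpha = 16                     # LoRA scaling hyperparameter

def forward(x):
    # Base output plus the scaled low-rank update.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d)
# At initialization the adapted model matches the base model exactly.
assert np.allclose(forward(x), W @ x)
```

Note the adapter holds only 2·d·r parameters versus d² for W, which is why many adapters can be trained and stored cheaply against one base model.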
Dettmers et al. (2023) introduced QLoRA, a variant that quantizes the frozen base model to 4-bit precision while training the adapters in higher precision, substantially reducing memory requirements with little loss in quality.
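A toy illustration of the QLoRA split (simulated symmetric 4-bit quantization rather than the NF4 scheme the paper uses): the large base weight is stored in 4-bit integers and dequantized on the fly, while the small adapter matrices remain in full precision and are the only trainable parameters.

```python
import numpy as np

rng = np.random.default_rng(1)
d, r = 8, 2
W = rng.normal(size=(d, d))

# Simulate symmetric 4-bit quantization of the frozen base weight.
scale = np.abs(W).max() / 7                    # map values into the int4 range
W_q = np.clip(np.round(W / scale), -8, 7).astype(np.int8)

def dequant(Wq, s):
    # Recover an approximate float weight from the 4-bit integers.
    return Wq.astype(np.float32) * s

# Adapters stay in full precision and are the only trainable parameters.
A = rng.normal(size=(r, d)).astype(np.float32) * 0.01
B = np.zeros((d, r), dtype=np.float32)

def forward(x):
    # Dequantized frozen base plus full-precision low-rank adapter.
    return dequant(W_q, scale) @ x + B @ (A @ x)
```

The memory win comes from W_q occupying a fraction of the space of W, while gradients only ever flow through A and B.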
Serving LoRA Models
Tools such as vLLM and LoRAX make it feasible to serve many LoRA adapters on top of the same underlying SLM with low latency, since the base weights are loaded once and each adapter contributes only a small amount of additional memory.
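The multi-adapter serving pattern can be sketched as follows (a simplified NumPy model, not the actual vLLM or LoRAX API; the adapter names are hypothetical): one shared base weight is loaded once, and each request selects a small (A, B) pair by identifier.

```python
import numpy as np

rng = np.random.default_rng(2)
d, r = 8, 2
W = rng.normal(size=(d, d))    # one shared base model weight, loaded once

# Each tenant ships only its small (A, B) adapter pair.
adapters = {
    "customer_a": (rng.normal(size=(r, d)) * 0.01, rng.normal(size=(d, r)) * 0.01),
    "customer_b": (rng.normal(size=(r, d)) * 0.01, rng.normal(size=(d, r)) * 0.01),
}

def forward(x, adapter_id=None):
    # Shared base computation for every request.
    y = W @ x
    # Per-request low-rank correction selected by adapter id.
    if adapter_id is not None:
        A, B = adapters[adapter_id]
        y = y + B @ (A @ x)
    return y
```

Because the base forward pass dominates the compute and is shared, adding another adapter costs only the storage for its A and B matrices, which is what makes serving many fine-tuned variants on one model practical.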