Inference Requirements

A good rule of thumb for inference memory requirements is roughly 20% on top of what is needed to hold the base model's weights. EleutherAI's excellent Transformer Math 101 blog post (https://blog.eleuther.ai/transformer-math/) goes into more specific detail about what is required and why.
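To make the rule of thumb concrete, here is a minimal sketch (illustrative only, not code from the blog post; the function name and defaults are assumptions):

```python
def inference_memory_gb(num_params: float, bytes_per_param: int = 2,
                        overhead: float = 0.20) -> float:
    """Estimate inference memory in GB: weights plus ~20% overhead.

    num_params:      total parameter count, e.g. 7e9 for a 7B model
    bytes_per_param: 2 for fp16/bf16, 4 for fp32, 1 for int8
    overhead:        the ~20% rule-of-thumb margin on top of the weights
    """
    weights_gb = num_params * bytes_per_param / 2**30
    return weights_gb * (1 + overhead)

# A 7B model in bf16: ~13 GB of weights, ~15.6 GB with the 20% overhead
print(f"{inference_memory_gb(7e9):.1f} GB")
```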

HF Accelerate Memory Usage Calculator

A nice model memory usage calculator that works for Transformers-based models: https://huggingface.co/spaces/hf-accelerate/model-memory-usage. It was written by Accelerate project lead Zach Mueller. Zach also provides the following useful rules of thumb in the calculator's intro text:

The minimum recommended vRAM needed for a model is denoted as the size of the “largest layer”, and training of a model is roughly 4x its size (for Adam). These calculations are accurate within a few percent at most, such as bert-base-cased being 413.68 MB and the calculator estimating 413.18 MB.

When performing inference, expect to add up to an additional 20% to this, as found by EleutherAI.
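Those rules of thumb can be applied in a few lines. The sketch below assumes the 4x factor comes from holding weights, gradients, and Adam's two moment buffers, all in fp32 (a common accounting; the calculator's exact logic may differ):

```python
def model_size_gb(num_params: float, bytes_per_param: int = 4) -> float:
    """Raw weight size in GB (fp32 by default)."""
    return num_params * bytes_per_param / 2**30

def adam_training_gb(num_params: float, bytes_per_param: int = 4) -> float:
    """~4x the model size: weights + gradients + Adam's two moment buffers."""
    return 4 * model_size_gb(num_params, bytes_per_param)

# bert-base-cased has ~108M parameters: ~0.40 GB of weights,
# so roughly 1.6 GB to train it with Adam
print(f"{model_size_gb(108e6):.2f} GB weights, "
      f"{adam_training_gb(108e6):.2f} GB to train")
```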

Training with LoRA

These notes by Hamel Husain provide more detail on the memory requirements for training LoRA adapters. He starts by recommending the HF Accelerate Memory Usage Calculator as a baseline and then accounts for the additional LoRA parameters on top of it.
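As a rough illustration of that kind of accounting (the shapes and rank below are hypothetical, not taken from Hamel's notes): LoRA freezes the base weights and adds two small trainable matrices per adapted weight, so only the adapter parameters need gradients and optimizer states:

```python
def lora_extra_params(d: int, k: int, r: int) -> int:
    """Trainable params LoRA adds to one (d x k) weight matrix:
    B is (d x r) and A is (r x k), i.e. r * (d + k) in total."""
    return r * (d + k)

def lora_optimizer_overhead_gb(num_lora_params: int,
                               bytes_per_param: int = 4) -> float:
    """Adapter params + gradients + Adam's two moments, ~4x their own size;
    the frozen base weights need no gradients or optimizer state."""
    return 4 * num_lora_params * bytes_per_param / 2**30

# Illustrative: rank-8 adapters on the 4 attention projections of a
# 32-layer model with 4096x4096 weight matrices
n = 32 * 4 * lora_extra_params(4096, 4096, r=8)
print(f"{n / 1e6:.1f}M LoRA params, ~{lora_optimizer_overhead_gb(n):.2f} GB extra")
```

Note that the frozen base weights still have to fit in memory for the forward pass (e.g. ~2 bytes per parameter in bf16), so the adapter overhead is added on top of that baseline.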

More Resources