LLM requirements
Published on Dec 9, 2023 in Former LLM Course
Foreword
In the fall of 2023, I started a course on the use of LLMs. But the technology was evolving so fast that I eventually concluded it no longer made sense to maintain your own backend just to run an LLM.
In less than a year, my setup went through several iterations. I started with a simple backend based on Hugging Face's Transformers lib, using the load_in_4bit or load_in_8bit option.
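For context, here is a minimal sketch of what the core of that first backend looked like, assuming the transformers and bitsandbytes packages are installed; the model name is only an example.

```python
# A minimal sketch of on-the-fly quantization with Transformers and
# bitsandbytes; the model id is just an example, any causal LM works.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.1"  # example model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    load_in_4bit=True,  # or load_in_8bit=True; needs the bitsandbytes package
    device_map="auto",  # place layers on the available GPU(s) automatically
)

prompt = "Explain quantization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```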
Then the lib integrated GPTQ support, so I switched to quantized models. Then I updated it to support ExLlama, and later switched to AWQ model support.
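Loading a pre-quantized model with the same lib looks almost identical. The repo name below is only an example; AWQ models need the autoawq package, and GPTQ models need auto-gptq and optimum.

```python
# A sketch of loading a pre-quantized model: Transformers reads the
# quantization config stored in the repo, so no extra flag is needed.
# Example AWQ repo; requires the autoawq package to be installed.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/Mistral-7B-Instruct-v0.1-AWQ"  # example pre-quantized repo
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
```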
At the same time, OpenAI's API became the de facto standard, and my backend was inspired by OobaBooga's API. When the exl2 format became popular, I decided to ditch my backend and switch to TabbyAPI.
Since then, all my LLM-based development has gone through an OpenAI-compatible API, and I can switch backends easily.
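To illustrate why this is convenient, here is a sketch using the official openai Python client; the URL, port, and model name are assumptions that depend on how your backend is configured.

```python
# A sketch of calling a local OpenAI-compatible backend (TabbyAPI, etc.).
# Only base_url and api_key change when you swap backends; the URL, port,
# and model name below are assumptions depending on your setup.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:5000/v1",  # your local backend's endpoint
    api_key="dummy-key",                  # many local backends accept any key
)

response = client.chat.completions.create(
    model="my-local-model",  # whatever name the backend exposes
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```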
The Transformers lib is still worth knowing; you can do a lot of things with it.
The part on installing the system and containerizing an app is still foundational knowledge if you want to work with LLMs.
Here is the requirements part of the course. The numbers don't quite match current technology anymore, but they will give you an idea.
This comes from the course
To complete this course, you will need a computer with an NVIDIA graphics card and enough disk space to install Linux. It may be possible to adapt the course to run on Windows using WSL, but that's not my area of expertise: I mostly use Windows for gaming, so I don't have much experience doing serious work with it.
9 GB of VRAM is enough to run a 7B model quantized to 4 bits. If you want to learn how to train a model, you can do so on a system with 14 GB of VRAM using the QLoRA method. However, merging the QLoRA adapter back into the base model requires a more powerful setup with 18 GB of VRAM.
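The merge step needs more VRAM because the base model is loaded unquantized for the merge, so it takes more memory than 4-bit inference. As a rough sketch using the peft library, with the model id and paths as placeholders:

```python
# A minimal sketch of merging a QLoRA adapter into its base model with
# the peft library; the model id and paths are placeholders.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",  # example base model
    torch_dtype="auto",
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "path/to/qlora-adapter")
merged = model.merge_and_unload()  # folds the LoRA weights into the base weights
merged.save_pretrained("path/to/merged-model")
```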
A machine with 24 GB of VRAM is a good choice if you want to experiment with everything in this course without limits. If your current computer isn't powerful enough, you can also consider renting an on-demand cloud instance, such as a V100S with 32 GB of VRAM at €0.80 per hour (unsponsored link).