Do you have a guide that you followed and could link me to, or was it just from prior knowledge? Also, do you know if I could run Wizard Vicuna on it? That model isn't listed on the page above.
This code runs Llama 2, quantized and unquantized, in a roughly minimal way: https://github.com/srush/llama2.rs (though extracting the quantized 70B weights takes a lot of RAM). I'm running the quantized 13B model in ~10-11 GB of CPU memory.
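As a rough sanity check on that memory figure (this is my own back-of-envelope arithmetic, assuming roughly 4-bit quantization, not something stated in the repo):

```python
# Back-of-envelope estimate: weight memory for a quantized 13B model.
# Assumption: ~4 bits per weight; actual usage adds KV cache and
# runtime overhead on top, which is consistent with ~10-11 GB total.
params = 13e9
bits_per_weight = 4
weight_gb = params * bits_per_weight / 8 / 1e9
print(f"weights alone: ~{weight_gb:.1f} GB")  # ~6.5 GB
```

So the weights alone are only part of it; the rest of the footprint comes from the KV cache and runtime buffers.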