
This code runs Llama2, quantized and unquantized, in a roughly minimal way: https://github.com/srush/llama2.rs (though extracting the quantized 70B weights takes a lot of RAM). I'm running the 13B quantized model in ~10-11GB of CPU memory.
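For context on where the memory savings come from: quantization packs the model's f16/f32 weights into a few bits each, plus a per-block scale. The sketch below is a minimal Rust illustration of block-wise 4-bit quantization and dequantization, assuming a block size of 32 and a simple symmetric scaling scheme; the names and layout here are illustrative, not the actual llama2.rs code.

    // Illustrative only: block-wise 4-bit weight quantization, the rough idea
    // behind fitting a 13B model in ~10-11GB instead of ~26GB of f16 weights.
    const BLOCK: usize = 32; // assumed block size for this sketch

    struct QBlock {
        scale: f32,              // one f32 scale per block
        packed: [u8; BLOCK / 2], // two 4-bit values per byte
    }

    fn quantize_block(w: &[f32; BLOCK]) -> QBlock {
        // Scale so the largest magnitude maps into the 4-bit range [-8, 7].
        let max = w.iter().fold(0f32, |m, &x| m.max(x.abs()));
        let scale = if max > 0.0 { max / 7.0 } else { 1.0 };
        let mut packed = [0u8; BLOCK / 2];
        for i in 0..BLOCK / 2 {
            let lo = ((w[2 * i] / scale).round().clamp(-8.0, 7.0) as i8 & 0x0F) as u8;
            let hi = ((w[2 * i + 1] / scale).round().clamp(-8.0, 7.0) as i8 & 0x0F) as u8;
            packed[i] = lo | (hi << 4);
        }
        QBlock { scale, packed }
    }

    fn dequantize_block(q: &QBlock) -> [f32; BLOCK] {
        let mut out = [0f32; BLOCK];
        for i in 0..BLOCK / 2 {
            // Sign-extend each 4-bit nibble back to i8 before rescaling.
            let lo = ((q.packed[i] & 0x0F) as i8) << 4 >> 4;
            let hi = (q.packed[i] as i8) >> 4;
            out[2 * i] = lo as f32 * q.scale;
            out[2 * i + 1] = hi as f32 * q.scale;
        }
        out
    }

    fn main() {
        let weights: [f32; BLOCK] = core::array::from_fn(|i| (i as f32 - 16.0) * 0.03);
        let q = quantize_block(&weights);
        let restored = dequantize_block(&q);
        println!("first weight: {:.4} -> {:.4}", weights[0], restored[0]);
    }

Each block of 32 weights shrinks from 128 bytes (f32) to 20 bytes (16 packed bytes plus a 4-byte scale), at the cost of some precision, which is roughly why the quantized models fit in so much less memory.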


From what I gather, this is a Rust implementation that runs Llama2. Can it run other models as well? I'm having trouble finding information about that.



