Hmmm, I ran a Llama 2 GGML q4 model in 6 GB of RAM with llama.cpp on my laptop.


I very much appreciate your comment and will look into llama.cpp. Was it from here: https://github.com/ggerganov/llama.cpp

Do you have a guide that you followed and could link me to, or was it just from prior knowledge? Also, do you know if I could run Wizard Vicuna on it? That model isn't listed on the above page.


Glad to be of help. Yeah, that's the repo.

https://replicate.com/blog/run-llama-locally

I found that guide here on HN.

I run it CPU-only with 16 threads, but yeah, perf is good enough.

BTW, my 6 GB figure is me measuring from htop, so Llama 2 itself likely uses less.
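
If you'd rather script it than call the CLI directly, the llama-cpp-python bindings wrap the same engine. A minimal sketch (the model path is just whatever quantized GGML file you downloaded; 16 threads matches my setup):

  from llama_cpp import Llama

  # Load a 4-bit quantized GGML model; the filename here is an example,
  # point it at whatever quantized file you actually downloaded.
  llm = Llama(
      model_path="./models/llama-2-7b-chat.ggmlv3.q4_0.bin",
      n_ctx=2048,     # context window
      n_threads=16,   # CPU-only, 16 threads
  )

  out = llm("Q: What is the capital of France? A:", max_tokens=32)
  print(out["choices"][0]["text"])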


Thanks for the starting point. I'll post an update if I'm able to successfully run the other models; I hope it helps the community.


This code runs Llama 2, quantized and unquantized, in a roughly minimal way: https://github.com/srush/llama2.rs (though extracting the quantized 70B weights takes a lot of RAM). I'm running the 13B quantized model in ~10-11 GB of CPU memory.


From what I gather, this is a Rust implementation that runs Llama 2. Can it run any other models, like the ones I'm having trouble finding info about?


Not sure about Vicuna myself.



