Hacker News
poorman | 5 months ago | on: Show HN: Run Qwen3-Next-80B on 8GB GPU at 1tok/2s ...
If you have 64 GB of RAM, you should be able to run the 4-bit quantized MLX models, which are built specifically for Apple silicon M-series chips.
https://huggingface.co/collections/mlx-community/qwen3-next-...
cahaya | 5 months ago
Got 32 GB, so I was hoping I could use ollm to offload it to my SSD. Slower, but it makes it possible to run bigger models (in emergencies).
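The RAM figures in this exchange line up with a quick back-of-envelope estimate (weights only; real quantized checkpoints also carry per-group scales, KV cache, and runtime overhead, so treat this as a lower bound):

```python
# Back-of-envelope: weight memory for an 80B-parameter model at 4-bit quantization.
# Ignores quantization scales, KV cache, and activations, which add several GB more.
params = 80e9          # 80 billion parameters
bits_per_param = 4     # 4-bit quantization
weights_gb = params * bits_per_param / 8 / 1e9

print(f"~{weights_gb:.0f} GB of weights")
```

At roughly 40 GB of weights, the model fits in 64 GB of unified memory but not in 32 GB, which is why the reply above reaches for SSD offloading.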