
If I want to use an LLM to do translation, should I use a base model or an instruction tuned version? I've had mixed results using the chat models and a simple "Translate this to <language>: "


For a 9B model like EuroLLM, fine-tuning the base model is pretty viable. You don't need many samples; on the order of 300 high-quality examples can produce good results, and the GPU time is manageable on rented GPU instances.
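A minimal sketch of the data-prep side of that fine-tuning: formatting a few hundred parallel sentence pairs into a JSONL file for completion-style training on a base model. The field names and the "English:/German:" template here are assumptions, not anything EuroLLM requires; the point is that the training format should match the prompt format you plan to use at inference time.

```python
import json

def make_record(src_text, tgt_text, src_lang="English", tgt_lang="German"):
    # Train the model to continue the prompt with the translation.
    prompt = f"{src_lang}: {src_text}\n{tgt_lang}:"
    return {"prompt": prompt, "completion": " " + tgt_text}

def write_jsonl(pairs, path):
    # One JSON object per line, as most fine-tuning tooling expects.
    with open(path, "w", encoding="utf-8") as f:
        for src, tgt in pairs:
            f.write(json.dumps(make_record(src, tgt), ensure_ascii=False) + "\n")

pairs = [
    ("Good morning.", "Guten Morgen."),
    ("Where is the station?", "Wo ist der Bahnhof?"),
]
write_jsonl(pairs, "train.jsonl")
```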

Just the base model and a template like "English: {text}\n{language}:" can also work, with a bit of filter-and-retry logic.
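A sketch of what that filter-and-retry logic might look like. `generate` here is a hypothetical stand-in for a real base-model completion call; the filtering assumes the common failure mode where a base model keeps going after the translation and emits further "English:/German:" pairs.

```python
def generate(prompt, temperature=0.0):
    # Hypothetical placeholder for an actual base-model completion call.
    return " Guten Morgen.\nEnglish: Good morning."

def translate(text, language="German", max_retries=3):
    prompt = f"English: {text}\n{language}:"
    for attempt in range(max_retries):
        # Raise temperature on retries to escape a bad sample.
        raw = generate(prompt, temperature=0.3 * attempt)
        # Filter: keep only the first line; base models tend to
        # continue the pattern with more prompt/translation pairs.
        candidate = raw.strip().split("\n")[0].strip()
        if candidate:
            return candidate
    return None

print(translate("Good morning."))  # -> Guten Morgen.
```

Real filters often also check that the output script matches the target language and that length is plausible relative to the input.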




