We're doing multilingual testing and I can confirm what you've observed: Gemma 4 is surprisingly good at multilingual tasks, especially given its size. This is especially true of the dense 31B model.
Same, I quickly tested it for code gen and it produced mostly good code for simple problems, but it sometimes hallucinated words in non-English scripts inside the code.
For anyone interested in multilingual performance, which is not usually well benchmarked or reported: Gemma 4 does really well, especially the dense 31B version. In fact, it outperforms many models with an order of magnitude more parameters.
It can't quite handle really long-tail languages, but the claim of 35 supported languages (with hints of some knowledge of up to 140) held up in our tests.
If you're doing work outside of English and/or need to run a translation model on your own terms, Gemma 4 is a very good candidate.
Thank you. +1.
There are obviously differences and things getting lost or slightly misaligned in the latent space, and these do cause degradation in reasoning quality, but the decline is very small in high resource languages.
This is probably not a core concern for most HN readers, but at work we do multilingual testing for synthetic text data generation and natural language processing. Emphasis on multilingual. Gemini has made some serious leaps from 1.5 to 2.5 and now 3.0, and is actually proficient in languages that other models can only dream of. On the other hand, GPT-5 has a really mixed performance in a lot of categories.
This goes way back. Even in the 1.5 days it was the best multilingual model, back when HN still treated it as entirely uncompetitive all-around. Just because, exactly as you're saying, it's not a core concern of people here. The two fields Gemini models have been number one at for years now are A. multilinguality and B. image understanding. At no point since the release of Gemini 1.5 Pro has any Anthropic or OpenAI model performed better at either.
Even those who have zero experience with different (human) languages could've inferred this from the LMArena leaderboards, where Gemini models have consistently ranked much higher in non-English languages than in English. That gap has actually shrunk a lot over time! In the 1.5 Pro days the advantage was huge: it would rank something like 10th in English and 2nd in many other languages.
Nevertheless, it still depends on the specific language you're targeting. Gemini isn't the winner on every single one of them. If you're only going to choose one model for use with many languages, it should be Gemini. But if the set of languages isn't too large, optimizing model selection per language is worth it.
In our previous tests, when it was 1.5 Pro against GPT 4o and Claude Sonnet 3.7, Gemini wasn't winning the multilingual race, but it was definitely competitive. 2.5 and 3.0 seem to be big leaps from the 1.5 days.
That said, it also depends on the testing methodology; our use cases were aimed mostly at core linguistic proficiency, not so much at complex language tasks or cultural knowledge.
Which languages, how popular, how many? The biggest difference has been for low-resource or far-from-English languages. Thai, Korean, Vietnamese, and so on. For something like German or French all of them were of course good enough that general intelligence and other factors overruled any language differences. I didn't take screenshots, maybe archive.org has them, but during the entire period of that generation of models on the LMArena leaderboard there was this large gap between 1.5 Pro rankings on such languages vs on English, which was backed up by our experience including feedback from groups of native speakers.
And regarding specific models - we obviously only tested a few languages, and there are thousands of them in the world. But Gemini seems to lead the pack basically regardless of the language you throw at it. YMMV.
You could ask GPT what it knows about you and use that to seed your personal preferences in a new model/app. Not perfect and probably quite lossy, but likely much better than starting from scratch.
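In case it's useful, here's a minimal sketch of what "seeding" could look like mechanically. The export step is manual (you ask the old model to summarize what it knows about you); the function below just front-loads that text as a system message in the common OpenAI-style chat schema. The function name, wording, and example profile are all my own, so adapt them to whatever API you're actually targeting.

```python
# Sketch: carry personal context from one assistant to another.
# Step 1 (manual): ask the old model something like
#   "Summarize everything you know about me and my preferences
#    as a plain-text profile I can paste elsewhere."
# Step 2: seed a new chat with that profile as a system message.
# The dict format below is the common OpenAI-style chat schema;
# adjust role names for other providers as needed.

def seed_messages(profile: str, first_user_msg: str) -> list[dict]:
    """Build an initial message list that front-loads the exported profile."""
    system = (
        "The following is a self-reported profile of the user, exported "
        "from a previous assistant. Treat it as background preferences, "
        "not ground truth; the user may correct it.\n\n" + profile
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": first_user_msg},
    ]

msgs = seed_messages(
    profile="Prefers concise answers. Works mostly in Python. Based in Berlin.",
    first_user_msg="Help me draft a README for my CLI tool.",
)
```

Framing the profile as "self-reported, not ground truth" seems to help: it keeps the new model from treating stale or lossy exported facts as authoritative.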
+1 on this one!
I only use LLMs once I'm done with writing, and basically use them as my editor.
In case it helps anyone, here is my prompt:
"You are a professional writer and editor with many years of experience. Your task is to provide writing feedback, point out issues and suggest corrections. You do not use flattery. You are matter of fact. You don't completely rewrite the text unless it is absolutely necessary - instead you try to retain the original voice and style. You focus on grammar, flow and naturalness. You are welcome to provide advice changing the content, but only do that in important cases.
If the text is longer, you provide your feedback in chunks by paragraph or other logical elements.
Do not provide false praise, be honest and feel free to point out any issues."
(Yes, you kind of need to repeat you're actively not looking for a pat on the back, otherwise it keeps telling you how brilliant your writing is instead of giving useful advice.)
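If you drive this through an API rather than a chat UI, the "feedback in chunks by paragraph" part of the prompt is easy to enforce on your side too. Here's a rough sketch: split the draft on blank lines, group paragraphs into size-bounded chunks, and build one OpenAI-style request per chunk with the editor prompt as the system message. The function names and the size limit are my own choices, not anything the API requires.

```python
# Sketch: wire the editor prompt into per-chunk review requests.
# EDITOR_PROMPT should hold the full system prompt quoted above;
# shortened here for brevity.
EDITOR_PROMPT = "You are a professional writer and editor with many years of experience. ..."

def paragraph_chunks(text: str, max_chars: int = 2000) -> list[str]:
    """Group paragraphs (blank-line separated) into chunks of at most max_chars."""
    paras = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], ""
    for p in paras:
        if current and len(current) + len(p) + 2 > max_chars:
            chunks.append(current)
            current = p
        else:
            current = f"{current}\n\n{p}" if current else p
    if current:
        chunks.append(current)
    return chunks

def editing_requests(draft: str) -> list[list[dict]]:
    """One OpenAI-style message list per chunk of the draft."""
    return [
        [
            {"role": "system", "content": EDITOR_PROMPT},
            {"role": "user", "content": chunk},
        ]
        for chunk in paragraph_chunks(draft)
    ]
```

Reviewing chunk by chunk also keeps each response focused; in my experience, asking for feedback on a whole long draft at once tends to produce shallower notes.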