Others should really implement that as well. You still need to guide the model to produce e.g. JSON to get good results, but they will 100% guaranteed be valid per the grammar.
What kind of work? I've only given it a short try before moving to Ollama that doesn't have it, but it seemed to have worked there. (With ollama I need to use a retry system.)
edit: I researched a bit and apparently it can reduce performance, plus the streaming mode fails to report incorrect grammars. Overall these don't seem like deal-breakers.
Others should really implement that as well. You still need to guide the model to produce e.g. JSON to get good results, but they will 100% guaranteed be valid per the grammar.