
From reading the post, the architectural change is combining a vision-language model (Mistral 3 in the FLUX.2 case) with a rectified flow transformer.

I wonder if this architectural change makes it easier to use other vision models, such as the ones in Llama 3 and 4, or possibly a future Llama 5.




