Reading the post, the architectural change seems to be combining a vision model (Mistral 3 in the flux.2 case) with a rectified flow transformer.

I wonder if this architectural change makes it easier to use other vision models such as the ones in Llama 3 and 4, or possibly a future Llama 5.
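Here is a toy sketch of why a swap could plausibly be cheap. It is not FLUX.2's actual code. It assumes the flow transformer only sees the vision model's output through a learned linear projection. Under that assumption, a different encoder with a different hidden width only needs its own adapter. It may also need fine-tuning, which is not shown. The transformer itself is replaced by a hand-written stand-in velocity. The names `project`, `sample_rectified_flow`, and `toy_velocity` are all hypothetical.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def project(features: Sequence[float], weight: Sequence[Sequence[float]]) -> Vector:
    # Hypothetical adapter: a linear map from the encoder's hidden width
    # to the width the flow transformer expects for its conditioning input.
    return [sum(w * f for w, f in zip(row, features)) for row in weight]


def sample_rectified_flow(
    x0: Vector,
    cond: Vector,
    velocity: Callable[[Vector, float, Vector], Vector],
    steps: int = 10,
) -> Vector:
    # Euler integration of dx/dt = v(x, t, cond) from t = 0 (noise) to t = 1 (sample).
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        v = velocity(x, t, cond)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def toy_velocity(x: Vector, t: float, cond: Vector) -> Vector:
    # Stand-in for the transformer: the straight-line velocity toward `cond`,
    # i.e. (target - x_t) / (1 - t) on the path x_t = (1 - t) * x0 + t * target.
    return [(c - xi) / (1.0 - t) for xi, c in zip(x, cond)]


# Two hypothetical encoders with different hidden widths (3 and 4),
# each with its own adapter projecting to the same conditioning width (2).
cond_a = project([1.0, 2.0, 3.0], [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
cond_b = project([2.0, 2.0, 2.0, 2.0], [[0.5, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.25]])

sample_a = sample_rectified_flow([0.0, 0.0], cond_a, toy_velocity)
sample_b = sample_rectified_flow([0.0, 0.0], cond_b, toy_velocity)
```

The sampler loop never changes between the two encoders. Only the adapter weights differ. Whether a real model tolerates a swap like this without retraining is exactly the open question.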