It's a rather arbitrary metric. All it proves is that the token "spork" wasn't in Midjourney's training data. A FAR better test for adherence is how well a model does when a detailed description of an untrained concept is provided as the prompt.
Yup, it's an arbitrary metric, but I tried cajoling various image models into generating spork pictures with highly detailed descriptions (I have ComfyUI & AUTOMATIC1111 locally, and many models), which led me to create the site.
I'd say a better test for adherence is how well a model does when the detailed description falls between two very well-known concepts - it's kinda like those 1500s drawings of exotic animals explorers had seen, made back home by artists working only from the field notes after a long voyage.
The combination of T5 / CLIP text encoders coupled with a much larger model means there's less need to rely on custom LoRAs for unfamiliar concepts, which is awesome.
EDIT: If you've got the GPU for it, I'd recommend downloading the latest version of the SD-WEBUI Forge repo along with the dev checkpoint of Flux (not schnell). It's super impressive, and I'm getting roughly 15 seconds per 1024x1024 image.
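For anyone who'd rather script it than use the Forge UI, here's a minimal sketch of the equivalent local generation via Hugging Face diffusers' FluxPipeline - the VRAM assumption and settings are illustrative, not a tuned setup:

```python
# Minimal sketch: running the Flux dev checkpoint locally with diffusers
# (not Forge itself). Assumes you've accepted the FLUX.1-dev license on
# the Hub and have a card with enough VRAM (or CPU offload enabled).
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",  # dev checkpoint, not schnell
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()  # helps on smaller cards, at some speed cost

image = pipe(
    prompt="a single stainless-steel spork on a plain table, studio lighting",
    height=1024,
    width=1024,
    guidance_scale=3.5,
    num_inference_steps=50,
).images[0]

image.save("spork.png")
```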
Well... there are a hundred ways we could measure prompt adherence (a rough sketch of example prompts for each is below), everything from:
- Descriptive -> describing a difficult concept that is most certainly NOT in the training data
- Hybrids -> fusions of familiar concepts
- Platonic overrides -> this is my phrase for attempting to see how well you can OVERRIDE very emphasized training data. For example, a zebra with horizontal stripes.
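To make that concrete, here's a tiny sketch of what a test set along those lines could look like - the category names come from the list above, but every prompt is made up for illustration, and generate_image is a placeholder for whatever backend you use:

```python
# Rough sketch of a prompt-adherence test set; prompts are illustrative only.
ADHERENCE_PROMPTS = {
    "descriptive": [
        # a concept unlikely to be in the training data, described in detail
        "a hand utensil with a shallow spoon-like bowl ending in four short tines",
    ],
    "hybrid": [
        # a fusion of two familiar concepts
        "a grand piano whose body is a vintage camper van",
    ],
    "platonic_override": [
        # fighting heavily reinforced training data
        "a zebra with horizontal stripes running parallel to its spine",
    ],
}

for category, prompts in ADHERENCE_PROMPTS.items():
    for prompt in prompts:
        print(f"[{category}] {prompt}")
        # generate_image(prompt)  # hypothetical call to your model of choice
```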
https://gondolaprime.pw/pictures/sporks-generative.jpg