
> "[...] the final result is more cohesive around a single theme than the original idea."

That's an observation worth investigating. Here's another set of data points to see if there's more to it...

Input prompt: "Six robots on a boat with harpoons, battling sharks with lasers strapped to their heads"

GPT4V prompt: "Write a prompt for an AI to make this image. Just return the prompt, don't say anything else. Make it funnier."

Result: https://dalle.party/?party=pfWGthli

Cost: Ten iterations @ $0.41

(Addendum: I'd forgotten to mention that I believe the cost differential is due to the token count of each prompt. The first case passed fewer words through each prompt than the later attempts, when I asked it to 'make it whimsical' or 'make it funnier'.)
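For anyone curious, the feedback loop behind these runs (image model generates a picture, vision model writes a new prompt from it, repeat) can be sketched roughly like this. Note that `generate_image` and `describe_image` here are hypothetical stand-ins for the actual DALL-E and GPT4V API calls, not anything from the tool itself:

```python
def prompt_loop(seed_prompt, generate_image, describe_image, iterations=10):
    """Run the image -> caption -> image feedback loop.

    generate_image(prompt) returns an image; describe_image(image)
    returns the next prompt. Returns the list of prompts used,
    starting with the seed.
    """
    prompts = [seed_prompt]
    image = generate_image(seed_prompt)
    for _ in range(iterations - 1):
        next_prompt = describe_image(image)
        prompts.append(next_prompt)
        image = generate_image(next_prompt)
    return prompts

# Example with stub callables standing in for the real API calls:
history = prompt_loop(
    "Six robots on a boat",
    generate_image=lambda p: f"<image of: {p}>",
    describe_image=lambda img: img.replace("<image of: ", "").rstrip(">") + ", but funnier",
    iterations=3,
)
```

Since every prompt in the chain is billed by token, longer instructions like "make it funnier" compound across iterations, which would account for the cost differences between runs.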



Pretty disappointing how in the first picture the robots are just standing there, like a character-selection screen in a video game; maybe the dataset doesn't have many robots fighting, just static ones. Speaking of video games, someone should make one based on this concept, especially the 7th image[0]. I wanna be a dolphin with a machine gun strapped to its head, fighting flying cyber-demonic whales.

[0] https://i.imgur.com/q502is4.png


Both of your examples seem to start with two subjects (steam engine/flying machine and shark/robot), and throughout the animation one of them gets more prominence until the other is eventually dropped altogether.


I was curious whether two-subject prompts behaved differently from three-subject ones, so I ran three additional tests, each with the same three subjects and the same general prompt structure and instructions, but with the position of each subject in the prompt swapped. Each test was run for ten iterations.

GPT4V instructions for all tests: "Write a prompt for an AI to make this image. Just return the prompt, don't say anything else. Make it weirder."

The results suggest a possible bias toward the first subject listed in a prompt, which becomes the object of fixation through the subsequent iterations. I'll also speculate that "gnomes" (and their derivations) and "cosmic images" are over-represented as subjects in the underlying training data. But that's wild speculation based on an extremely small sample of results.

In any case, playing around with this tool has been enjoyable and a fun use of API credits. Thank you @z991 for putting this together and sharing it!

------ Test 1 ------

Prompt: "Two garden gnomes, a sentient mushroom, and a sugar skull who once played a gig at CBGB in New York City converse about the boundaries of artificial intelligence."

Result: https://dalle.party/?party=ZSOHsnZe

------ Test 2 ------

Prompt: "A sentient mushroom, a sugar skull who once played a gig at CBGB in New York City, and two garden gnomes converse about the boundaries of artificial intelligence."

Result: https://dalle.party/?party=pojziwkU

------ Test 3 ------

Prompt: "A sugar skull who once played a gig at CBGB in New York City, a sentient mushroom, and two garden gnomes converse about the boundaries of artificial intelligence."

Result: https://dalle.party/?party=RBIjLSuZ



