
What is the input? Is it converting a text query like "chair" to a mesh?

edit: Seems like mesh completion is the main input-output method, not just a neat feature.



Yeah it's hard to tell.

It looks like the input is itself a 3D mesh? So the model is doing "shape completion" (e.g. they show generating a chair from just some legs)... or possibly generating "variations" when the input shape is more complete?

But I guess it's a starting point... maybe you could feed it the output of another, lower-quality text-to-mesh model as the input and get something crisper and more coherent from this one.


You prompt this LLM using 3D meshes for it to complete, in the same manner you use language to prompt language-specific LLMs.


That's what it seems like. Although this is not an LLM.

> Inspired by recent advances in powerful large language models, we adopt a sequence-based approach to autoregressively generate triangle meshes as sequences of triangles.

It's only inspired by LLMs.


This is sort of a distinction without a difference. It's an autoregressive sequence model; the distinction is how you're encoding data into (and out of) a sequence of tokens.

LLMs are autoregressive sequence models where the "role" of the graph convolutional encoder here is filled by a BPE tokenizer (also a learned model, just a much simpler one than the model used here). That this works implies that you can probably port this idea to other domains by designing clever codecs which similarly map their feature space into discrete token sequences.

(Everything is feature engineering if you squint hard enough.)
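A rough sketch of what that shared recipe looks like, purely illustrative and not the paper's code (all class/function names here are made up): encode domain data as discrete token IDs, train a transformer to predict the next token, then decode the generated IDs back into the original domain.

    # Illustrative sketch only -- the generic recipe shared by LLMs and this mesh model.
    import torch
    import torch.nn as nn

    class TinyAutoregressiveModel(nn.Module):
        def __init__(self, vocab_size=1024, d_model=256, n_layers=4, n_heads=4, max_len=512):
            super().__init__()
            self.tok = nn.Embedding(vocab_size, d_model)
            self.pos = nn.Embedding(max_len, d_model)
            layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            self.blocks = nn.TransformerEncoder(layer, n_layers)
            self.head = nn.Linear(d_model, vocab_size)

        def forward(self, ids):  # ids: (batch, seq) of discrete token IDs
            seq = ids.shape[1]
            x = self.tok(ids) + self.pos(torch.arange(seq, device=ids.device))
            mask = nn.Transformer.generate_square_subsequent_mask(seq).to(ids.device)
            return self.head(self.blocks(x, mask=mask))  # next-token logits

    @torch.no_grad()
    def generate(model, prompt_ids, steps=64):
        # Greedy autoregressive completion of a token prompt. For text the prompt
        # IDs come from a BPE tokenizer; for meshes, from a learned encoder/codebook.
        # The generation loop itself is the same either way.
        ids = prompt_ids
        for _ in range(steps):
            logits = model(ids)[:, -1]  # logits for the next token only
            ids = torch.cat([ids, logits.argmax(-1, keepdim=True)], dim=1)
        return ids

The only domain-specific parts are the codec at each end; the middle is just next-token prediction.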


The only difference is the label, really. The underlying transformer architecture and the approach of using a codebook are identical to a large language model. The same approach was originally used for image generation in DALL-E 1.
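For concreteness, here is a rough sketch of the codebook ("vector quantization") step being referred to, in the style of VQ-VAE / DALL-E 1 pipelines -- an illustration, not this paper's actual implementation. A continuous feature vector from an encoder is snapped to its nearest codebook entry, and that entry's index becomes the discrete token.

    import torch

    def quantize(features, codebook):
        # features: (n, d) encoder outputs; codebook: (K, d) learned code vectors.
        # Returns (token_ids, quantized_vectors).
        dists = torch.cdist(features, codebook)  # (n, K) pairwise distances
        ids = dists.argmin(dim=1)                # index of nearest code = token ID
        return ids, codebook[ids]

    # Usage: the transformer is then trained on these ID sequences exactly as a
    # language model is trained on BPE token IDs.
    feats = torch.randn(6, 256)    # pretend encoder output
    codes = torch.randn(1024, 256) # pretend learned codebook
    token_ids, _ = quantize(feats, codes)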


That's what I was wondering. From the diagram it looks like the input is other chair meshes, which makes it somewhat less interesting.


Really, the hardest thing in art is the details, and that's usually what separates good from bad. So if you can roughly sketch what you want without skill and have the details generated, that's extremely useful. And image-to-image with the existing diffusion models is useful and popular.


I have no idea about your background, but here are my two cents.

No. Details are mostly icing on the cake. Sure, good details can make good art, but that is not always the case. True and beautiful art requires form + shape; what you are describing is merely something visually appealing. The reason diffusion models feel so bland is that they are good with details but do not have precise forms and shapes. They are getting better nowadays, but it remains an issue.

Form + shape > details is something they teach in Art 101.


There are also examples of tables, lamps, couches, etc. in the video.



