
What is the input? Is it converting a text query like "chair" to a mesh?

edit: Seems like mesh completion is the main input-output method, not just a neat feature.



Yeah it's hard to tell.

It looks like the input is itself a 3D mesh? So the model is doing "shape completion" (e.g. they show generating a chair from just some legs)... or possibly generating "variations" when the input shape is more complete?

But I guess it's a starting point... maybe you could feed it the output of another, lower-quality text-to-mesh model as the input and get something crisper and more coherent from this one.


You prompt this LLM using 3D meshes for it to complete, in the same manner you use language to prompt language-specific LLMs.


That's what it seems like. Although this is not an LLM.

> Inspired by recent advances in powerful large language models, we adopt a sequence-based approach to autoregressively generate triangle meshes as sequences of triangles.

It's only inspired by LLMs.


This is sort of a distinction without a difference. It's an autoregressive sequence model; the distinction is how you're encoding data into (and out of) a sequence of tokens.

LLMs are autoregressive sequence models where the "role" of the graph convolutional encoder here is filled by a BPE tokenizer (also a learned model, just a much simpler one than the model used here). That this works implies that you can probably port this idea to other domains by designing clever codecs which similarly map their feature space into discrete token sequences.

(Everything is feature engineering if you squint hard enough.)
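A rough sketch of what that shared recipe looks like, purely illustrative and not the paper's code (all class/function names here are made up): encode domain data as discrete token IDs, train a transformer to predict the next token, then decode the generated IDs back into the original domain.

    # Illustrative sketch only -- the generic recipe shared by LLMs and this mesh model.
    import torch
    import torch.nn as nn

    class TinyAutoregressiveModel(nn.Module):
        def __init__(self, vocab_size=1024, d_model=256, n_layers=4, n_heads=4, max_len=512):
            super().__init__()
            self.tok = nn.Embedding(vocab_size, d_model)
            self.pos = nn.Embedding(max_len, d_model)
            layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            self.blocks = nn.TransformerEncoder(layer, n_layers)
            self.head = nn.Linear(d_model, vocab_size)

        def forward(self, ids):  # ids: (batch, seq) of discrete token IDs
            seq = ids.shape[1]
            x = self.tok(ids) + self.pos(torch.arange(seq, device=ids.device))
            mask = nn.Transformer.generate_square_subsequent_mask(seq).to(ids.device)
            return self.head(self.blocks(x, mask=mask))  # next-token logits

    @torch.no_grad()
    def generate(model, prompt_ids, steps=64):
        # Greedy autoregressive completion of a token prompt. For text the prompt
        # IDs come from a BPE tokenizer; for meshes, from a learned encoder/codebook.
        # The generation loop itself is the same either way.
        ids = prompt_ids
        for _ in range(steps):
            logits = model(ids)[:, -1]  # logits for the next token only
            ids = torch.cat([ids, logits.argmax(-1, keepdim=True)], dim=1)
        return ids

The only domain-specific parts are the codec at each end; the middle is just next-token prediction.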


The only difference is the label, really. The underlying transformer architecture and the approach of using a codebook are identical to a large language model. The same approach was originally used for image generation in DALL-E 1.
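For concreteness, here is a rough sketch of the codebook ("vector quantization") step being referred to, in the style of VQ-VAE / DALL-E 1 pipelines -- an illustration, not this paper's actual implementation. A continuous feature vector from an encoder is snapped to its nearest codebook entry, and that entry's index becomes the discrete token.

    import torch

    def quantize(features, codebook):
        # features: (n, d) encoder outputs; codebook: (K, d) learned code vectors.
        # Returns (token_ids, quantized_vectors).
        dists = torch.cdist(features, codebook)  # (n, K) pairwise distances
        ids = dists.argmin(dim=1)                # index of nearest code = token ID
        return ids, codebook[ids]

    # Usage: the transformer is then trained on these ID sequences exactly as a
    # language model is trained on BPE token IDs.
    feats = torch.randn(6, 256)    # pretend encoder output
    codes = torch.randn(1024, 256) # pretend learned codebook
    token_ids, _ = quantize(feats, codes)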


That's what I was wondering. From the diagram it looks like the input is other chair meshes, which makes it somewhat less interesting.


Really, the hardest thing in art is the details, and that's usually what separates good from bad. So if you can roughly sketch what you want without skill and have the details generated, that's extremely useful. And image-to-image with the existing diffusion models is useful and popular.


I have no idea about your background, but here are my two cents.

No. Details are mostly icing on the cake. Sure, good details can make good art, but that is not always the case. True and beautiful art requires form + shape; what you are describing is merely something visually appealing. The reason diffusion models feel so bland is that they are good with details but do not have precise forms and shapes. They are getting better nowadays, but it remains an issue.

Form + shape > details is something they teach in Art 101.


There are also examples of tables, lamps, couches, etc. in the video.



