Hacker News

I wonder if there'd be any use for an "ontological" representation, somewhere in between a natural-language string and its embedding in a particular LLM. Maybe something that balances human readability, LLM composability, lack of brittleness, insight into the local structure of the embedding, etc.


I wonder too. I imagine the best we could do with present technology is to get back the generated text in the form of text tokens accompanied by their corresponding deep embeddings (last hidden states): `[(text_token, deep_emb), (text_token, deep_emb), ...]`. Those deep embeddings incorporate "everything the model knows" about each generated token of text.
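To make the shape of that concrete, here's a minimal sketch in PyTorch. It uses a toy stand-in for an LLM (a tiny GRU language model, purely illustrative — not a real transformer, and all sizes are made up) to show greedy decoding that returns each generated token paired with its final hidden state:

```python
# Toy sketch: pair each generated token with its last hidden state
# ("deep embedding"). The model here is a made-up stand-in, not a real LLM.
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab_size, d_model = 100, 16

embed = nn.Embedding(vocab_size, d_model)
body = nn.GRU(d_model, d_model, batch_first=True)  # stand-in for the transformer stack
head = nn.Linear(d_model, vocab_size)              # LM head

def generate_with_deep_embs(prompt_ids, n_new=5):
    """Greedy-decode n_new tokens; return [(token_id, deep_emb), ...]."""
    ids = list(prompt_ids)
    pairs = []
    with torch.no_grad():
        for _ in range(n_new):
            x = embed(torch.tensor([ids]))      # (1, seq, d_model)
            hidden, _ = body(x)                 # (1, seq, d_model)
            deep_emb = hidden[0, -1]            # last hidden state at the newest position
            next_id = head(deep_emb).argmax().item()
            ids.append(next_id)
            pairs.append((next_id, deep_emb))
    return pairs

pairs = generate_with_deep_embs([1, 2, 3], n_new=4)
```

With a real model (e.g. via the HF transformers `output_hidden_states=True` option), `deep_emb` would be the last layer's hidden state for that position; the structure of the returned list is the same.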


Maybe a mapping/representation for a "medium embedding" could be learned that strikes a balance between shallow and deep. I have no idea what a good objective function would be, though.
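Purely as a strawman for what such an objective might look like (every name and weight here is invented): train a projection whose output both still identifies the token (a shallow, discrete signal) and approximately reconstructs the deep embedding (the model-internal signal), trading the two off with a mixing weight:

```python
# Illustrative objective only: "medium" embedding should (a) still decode the
# token and (b) approximately reconstruct the deep embedding. All dims made up.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
d_deep, d_medium, vocab_size = 16, 8, 100

proj = nn.Linear(d_deep, d_medium)    # deep -> medium
decode = nn.Linear(d_medium, vocab_size)  # medium should still predict the token
expand = nn.Linear(d_medium, d_deep)  # medium should approximate the deep emb

def medium_loss(deep_emb, token_id, alpha=0.5):
    m = proj(deep_emb)
    ce = F.cross_entropy(decode(m).unsqueeze(0), torch.tensor([token_id]))
    rec = F.mse_loss(expand(m), deep_emb)
    return alpha * ce + (1 - alpha) * rec

loss = medium_loss(torch.randn(d_deep), token_id=7)
```

Whether this particular trade-off yields anything human-readable is exactly the open question.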



