You could do an architecture search, and Google previously did that for CNNs with its NASNet (Neural Architecture Search) series of architectures, but the problem is you first need to decide which architectural components you want your search process to operate over. So you are baking in a lot of assumptions from the start and massively reducing the search space (which is necessary for the search to be computationally viable).
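To make the point concrete, here's a toy sketch of a NASNet-style search space. The operation names and cell size are made up for illustration; the point is that the designer fixes the menu of candidate operations up front, so every assumption in that list is baked into the search before it even starts:

```python
import itertools

# Hypothetical candidate operations for one searched "cell". A real NAS
# system would have its own menu, but whatever the menu is, the search
# can never discover anything outside it.
CANDIDATE_OPS = ["conv3x3", "conv5x5", "sep_conv3x3", "max_pool3x3", "identity"]
CELL_SLOTS = 3  # number of operation slots per cell (also a baked-in choice)

def search_space():
    """Enumerate every candidate cell the search is allowed to consider."""
    return list(itertools.product(CANDIDATE_OPS, repeat=CELL_SLOTS))

cells = search_space()
print(len(cells))  # 5**3 = 125 candidates -- tiny compared to "all possible architectures"
```

Even this enumeration shows the trade-off: constraining the space is what makes the search tractable, but it also means the interesting design decisions happen before the search runs, not during it.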

A search or evolutionary process would also need an AGI-evaluator to guide it, and this evaluator would then determine the characteristics of the solution found. So it smacks of benchmark gaming rather than the preferred approach of designing for general capabilities instead of specific evaluations.
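The evaluator-dependence is easy to see in a minimal evolutionary loop. In this toy sketch (all names and the fitness function are invented for illustration), the evaluator is a stand-in benchmark score, and the population converges toward exactly what that score rewards, nothing more:

```python
import random

random.seed(0)  # deterministic toy run

def evaluator(genome):
    # Placeholder "benchmark": reward genomes with more 1s. A real
    # AGI-evaluator would train and test a model, but the principle is
    # the same -- the search optimizes for whatever this function measures.
    return sum(genome)

def evolve(pop_size=20, genome_len=8, generations=30):
    pop = [[random.randint(0, 1) for _ in range(genome_len)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=evaluator, reverse=True)
        parents = pop[: pop_size // 2]       # elitist selection by evaluator score
        children = []
        for p in parents:
            child = p[:]
            i = random.randrange(genome_len)
            child[i] = 1 - child[i]          # single point mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=evaluator)

best = evolve()
print(evaluator(best))  # converges toward the all-ones genome the evaluator rewards
```

The found "solution" is an artifact of the fitness function: swap in a different evaluator and you get a different winner, which is the benchmark-gaming worry in miniature.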

I wouldn't say we don't know how LLMs "work" - clearly we know how the transformer itself works, and it was designed intentionally with a certain approach in mind - we just don't know all the details of what representations it has learnt from the data. I also wouldn't say LLMs/transformers represent a bitter-lesson approach, since the architecture is so specific - there are a lot of assumptions baked into it.


