For practical tasks, I'd say Flan-T5 11B, which is about 45GB in fp32. One caveat from my experience: loading it the usual way with huggingface can temporarily take up to 2x the model's size in memory.
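If memory is tight, something like this should help. This is just a sketch of standard transformers usage (`low_cpu_mem_usage` and `torch_dtype` are regular `from_pretrained` arguments in recent versions); casting to fp16 is my own assumption and whether it's acceptable depends on your hardware and quality needs:

```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-xxl")

# low_cpu_mem_usage skips the default "allocate random weights, then
# overwrite" loading path, so peak RAM stays near the checkpoint size;
# fp16 roughly halves the footprint on top of that (assumption: fp16
# quality loss is acceptable for your task).
model = AutoModelForSeq2SeqLM.from_pretrained(
    "google/flan-t5-xxl",
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
)
```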
GPT-JT was released recently and seems interesting, but I haven't tried it. If you're focused on the scientific domain and want to do open-book Q&A, summarization, keyword extraction, etc., the 6B-parameter version of Galactica might be worth checking out.
If your main language is not English, one of the mt0 models might be worth a try: https://huggingface.co/bigscience/mt0-xl
What distinguishes these models is that they can follow relatively complex natural-language instructions and examples without needing to be fine-tuned, as in the sketch below.
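As a rough illustration of what that looks like with mt0-xl (standard transformers usage; the prompt is just a made-up example, and in practice you'd likely want the memory-saving loading arguments from above too):

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bigscience/mt0-xl")
model = AutoModelForSeq2SeqLM.from_pretrained("bigscience/mt0-xl")

# A plain natural-language instruction; no fine-tuning involved.
inputs = tokenizer("Translate to English: Je t'aime.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```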