
Has anyone outside of x.ai actually done inference with this model yet? And if so, have they provided details of the hardware? What type of AWS instance or whatever?

I think you can rent like an 8 x A100 or 8 x H100 and it's "affordable" to play around with for at least a few minutes. But you would need to know exactly how to set up the GPU cluster.

Because I doubt it's as simple as just 'python run.py' to get it going.



If you're just looking to test it out, it's probably easiest to wait for llama.cpp to add support (https://github.com/ggerganov/llama.cpp/issues/6120), and then you can run it slowly if you have enough RAM, or wait for one of the inference API providers like together.ai to add it. I'd like to add it to my NYT Connections benchmarks, and that's my plan (though it will require changing the prompt since it's a base model, not a chat/instruct model).


>it's probably easiest

Cheapest maybe, but easiest is just to rent a p4de.24xlarge from AWS for a couple of hours to test (at around $40/hour).


I'd expect more configuration issues in getting it to run on them than from a tested llama.cpp version, since this doesn't seem like a polished release. But maybe.


The NYT Connections benchmark sounds interesting, are the results available online?


GPT-4 Turbo: 31.0

Claude 3 Opus: 27.3

Mistral Large: 17.7

Mistral Medium: 15.3

Gemini Pro 1.0: 14.2

Qwen 1.5 72B Chat: 10.7

Claude 3 Sonnet: 7.6

GPT-3.5 Turbo: 4.2

Mixtral 8x7B Instruct: 4.2

Llama 2 70B Chat: 3.5

Nous Hermes 2 Yi 34B: 1.5

The interesting part is the large improvement from medium to large models. Existing over-optimized benchmarks don't show this.

- Max score is 100. There are 267 puzzles, with 3 prompts for each (uppercase and lowercase).

- Partial credit is given if the puzzle is not fully solved.

- Only one attempt is allowed per puzzle, 0-shot.

- Humans get 4 attempts and a hint when they are one step away from solving a group.
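The scoring scheme above could be sketched roughly like this (a minimal sketch, not the author's actual code; the per-group weighting and averaging are assumptions, since the thread doesn't spell out how partial credit is computed):

```python
# Hypothetical partial-credit scoring for a Connections-style benchmark.
# Each puzzle has 4 groups of 4 words; credit per puzzle is the fraction
# of groups recovered exactly (an assumption, not the published rules).

def score_puzzle(predicted_groups, solution_groups):
    """Fraction of solution groups the model matched exactly."""
    solved = sum(
        1 for group in predicted_groups
        if any(set(group) == set(sol) for sol in solution_groups)
    )
    return solved / len(solution_groups)

def benchmark_score(per_puzzle_scores):
    """Average per-puzzle credit, scaled so a perfect run scores 100."""
    return 100 * sum(per_puzzle_scores) / len(per_puzzle_scores)
```

With this kind of scheme, a model that fully solves one puzzle, gets half of another, and misses a third would land at `benchmark_score([1.0, 0.5, 0.0]) == 50.0`.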

I was hoping to get results for Gemini Advanced, Gemini Pro 1.5, and Grok, and to do a few-shot version, before posting it on GitHub.


Where is this? I googled a bit and found the game, but using it as a benchmark sounds genius!


Someone could run Grok-1 on a 192GB M2 Mac when a 4-bit quant is released; I'm guessing that TheBloke is already working on it.
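The back-of-envelope math works out: Grok-1 has roughly 314B parameters, so at ~4 bits per weight the weights alone come to about 157 GB, which fits (tightly) in 192 GB of unified memory. A quick sketch of the estimate, ignoring quantization block overhead, KV cache, and OS headroom:

```python
# Rough weight-footprint estimate for a 4-bit quant of Grok-1.
# 314e9 parameters is the published model size; everything else
# (overheads, cache) is deliberately ignored in this estimate.

def quant_size_gb(n_params, bits_per_weight):
    """Approximate weight footprint in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9

grok1_q4_gb = quant_size_gb(314e9, 4)  # about 157 GB
fits_in_192gb = grok1_q4_gb < 192      # leaves ~35 GB for everything else
```

In practice real 4-bit GGUF quants carry some per-block scale overhead, so the actual file would be somewhat larger than this lower bound.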


Fairly sure TheBloke hasn't created any new quants in a month.


TheBloke disappeared around the day https://nvd.nist.gov/vuln/detail/CVE-2024-23496 was published.

Of course there has been much speculation about this. I have no further information that can be backed up by facts, but the timing was suspicious.


He's started a company in the UK: https://suite.endole.co.uk/insight/company/15361921-thebloke...

Interestingly registered just around the corner from where one of my relatives used to live.


And his grant funding supposedly ran out.


Was any .gguf file hosted on HuggingFace found to be crafted in a way to exploit this?


What exactly are you implying here?


Still waiting on this one. Has anyone found someone on Twitter who can run it?



