Has anyone outside of x.ai actually done inference with this model yet? And if so, have they provided details of the hardware? What type of AWS instance or whatever?
I think you can rent an 8× A100 or 8× H100 node, and it's "affordable" to play around with for at least a few minutes. But you'd need to know exactly how to set up the GPU cluster.
I doubt it's as simple as just running 'python run.py' to get it going.
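To give a sense of what "setting up the cluster" involves: the released checkpoint is JAX-based, and multi-GPU inference means sharding the weights across all devices on the node. Here's a minimal, hypothetical sketch of that idea with toy shapes; this is not x.ai's actual loading code, just the kind of device-mesh plumbing involved:

    import jax
    import jax.numpy as jnp
    from jax.experimental import mesh_utils
    from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

    devices = jax.devices()  # expect 8 entries on an 8x A100/H100 node
    mesh = Mesh(mesh_utils.create_device_mesh((len(devices),)),
                axis_names=("model",))

    # Toy stand-in for one transformer weight, column-sharded across the mesh.
    # Keep the sharded dimension divisible by the device count.
    w_cols = 4096 * len(devices)
    w = jax.device_put(jnp.zeros((4096, w_cols)),
                       NamedSharding(mesh, P(None, "model")))
    x = jnp.ones((1, 4096))

    @jax.jit
    def layer(x, w):
        # XLA runs this as one matmul whose output columns live on different GPUs
        return x @ w

    y = layer(x, w)
    print(len(devices), y.shape, y.sharding)

Now multiply that by every layer in a 314B-parameter model, plus checkpoint loading and memory headroom, and it's clear why it's more than 'python run.py'.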
If you're just looking to test it out, it's probably easiest to wait for llama.cpp to add support (https://github.com/ggerganov/llama.cpp/issues/6120); then you can run it slowly if you have enough RAM, or wait for one of the inference API providers like together.ai to add it. I'd like to add it to my NYT Connections benchmarks, and that's my plan (though it will require changing the prompt, since it's a base model, not a chat/instruct model). A rough sketch of what that could look like is below.
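If and when that issue lands, the llama-cpp-python bindings would make the base-model prompting point concrete. This is purely hypothetical for now: the GGUF file name is made up, since no Grok-1 GGUF exists yet per the linked issue:

    from llama_cpp import Llama

    # Placeholder file name; a real Grok-1 GGUF doesn't exist as of the issue above.
    llm = Llama(model_path="grok-1-q4_k_m.gguf", n_ctx=4096)

    # A base model continues text rather than following instructions, so the
    # benchmark question has to be framed as a completion, not a chat turn.
    prompt = (
        "Puzzle: group these 16 words into 4 related sets of 4.\n"
        "Words: ...\n"
        "Answer:"
    )
    out = llm(prompt, max_tokens=128, stop=["\n\n"])
    print(out["choices"][0]["text"])

The stop sequence matters with base models: without it, they tend to keep generating past the answer into unrelated text.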
I'd expect more configuration issues getting it to run on rented GPUs than with a tested llama.cpp build, since this doesn't seem like a polished release. But maybe.