55 token/s here on m4 pro, turning on flash attention puts it to 60/s.

		mdz4040 6 months ago \| parent \| context \| favorite \| on: Open models by OpenAI 55 token/s here on m4 pro, turning on flash attention puts it to 60/s.