It's actually even cheaper when you look at the cache read costs. Those costs ca...

freakynit · 2026-05-23T06:57:36 1779519456

Also, DeepSeek's cache hit rates are pretty good. I use the DeepSeek v4 flash model regularly for agentic tasks (more than 20 tool calls per run on average), and 70%+ of input tokens get served from cache.
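To put a rough number on what a 70% hit rate does to the input bill, here is a minimal sketch of the blended-price arithmetic. The two prices are made-up placeholders, not DeepSeek's actual rates, and `effective_input_price` is just a name I picked for illustration; plug in the real per-million-token prices from the pricing page.

```python
def effective_input_price(miss_price: float, hit_price: float, hit_rate: float) -> float:
    """Blended price per 1M input tokens, given the fraction served from cache."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    # Cached tokens are billed at hit_price, the rest at miss_price.
    return hit_rate * hit_price + (1.0 - hit_rate) * miss_price


# Placeholder prices (assumptions, NOT real DeepSeek pricing):
# 1.00 per 1M uncached input tokens, 0.10 per 1M cached input tokens.
blended = effective_input_price(miss_price=1.00, hit_price=0.10, hit_rate=0.70)
savings = 1.0 - blended / 1.00  # fraction saved versus paying full price for everything
```

With those placeholder numbers the blended input price comes out to roughly 0.37 per 1M tokens, so a bit over 60% off the list input price before output tokens are counted.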

The speed is absolutely bonkers too. I once misconfigured an MCP server I was developing locally and told the model to use the tools that server provides to get a certain task done. It figured out that the MCP server was misconfigured, went ahead and fixed it on its own, and then started using it by passing raw JSON-RPC messages over stdin/stdout, bypassing the harness integration (since that would have needed a restart).

It did all of this in under 30 seconds and made over 15 tool calls along the way (yes, I use yolo mode in a container, so my agents have full access to everything in the container).