
Opus 4(.1) is so expensive[1]. Even Sonnet[2] costs me basically $5 per hour using OpenRouter + Codename Goose[3]. The crazy thing is that Sonnet 3.5 costs the same[4] right now. Gemini Flash is more reasonable[5], but always seems to make the wrong decisions in the end, spinning in circles. OpenAI is better, but still falls short of Claude's performance. Claude's API also returns 400s if you Ctrl-C mid-request, which is annoying.

Economics is important. The best bang for the buck seems to be OpenAI's GPT-4.1 mini[6]. It does a decent job, doesn't flood my context window with useless tokens like Claude does, and the API works every time. It gets me out of bad spots. It can get confused, but I've been able to muddle through with it.

1: https://openrouter.ai/anthropic/claude-opus-4.1

2: https://openrouter.ai/anthropic/claude-sonnet-4

3: https://block.github.io/goose/

4: https://openrouter.ai/anthropic/claude-3.5-sonnet

5: https://openrouter.ai/google/gemini-2.5-flash

6: https://openrouter.ai/openai/gpt-4.1-mini
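That $5/hour figure is easy to sanity-check with a back-of-envelope calculation. The prices and token volumes below are illustrative assumptions (roughly Sonnet-class per-MTok list prices and an agent re-sending a long context every turn), not quoted OpenRouter figures:

```python
def hourly_cost(in_price_mtok, out_price_mtok, in_tokens_per_hr, out_tokens_per_hr):
    """Cost in USD for one hour of use, given prices in USD per million tokens."""
    return (in_tokens_per_hr * in_price_mtok
            + out_tokens_per_hr * out_price_mtok) / 1e6

# Assumed: $3/$15 per MTok in/out, ~1.4M input tokens/hr (agents re-send the
# whole context on every turn) and ~50k output tokens/hr.
cost = hourly_cost(3.00, 15.00, 1_400_000, 50_000)  # ~ $4.95/hr
```

Note that input tokens dominate: an agent loop that keeps replaying a long context burns money on input, which is why caching (discussed below) matters so much.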



Get a subscription and use Claude Code - that's how you get actually reasonable economics out of it. I use Claude Code all day on the Max subscription, and maybe twice in the last two weeks have I actually hit usage limits.


> Get a subscription and use claude code

I find the token/credit restrictions on Opus nearly useless even when using Claude Code. I only ever switch to it to get another model's take on the issue. Five minutes of use and I've hit the limit.


Is it a Max subscription?

We have the $200 plans for work and, despite only using Opus, we rarely hit the limits. ccusage suggests the same usage via the API would have been ~$2000 over the last month (we work 5 hours a day, 4 days a week, almost always with Claude).


Are you part time?


In a way. Those are my company's working hours.


Yup. Getting to try three or so prompts that it messes up, then running out of quota for hours, is entirely useless to me.


It seems the Max plan is almost always needed for Opus to be useful.


Is it considerably more cost-effective than Cline + Sonnet API calls with caching and diff edits?

Same context length and throughput limits?

Anecdotally, I found GPT-4.1 (and mini) pretty good at those agentic programming tasks, but the lack of token caching made the costs blow up with long contexts.


If you use Claude Code with a subscription and run `ccusage` [0] you can get an idea of your "true usage" and cost.

[0] https://github.com/ryoppippi/ccusage


I'm on the basic $20/mo sub and only ran into token cap limitations in the first few days of using Claude Code (now 2-3 weeks in), before I started being more aggressive about clearing the context. Long contexts eat through token caps quickly when you're having extended back-and-forth conversations with the model. Otherwise, it's been effectively "unlimited" for my own use.


YMMV. I'm using the $100/mo Max subscription and I hit the limit during focused coding sessions where I'm giving it prompts non-stop.

Unfortunately there's no easy tool to inspect usage. I started a project to parse the Claude logs (using Claude) and generate a Chrome trace from them. It's promising, but it was taking tokens away from my core project.
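The core of that log-to-trace conversion is small. A minimal sketch, assuming the logs are JSONL and that each record carries a `type` field and a microsecond timestamp field named `ts_us` (both assumptions about the log format, not documented behavior); the output follows Chrome's Trace Event Format, which chrome://tracing and Perfetto can open:

```python
import json

def to_chrome_trace(log_lines):
    """Convert assumed JSONL log lines into a Chrome Trace Event Format dict."""
    events = []
    for line in log_lines:
        rec = json.loads(line)
        events.append({
            "name": rec.get("type", "event"),  # event label shown in the viewer
            "ph": "i",                         # "instant" event phase
            "ts": rec["ts_us"],                # timestamp in microseconds
            "pid": 1,
            "tid": 1,
        })
    return {"traceEvents": events}

# Hypothetical usage with a session log file:
# with open("session.jsonl") as f:
#     trace = to_chrome_trace(f)
# with open("trace.json", "w") as out:
#     json.dump(trace, out)
```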


Check out ccusage, it sounds like the tool you’re describing: https://github.com/ryoppippi/ccusage


That's neat. According to the tool, I'm consuming ~300M tokens per day coding, with a (retail?) cost of ~$125/day. The output of the model is definitely worth $100/mo to me.


This is a good bar to know. I see the warnings but not sure how much I really have left.

Do you mostly use opus?


Mostly sonnet because of the usage limits.


Makes sense. I'm seeing some posts about how the system can slow down or decrease in quality at certain times of day.


Neat tool, thanks!


ccusage on GitHub.


Yes, it’s much better.

It uses far fewer tokens, or uses them much more effectively, when running locally.


Is there any documentation on what the Max sub usage limit is? A coworker tried it and was booted off Opus within a couple of hours due to "high usage". I haven't made the jump, since I expect my $3k/mo of API usage would instantly blow through a $200/mo sub and I'd just be back on the API again. But if it could carve $1k-2k out of my costs for a little time spent managing sub(s), it might be worth it.


It's not documented - that's the whole point. They can scale it back and forth opaquely, letting high-volume users get more usage whenever low-volume users aren't using it much. If it were explicit and transparent, you wouldn't get that benefit, since it would be gamed by unscrupulous power users.

Also, there's a CLI argument that lets you specify the model; try `claude --help`.


Is there a way to sign up for Claude code that doesn't involve verifying a phone number with Anthropic? They don't even accept Google Voice numbers.

Maybe I'm out of touch, but I'm not handing out my phone number to sign up for random SaaS tools.


It's maybe the leading subscription based tool in our field, not a random SaaS tool.


They have zero need for a phone number.


There are a lot of fraudsters out there who will happily create thousands of accounts with valid CCs that will fail on first actual charge.[0]

I wouldn't be surprised if asking for a phone number lowers the fraud rate enough to compensate for the added friction.

[0] Incidentally, this is also why many AI API providers ask for your money upfront (buy credits) unless you're big enough and/or have existing relationship with them.


Sounds like a trivial fix even for monthly billing: just bill at the start of the month, not the end.


Come on now. You're about to run their cli and let it send any random file on your machine to their API intentionally. Trust them a little.


Sure, no contest on that. They still don't need my phone number.


use a burner


That's fine for private use. It doesn't work if you're building a product on Claude.


In every price comparison I make, Claude (API) always comes out cheapest if you manage to keep most of your context cached. The 90% price reduction for cached input is crazy.


Cached read prices: $0.31/MTok for Gemini Pro, $1.50/MTok for Claude Opus 4.1.

There are additional storage costs with Google's caching, around $3.75/MTok per 5 minutes, and Claude Opus cache writes (5-minute TTL) are $3.75/MTok.

For cached reads, Gemini Pro is 5x cheaper than Opus and about $0.01/MTok more than Sonnet.


Well, it's expensive compared to other models. But it's often much cheaper than human labor.

E.g. if you need a self-contained script to do some data processing, Opus can often do it in one shot. A 500-line Python script would cost around $1, and as long as it's not tricky it just works - you don't need back-and-forth.

I don't think it's possible to employ a human to write a 500-line Python script for $1 (unless it's a free intern or a student), let alone do it in one minute.

Of course, if you use LLM interactively, for many small tasks, Opus might be too expensive, and you probably want a faster model anyway. Really depends on how you use it.

(You can do quite a lot in file-at-once mode. E.g. Gemini 2.5 Flash could write 35 KB of code for a full ML experiment in Python - self-contained, with data loading, model setup, training, and evaluation all in one file - pretty much on the first try.)
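The "$1 for a 500-line script" claim is plausible on a back-of-envelope basis. The prices below are roughly Opus-class per-MTok figures and the tokens-per-line ratio is a rough assumption, so treat this as an order-of-magnitude sketch:

```python
# Assumed Opus-class prices in USD per million tokens (input, output).
IN_PRICE, OUT_PRICE = 15.0, 75.0
TOKENS_PER_LINE = 13  # rough average for a line of Python (assumption)

def one_shot_cost(prompt_tokens, lines_of_code):
    """Estimated USD cost of generating a script of `lines_of_code` lines in one shot."""
    output_tokens = lines_of_code * TOKENS_PER_LINE
    return (prompt_tokens * IN_PRICE + output_tokens * OUT_PRICE) / 1e6

# A ~2k-token prompt producing a 500-line script: roughly half a dollar,
# landing around $1 once you count retries or a longer prompt.
cost = one_shot_cost(2_000, 500)
```

Output tokens dominate here, which is the opposite of the agentic-loop case: a one-shot generation sends the context once, so the expensive part is the code the model writes.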


Large models are for querying the model

Small models are for querying the context

Opus is cheap if you use it for its niche


> Large models are for querying the model

> Small models are for querying the context

I respectfully disagree.

My experience is that large models are capable of understanding large contexts much better. Of course they are more expensive and slower, too. But in terms of accuracy, large models are always better at querying the context.


GLM 4.5 / Kimi K2 / Qwen Coder 3 / Gemini Pro 2.5



