This is extremely true. In fact, from what we see, many (if not most) of the problems people solve with LLMs have no ground-truth values; even hand-labeled data tends to be largely subjective.
We build a product that's somewhat similar in spirit to DSPy, but people come to us for different reasons than the OP listed here.
1) It's slow: you first have to get acquainted with DSPy and then gather hand-labeled data for prompt optimization. This can be a slow process, so it's important to label only the ambiguous cases, not the obvious ones.
2) They know manual prompt engineering is brittle, and they want a prompt that's optimized and robust against the model they're invoking, which DSPy offers. However, it's really the optimizer (e.g., GEPA) doing the heavy lifting; see the sketch after this list.
3) They don't actually want a model or prompt at all. They want a task completed, reliably, and they want that task to not regress in performance. Ideally, the task keeps improving in production.
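For folks who haven't seen that flow, here's a minimal sketch of what "the optimizer doing the heavy lifting" looks like in DSPy. Treat the constructor arguments as assumptions from memory (they shift between DSPy versions), and the tiny trainset as a placeholder for real labeled data:

    import dspy

    # Assumption: any LiteLLM-style model identifier works here.
    dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

    # A one-line signature stands in for a hand-written prompt.
    classify = dspy.Predict("ticket -> category")

    # Placeholder labeled data; per point 1, label the ambiguous cases.
    trainset = [
        dspy.Example(ticket="Card was charged twice", category="billing").with_inputs("ticket"),
        dspy.Example(ticket="App crashes on login", category="bug").with_inputs("ticket"),
    ]

    # The metric is where your (often subjective) labels enter the loop.
    def metric(gold, pred, trace=None, pred_name=None, pred_trace=None):
        return float(gold.category == pred.category)

    # GEPA iteratively rewrites the prompt text against the metric.
    # (Arguments here are assumptions and may differ by DSPy version.)
    optimizer = dspy.GEPA(metric=metric, auto="light",
                          reflection_lm=dspy.LM("openai/gpt-4o"))
    optimized = optimizer.compile(classify, trainset=trainset)

The point is that the hand-written prompt disappears entirely; you hold the signature and the metric fixed and let the optimizer search the prompt space.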
Curious if folks in this thread feel more of these pains than the ones in the article.
I think in some sense, this is the real thing everyone wants. Everything else is kind of an implementation detail! Would be really curious to see what you're building!
An under-discussed superpower of LLMs is open-set labeling, which I think of as inverse classification: instead of applying a static set of predetermined labels, you use the LLM to discover the semantic clusters within a corpus of unstructured data. It feels like "data mining" in the truest sense.
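Roughly what I mean, as a minimal sketch; `complete()` is a hypothetical stand-in for whatever LLM client you use, and I'm assuming the model returns valid JSON:

    import json
    from collections import Counter

    def complete(prompt: str) -> str:
        """Hypothetical stand-in for your LLM client of choice."""
        raise NotImplementedError

    def open_set_label(doc: str) -> list[str]:
        # No fixed taxonomy: the model proposes whatever labels fit.
        prompt = (
            "Propose 1-3 short topical labels for the text below. "
            "Return a JSON list of strings only.\n\n" + doc
        )
        return json.loads(complete(prompt))

    def mine_clusters(corpus: list[str]) -> Counter:
        # The label frequency table *is* the discovered cluster structure.
        counts = Counter()
        for doc in corpus:
            counts.update(label.lower() for label in open_set_label(doc))
        return counts

Run it over a few thousand documents and the head of the counter is your emergent taxonomy; the long tail is where the interesting outliers live.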
The models you called out at the beginning were all released this year. What do you think is the difference between this generation of models and previous ones?
Yes, we're a startup! And LLM inference is a major component of what we do - more importantly, we're working on making these models accessible as analytical processing tools, so we have a strong focus on making them cost-effective at scale.
I see your pricing page lists the average cost per million tokens. Is that because you're using the formula you describe, which depends on hardware time and throughput?
> API Price ≈ (Hourly Hardware Cost / Throughput in Tokens per Hour) + Margin
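Plugging in illustrative numbers (these are assumptions, not your actuals), the formula would give something like:

    hourly_hardware_cost = 4.00      # USD/hr, one H100-class GPU (illustrative)
    throughput_tok_per_sec = 2_500   # aggregate tokens/sec across batched requests (illustrative)

    tokens_per_hour = throughput_tok_per_sec * 3600   # 9,000,000 tokens/hr
    cost_per_million = hourly_hardware_cost / tokens_per_hour * 1e6
    print(f"${cost_per_million:.2f} per 1M tokens before margin")  # ~$0.44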
My two cents here is the classic answer - it depends. If you need general "reasoning" capabilities, I see this being a strong possibility. If you need specific, factual information baked into the weights themselves, you'll need something large enough to store that data.
I think the best of both worlds is a sufficiently capable reasoning model with access to external tools and data that can perform CPU-based lookups for information that it doesn't possess.
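As a sketch of that pattern (the dispatch loop is schematic, and `call_model`, the tool registry, and the `facts.db` schema are all hypothetical):

    import sqlite3

    def lookup_fact(query: str) -> str:
        # Cheap CPU-side retrieval: exact facts live in a database,
        # not in the model's weights. (Schema is illustrative.)
        con = sqlite3.connect("facts.db")
        row = con.execute(
            "SELECT answer FROM facts WHERE question LIKE ?", (f"%{query}%",)
        ).fetchone()
        con.close()
        return row[0] if row else "no match"

    TOOLS = {"lookup_fact": lookup_fact}

    def run_agent(task: str, call_model) -> str:
        # call_model is a hypothetical stand-in for a small reasoning model
        # that emits either ("tool", name, arg) or ("final", answer).
        context = task
        while True:
            kind, *payload = call_model(context)
            if kind == "final":
                return payload[0]
            name, arg = payload
            context += f"\n[{name}({arg!r}) -> {TOOLS[name](arg)}]"

The model only needs to be smart enough to decide when to look something up; the lookup itself costs essentially nothing compared to baking the facts into the weights.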
Both great points, but they more or less speak to the same root cause: customer usage patterns are becoming more of a driver for pricing than underlying technology improvements. If so, we've likely hit a "soft" floor on pricing for now. Do you not see it this way?
Even given how much prices have decreased over the past 3 years I think there's still room for them to keep going down. I expect there remain a whole lot of optimizations that have not yet been discovered, in both software and hardware.
No doubt prices will continue to drop! We just don't think the drops will be anything like the orders-of-magnitude YoY improvements we're used to seeing. Consequently, developers shouldn't expect the cost of building and scaling AI applications to be anywhere close to "free" in the near future, as many seem to expect.
I do not see it this way. Google is a publicly traded company responsible for creating value for its shareholders. When they became dicks about ad blockers on YouTube a year or so back, was it because they hit a bandwidth Moore's law? No. It was a money grab.
ChatGPT is simply what Google should have been 5-7 years ago, but Google was more interested in showing me ads to click on than in helping me find what I was looking for. ChatGPT handles at least 50% of my searches now, and Google is losing revenue because of it.
I run a batch inference/LLM data processing service and we do a lot of work around cost and performance profiling of (open-weight) models.
One odd disconnect that still exists in LLM pricing: providers charge linearly with respect to token consumption, but the underlying compute cost grows quadratically with sequence length, since each new token attends to all prior tokens.
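A back-of-the-envelope illustration of that gap (the hidden dimension and the FLOP model are deliberately crude):

    def attention_flops(n: int, d: int = 4096) -> float:
        # Rough O(n^2 * d) scaling for full self-attention over n tokens.
        return n * n * d

    for n in (1_000, 10_000, 100_000):
        per_token = attention_flops(n) / n
        print(f"n={n:>7}: attention compute per token ~ {per_token:.2e}")
    # Compute per token grows 10x for every 10x of context,
    # but a flat per-token price doesn't.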
At this point, since most models have converged on similar architectures, inference algorithms, and hardware, the chosen prices likely come from a historical, statistical analysis of the shape of customer requests. In other words, I'm not surprised to see prices increase as providers gather more data about real-world consumption patterns.
Sutro.sh (fka Skysight) | Infrastructure/LLMs & Research Engineering | SF Bay Area | Full-time
We are building batch inference infrastructure and a great user/developer experience around it. We believe LLMs have not yet been meaningfully unlocked as data processing tools - we're changing that.
Our work involves interesting distributed systems and LLM research problems, newly-imagined user experiences, and a meaningful focus on mission and values.
If you're interested in applying, please send an email to jobs@sutro.sh with a resume/LinkedIn Profile. For extra priority, please include [HN] in the subject line.