I also have a scratch-my-own-itch project[1] that leverages an LLM as a core part of its workload. But it's so niche I could never justify opening it up to general use. (I haven't even deployed it to the web because it's easier to just run it locally since I'm the only user.)
But it got me interested in a topic I have been calling "token economization." I'm sure there's a more common term for it but I'm a newb to this tech. Basically, how to drive down the "run rate" of token utilization per request.
Have you taken a stab at anything along this vein? Like prompt optimization, and so on? Or are you just letting 'er rip and managing costs by reducing request volume? (Now that I've typed this comment out I realize there is so much I don't know about basic stuff with commercial LLM billing and so on.)
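To make the idea concrete, here's a minimal sketch of one common tactic: trimming conversation history to fit a token budget before each request. The chars/4 estimate and the function names are just illustrative assumptions; a real app would use the provider's actual tokenizer (e.g. tiktoken for OpenAI models) rather than a heuristic.

```python
def estimate_tokens(text: str) -> int:
    # Crude approximation: roughly 4 characters per token for English text.
    # Swap in the provider's real tokenizer for billing-accurate counts.
    return max(1, len(text) // 4)

def trim_history(messages: list[str], budget: int) -> list[str]:
    # Keep the most recent messages whose combined estimated cost
    # fits within the token budget, dropping the oldest first.
    kept: list[str] = []
    total = 0
    for msg in reversed(messages):
        cost = estimate_tokens(msg)
        if total + cost > budget:
            break
        kept.append(msg)
        total += cost
    return list(reversed(kept))
```

Other levers in the same spirit: shortening the system prompt, caching repeated prefixes, and summarizing old turns instead of resending them verbatim.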
I haven't done any token/cost optimization so far because a) the app works well enough for me, personally; and b) I need more data to understand which areas to optimize.
Most likely, I'd start with quality optimizations that matter to users. Things to make people happier with the results.
Yeah, it's one of those things that is hard to catch unless you've been bitten by it before and know to look for it. Analytics teams at scale are at a much higher risk of this sneaking in, which is where automatic blocking with Lexega is helpful. No one wants to have to explain to their leadership, months down the road, why their dashboards were wrong because of such a subtle SQL bug.
Looping back here - trial licenses can now be obtained instantly through the free trial form on the website with just an email. No outreach needed on your part. Here for support if you decide to try it.
My work-issued dev device is a Surface Pro 10. I can't use WSL2 for various regulatory reasons. I will never, ever work on software like this again. It has been the worst development experience of my life because of what a miserable dev environment Windows is.
I know that's been a meme since forever, but my first hand experience supports it to the extreme.
And yet, studies show that journaling is super effective at helping to sort out your issues. Apparently in one study, participants rated journaling as more effective than 70% of counselling sessions. I don't need my journal to understand anything about my internal, subjective experience. That's my job.
Talking to a friend can be great for your mental health if your friend keeps the attention on you, asks leading questions, and reflects back what you say from time to time. ChatGPT is great at that if you prompt it right. Not as good as a skilled therapist, but good therapists are expensive and in short supply. ChatGPT is way better than nothing.
I think a lot of it comes down to prompting, though. I'm untrained, but I've had amazing therapists and I've also filled that role for years in many social groups. I know what I want ChatGPT to ask me when we talk about this stuff. It's pretty good at following directions. But I bet you'd have a way worse experience if you don't know what you need.
[1] https://github.com/mattdeboard/itzuli-stanza-mcp
edit:
I asked Claude to educate me about the concepts I'm nibbling at in this comment. After some back-and-forth about how to fetch this link (??), it spit out a useful answer https://claude.ai/share/0359f6a1-1e4f-4ff9-968a-6677ed3e4d14