Hackers by Steven Levy is an incredible story of the industry’s early years (the ’60s through the ’80s) and the characters who were in it for the “love of the game,” versus what is more common now (“status and money”). A lot of heroes like Woz, but also figures who are less well known in this day and age (Gosper and Greenblatt!). If you are familiar with and a fan of Dealers of Lightning or The Dream Machine, check out Hackers! (This is not a paid endorsement.)
We do the token counting on our end, literally just running tiktoken on the content chunks (although I think it's usually one token per chunk). It's a bit annoying, and I too expected they'd have the usage block, but it's one line of code if you already have tiktoken available. I've found the accounting on my side lines up well with what we see on our usage dashboard.
As an FYI, this is fine for rough usage, but it's not exact: the OpenAI APIs inject tokens you never see into the input for things like function calling.
I struggled to get an intuition for this, but on another HN thread earlier this year I saw a recommendation for Sebastian Raschka's series, starting with this video: https://www.youtube.com/watch?v=mDZil99CtSU (and maybe the next three or four). It was really helpful for getting a sense of the original 2014 concept of attention (https://arxiv.org/abs/1409.0473), which is easier to understand but less powerful, and then how it becomes powerful with the more modern notion of attention. So if you have a reasonable intuition for "regular" ANNs, I think this is a great place to start.
+1, you beat me to the punch! I think it's helpful to start with simple RL and ignore the "deep" part to get the basics; the first several lectures in this series do that well. It helped me build a simple "cat and mouse" RL simulation (https://github.com/gtoubassi/SimpleReinforcementLearning) and ultimately a reproduction of the DQN Atari game-playing agent: https://github.com/gtoubassi/dqn-atari.
Token counting is important when you are injecting fetched data into the prompt, to make sure you don't overflow the context window (e.g. in retrieval-augmented generation). You want to give the LLM as many facts as will fit in the prompt to improve the quality of its response. So even with billions of dollars... token counting is a thing.