If I understand correctly, I think the collectibles market is more in line with what GameStop is looking at here. They recently got into the trading card business, including grading services via PSA.
Yes, so much so that cards sold at retail in 2024 went, once graded, from around $100 in cost to well over $1,000 within 18 months, and this was me making the market. Prices on the same card (2024 Topps Chrome Sapphire Base #500 PSA 10) have since risen another 2.5x. It's correcting a little, but it is quite rare for a card that is effectively not considered limited edition, and that most people had put in storage, to go 10x and then another 2.5x, especially since it's a new card.
These are just public sales. Private deals are done with agents on both sides routinely and without any reporting. There's an element of gambling to most transactions, but mostly on the origination side, because Topps, which owns licenses to the major sports leagues, is neither timely nor accurate in posting pack configuration odds, and somehow seems to have nobody competent enough to ensure that the same cards don't all get clustered in the same box. On multiple occasions I've bought cases where 3 out of 10 cards of a player were pulled, along with multiple 2/10s. The checklist is only 100 cards. The case had 384 cards total. It's downright negligent, but screw the consumers, right? Thanks, Lina Khan, for making it all happen.
There's money to be made, but it's a lot of dumb money mixed in with some very sharp acquisitions. Who knows how it'll play out. The market is inefficient largely because USPS is effectively a crapshoot in a time-sensitive market. The likes of Courtyard.io have only partially caught on, and ArenaClub, their competitor, ran for 2 years during which a bookmarklet allowed the user to turn what was supposed to be a random draw into a completely predictable purchase at way below market. Upon reporting, they just added a line to their ToS that, in theory, put users on notice. They did not fix the bug. They don't even have a SECURITY.md. The company served so much unnecessary data on their API that I now have Steve Nash's personal cell number, among others, before they designed their front page.
There's a gold rush going on but this really should be a hedge. At some point the market correction will screw over a ton of people.
It's basically an offshoot of the same appeal of crypto/NFTs but you get something to look at, I guess, and the grading companies make good money off of it.
A quick Google search says $320 billion in 2025, projected to grow to over $535 billion by 2033. I didn't know it was that big, but it makes sense. GameStop has been all in on collectibles and eBay has a huge market for them as well. I think this is the play. Both companies being profitable doesn't make it a bad deal for the number one collectibles company in the world.
I have followed it from the sidelines and it feels as hot as the NFT craze, with some parts, like Pokemon cards, being insane: regular fights, people hiding in stores and so on.
It is a multi-billion-dollar market, with eBay being the key secondary market and GameStop angling for the same.
This is absolutely possible, but likely not wanted by a large enough population of customers for current LLM inference providers to offer it. You can get closer by lowering a variable called temperature. This is typically a floating point number between 0 and 1 or between 0 and 2. The lower this number, the less noise in responses, but even a 0 does not result in identical responses due to other sources of variability.
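For anyone wondering what the temperature knob does mechanically, here is a minimal sketch assuming the usual softmax-with-temperature formulation. The logits are made-up scores for three candidate tokens, and how a given provider handles exactly 0 is an assumption noted in the comment.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw token scores into probabilities, scaled by temperature."""
    if temperature <= 0:
        # Assumption: 0 cannot be divided by, so implementations usually
        # special-case it (for example as a greedy argmax).
        raise ValueError("temperature must be > 0 in this sketch")
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
for t in (1.0, 0.5, 0.1):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

As the temperature drops, more of the probability mass piles onto the top-scoring token, which is why responses get less noisy.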
In response to the idea of iterative development, it is still possible, actually! You run something more akin to integration tests and either measure the output against deterministic processes or have an LLM judge its own output. These are called evals and in my experience are a pretty hard requirement for trusting deployed AI.
So, you would perhaps ask the AI to write a set of unit tests, then to create the implementation, and then ask the AI to evaluate that implementation against the unit tests it wrote. Right? But then again, the unit tests might now be completely different from the previous unit tests?
Or would it help if a different LLM wrote the unit-tests than the one writing the implementation? Or, should the unit-tests perhaps be in an .md file?
I also have a question about using .md files with AI: Why .md, why not .txt?
Not quite unit tests. Evals should be created by humans, as they are measuring the quality of the solution.
Let's take the example of the GitHub PR Slack bot from the blog post. I would expect 2-3 evals out of that.
Starting at the core, the first eval could be that, given a list of Slack messages, it correctly identifies the PRs and calls the correct tool to look up the status of each PR. None of this has to be real and the tool doesn't actually have to be executed, but we can write a test, much like a unit test, that confirms that the AI is responding correctly in that instance.
Next, we can set up another scenario for the AI using effectively mocked history that contains Slack messages with open PRs, Slack messages with merged PRs, and messages with no PR links, and again check whether the AI tries to add the correct reaction given our expectations.
These are both deterministic or code-based evals that you could use to iterate on your solutions.
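A rough sketch of what those two code-based evals could look like. Everything here is hypothetical: `fake_agent` and `fake_reactor` stand in for the real LLM calls, and the `get_pr_status` tool name, the `acme` repos, and the emoji names are made up for illustration.

```python
import re
from dataclasses import dataclass

PR_URL = re.compile(r"https://github\.com/[\w.-]+/[\w.-]+/pull/\d+")

@dataclass
class ToolCall:
    name: str
    argument: str

def fake_agent(messages):
    """Stand-in for the LLM: returns the tool calls it 'decided' to make."""
    return [ToolCall("get_pr_status", url)
            for msg in messages for url in PR_URL.findall(msg)]

def fake_reactor(history):
    """Stand-in for the LLM: one reaction (or None) per mocked message."""
    reactions = {"open": "eyes", "merged": "white_check_mark"}
    return [reactions.get(item["pr_status"]) for item in history]

def eval_identifies_prs(agent):
    """Eval 1: the right tool is requested for every PR link, and nothing else."""
    messages = [
        "Can someone review https://github.com/acme/api/pull/12 ?",
        "Lunch at noon?",
        "Follow-up: https://github.com/acme/web/pull/7",
    ]
    expected = {
        ("get_pr_status", "https://github.com/acme/api/pull/12"),
        ("get_pr_status", "https://github.com/acme/web/pull/7"),
    }
    return {(c.name, c.argument) for c in agent(messages)} == expected

def eval_reactions(reactor):
    """Eval 2: open, merged, and no-PR messages each get the expected reaction."""
    history = [
        {"text": "https://github.com/acme/api/pull/12", "pr_status": "open"},
        {"text": "https://github.com/acme/web/pull/7", "pr_status": "merged"},
        {"text": "Lunch at noon?", "pr_status": None},
    ]
    return reactor(history) == ["eyes", "white_check_mark", None]

print(eval_identifies_prs(fake_agent), eval_reactions(fake_reactor))
```

In practice you would swap the fakes for a wrapper around your actual model call and run each eval several times, since a single pass tells you little about a non-deterministic system.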
The use for an LLM-as-a-Judge eval is more nuanced, and it is usually there to measure subjective results. Things like: did the LLM make assumptions not present in the context window (hallucinate), or did it respond with something completely out of context? These should be simple yes or no questions that would be easy for a human to answer but hard to code up as a deterministic test case.
Once you have your evals defined, you can begin running them with some regularity, and you're at a point where you can iterate on your prompts with a higher level of confidence than vibes.
Edit: I did want to share that if you can make something deterministic, you probably should. The Slack PR example is something for which I'd just make a simple script that runs on a cron schedule, but it was easy to pull on as an example.
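To make the judge idea concrete, here is a minimal sketch. The prompt wording is my own, and `judge` is a hypothetical callable (prompt in, text out) that you would back with whatever model you use; `stub_judge` only exists so the example runs offline.

```python
def build_judge_prompt(context, response):
    """Ask one narrow yes/no question so the verdict is easy to parse."""
    return (
        "Answer with exactly YES or NO.\n"
        "Does the RESPONSE state anything that is not supported by the CONTEXT?\n\n"
        f"CONTEXT:\n{context}\n\nRESPONSE:\n{response}\n"
    )

def parse_verdict(raw):
    """Map the judge's reply to True/False, or None if it ignored the format."""
    words = raw.strip().upper().split()
    first = words[0].strip(".,!:") if words else ""
    if first == "YES":
        return True
    if first == "NO":
        return False
    return None

def judge_hallucination(judge, context, response):
    """True means the judge thinks the response went beyond the context."""
    return parse_verdict(judge(build_judge_prompt(context, response)))

def stub_judge(prompt):
    # Hypothetical stand-in for a real model call.
    return "NO"

print(judge_hallucination(stub_judge, "PR 12 is open.", "PR 12 is still open."))
```

Returning None for an unparseable verdict keeps judge failures separate from actual "yes" or "no" results, so they don't silently skew your numbers.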
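For what it's worth, the deterministic version could be roughly this. It is a sketch under assumptions: `lookup_status` is a hypothetical function you would implement against the GitHub API, the emoji names are arbitrary, and posting the reactions back to Slack is left out.

```python
import re

PR_URL = re.compile(r"https://github\.com/([\w.-]+)/([\w.-]+)/pull/(\d+)")
REACTIONS = {"open": "eyes", "merged": "white_check_mark"}  # hypothetical choices

def plan_reactions(messages, lookup_status):
    """Return (message_index, reaction) pairs for messages that link a PR."""
    planned = []
    for index, text in enumerate(messages):
        match = PR_URL.search(text)
        if match is None:
            continue  # no PR link, nothing to react to
        owner, repo, number = match.group(1), match.group(2), int(match.group(3))
        reaction = REACTIONS.get(lookup_status(owner, repo, number))
        if reaction is not None:
            planned.append((index, reaction))
    return planned

def fake_lookup(owner, repo, number):
    # Hypothetical stand-in for a real status lookup.
    return "merged" if number == 7 else "open"

messages = [
    "Please review https://github.com/acme/api/pull/12",
    "Lunch at noon?",
    "Shipped: https://github.com/acme/web/pull/7",
]
print(plan_reactions(messages, fake_lookup))
```

A crontab entry along the lines of `*/15 * * * * python3 pr_reactions.py` (script name made up) would then run it every 15 minutes, with no model in the loop.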
Oh goodie. My 6a has had terrible battery life (and actually overheated about a month and a half ago while charging) yet my phone doesn't qualify for the replacement for whatever reason.
This is my third Google phone in a row with issues:
Nexus 6P - Had to have the battery replaced and then it inexplicably died, dead to the world
Pixel 4a - Similar battery issues + a screen that physically fell out of the phone
Pixel 6a - Battery woes even BEFORE this upcoming update
I had a 5A inexplicably develop a crack in the screen literally before my eyes as I was holding it. I watched it propagate from left to right over about 6 seconds. If I hadn't seen it, I would never have believed it.
Not even worth arguing for a warranty at that point. Replaced the screen myself later.
The 5A had a serious motherboard defect that caused phones to become unusable. Google had a device replacement program that allowed owners to exchange dead 5As for newer models.
I know you used the /s, but it's quite commonly believed that a temperature of 0 is deterministic. For others coming across this thread: it's not deterministic, it is simply less likely to return different tokens (it still absolutely will).
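If you want to check this for your own setup rather than take my word for it, a small sketch: call the model repeatedly with the same prompt and tally the distinct outputs. `generate` is a hypothetical callable wrapping your provider's API with temperature set to 0; `flaky_model` is just a stub that imitates occasional variation.

```python
import itertools
from collections import Counter

def response_spread(generate, prompt, runs=20):
    """Call generate(prompt) repeatedly and tally each distinct response."""
    return Counter(generate(prompt) for _ in range(runs))

def is_stable(generate, prompt, runs=20):
    """True only if every run produced the exact same text."""
    return len(response_spread(generate, prompt, runs)) == 1

# Hypothetical stub: mostly the same answer, occasionally a different one.
_answers = itertools.cycle(["4", "4", "four"])

def flaky_model(prompt):
    return next(_answers)

spread = response_spread(flaky_model, "What is 2 + 2?", runs=6)
print(dict(spread), "stable:", len(spread) == 1)
```

A run of identical outputs does not prove determinism, of course; it only fails to catch variation within that sample.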