I feel like I’m good at understanding context. I’ve been working in AI startups over the last 2 years. Currently at an AI search startup.
Managing context for info retrieval is the name of the game.
But for my personal use as a developer, they’ve caused me much headache.
Answers that were subtly wrong, in such a way that it took me a week to realize my initial assumption, based on the LLM's response, was totally bunk.
This happened twice. With the yjs library, it gave me half-incorrect information that led me to misimplement the sync protocol. Granted, it's a fairly new library.
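For reference, the sync exchange in yjs boils down to a state-vector/diff handshake. A minimal sketch using its documented update functions (the transport layer is whatever you like; the two in-process docs just stand in for two peers):

```ts
import * as Y from "yjs";

// Two docs standing in for two peers.
const docA = new Y.Doc();
const docB = new Y.Doc();
docA.getText("t").insert(0, "hello from A");

// Step 1: B tells A what it already has (its state vector).
const stateVectorB = Y.encodeStateVector(docB);

// Step 2: A computes only the updates B is missing.
const diff = Y.encodeStateAsUpdate(docA, stateVectorB);

// Step 3: B applies the diff. Run the same exchange in the
// other direction and both docs converge.
Y.applyUpdate(docB, diff);

console.log(docB.getText("t").toString()); // "hello from A"
```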
And again with the web History API. It said that the history stack only exists until a page reload.
The examples it gave me ran as it described, but that isn't how the History API works; session history survives reloads.
I lost a week of time because of that assumption.
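The actual behavior is easy to check in a devtools console (a quick sketch; the state object and URL here are made up):

```ts
// Run in a browser console; push an entry with some state.
history.pushState({ step: 1 }, "", "?step=1");
location.reload();

// After the reload, the browser restores the current entry's state:
console.log(history.state); // { step: 1 }
// and history.back() still walks the same stack as before the reload.
```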
I’ve been hesitant to dive back in since then. I ask questions every now and again, but I jump off much faster now if I even think it may be wrong.
There is no substitute for cold, hard facts. LLMs do not provide those unless it's literally the easiest thing for them to do, and even then not always.
In the case you were in, I would go out of my way to feed the docs to the LLM, use the LLM to interrogate the docs, and then verify the understanding I got from the LLM with a personal reading of the relevant sections.
You might think it takes just as long, if not longer, to do it my way rather than just reading the docs myself. Sometimes it can. But as you get good at the workflow, you find that the time spent finding the relevant docs goes down, and you get an instant, plausible interpretation of the docs on top. You can then very quickly produce application code right away, and then docs for the code you write.
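Concretely, that loop can be as simple as this (a sketch assuming the OpenAI Node SDK; the file name and question are placeholders, and the important bit is asking the model to quote its source so you can verify the answer against the docs yourself):

```ts
import { readFileSync } from "node:fs";
import OpenAI from "openai";

// Hypothetical setup: the relevant docs saved to a local file.
const docs = readFileSync("history-api-docs.md", "utf8");
const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [
    {
      role: "system",
      content:
        "Answer only from the documentation provided. Quote the exact passage you relied on so I can verify it myself.",
    },
    {
      role: "user",
      content: `${docs}\n\nQuestion: does the session history survive a page reload?`,
    },
  ],
});

console.log(response.choices[0].message.content);
```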
- Building front-end prototypes - I use Claude Artifacts for this all the time, if I have an idea for a UI I'll get Claude to spin up an almost instant demo so I can interact with it and see if it feels right. I'll often copy the code out and use it as the starting point for my production feature.
- DSLs like SQL, Bash scripts, jq, AppleScript, grep - I use these WAY more than I used to because 9/10 times Claude gives me exactly what I needed from a single prompt. I built a CLI tool for prompt-driven jq programs recently: https://simonwillison.net/2024/Oct/27/llm-jq/
- Ad-hoc sidequests. This is a pretty broad category, but it's effectively little coding projects which I shouldn't actually be working on at all but I'll let myself get distracted if an LLM can get me there in a few minutes: https://simonwillison.net/2024/Mar/22/claude-and-chatgpt-cas...
- Writing C extensions for SQLite while I'm walking my dog on the beach. I am not a C programmer, but I find it extremely entertaining that ChatGPT Code Interpreter, prompted from my phone, can write, compile and test a C extension for SQLite for me: https://simonwillison.net/2024/Mar/23/building-c-extensions-...
- That's actually a good example of a general pattern: I use this stuff for exploratory prototyping outside of my usual (Python+JavaScript) stack all the time. Usually this leads nowhere, but occasionally it might turn into a real project (like this AppleScript example: https://til.simonwillison.net/gpt3/chatgpt-applescript )
- Actually writing code. Here's a Python/Django app I wrote almost entirely with Claude: https://simonwillison.net/2024/Aug/8/django-http-debug/ - again, this was something of a side-project - not something worth spending a full day on but worthwhile if I could get it done in a couple of hours.
- Mucking around with APIs. Having a web UI for exploring an API is really useful, and Claude can often knock those out from a single prompt. https://simonwillison.net/2024/Dec/17/openai-webrtc/ is a good example of that.
There's a TON more, but this probably represents the majority of my usage.
Not OP, but I've just gotten really used to verifying implementation details. Yup, those subtle ones really suck. It's pretty much down to intuition whether something in the response (or your follow-ups) rings the `not quite right` bell for you.