This is awesome! Working on a desktop pet so the buddy caught my attention. Looking forward to making friends with my Rare Duck buddy tomorrow. Wish it was a snarky duck instead of a patient one though.
Do you think LLMs make this process of starting the engine easier or harder? They make getting started much easier, but it might be harder to feel a sense of momentum since our expectations of speed have changed, and the learning moments have changed as well.
The bug is in the software in our heads, if anything. We've learned a little too much, so we're thinking further ahead than we would have when we first started out. You need to purposefully shut off that part of your eval to get started on anything at all.
If you design with the LLM, you can make this easier by prompting it to help you not talk yourself out of things.
I found gstack's /office_hours to be good at being encouraging while staying firm. I've only tried one of the modes, but it didn't dismiss my pushback when it was based only on my intuition. It took it as a baseline and tried to evaluate it by taking it seriously. If that's any indication, the other modes for side projects should be just as supportive.
I think LLMs can make it easier to be more ambitious. Non-techies are blown away by being able to build web pages? I'm blown away that I was able to root my 1st gen Kindle Fire to repurpose it as a remote terminal to ssh into my laptop to talk to claude code. I've been trying to root the thing for years and could never find the right instructions to make it work.
Author here. The FaceTime is a reference to the J Cole lyric in the footnotes.
I actually dug up a video from a class project when I was 20 and created a voice clone in ElevenLabs (and also gave it current voice samples with prompts to make it sound younger), but hearing it didn’t add anything to my introspective experience, so I stopped before creating a video clone in Tavus/HeyGen/Simli to do a call via Pipecat/Daily.
To the best of my knowledge, Waymo still has humans in the loop as Fleet Response agents that the vehicles can call for remote assistance when they aren't sure what to do. Caveat: the number needed likely isn't on the same order of magnitude as human drivers, but the job is likely higher paying. I could see a scenario where these agents should be locals, both for latency and for knowledge of tricky intersections or road quirks in the city. (ChatGPT says SF-to-Miami RTT might be 80-100 ms, and I don't believe the humans really teleoperate the vehicles, so latency may not be meaningful domestically, but it might be a bigger deal for international expansion.) They could also potentially help with labeling quirky city-specific scenarios and other various evals.
Love this idea! The Whiteboard Gym explainer video seemed really text-heavy (although I did learn enough to guess that that's because text likely beat drawing/adding an image for these abstract concepts for the GRPO agent). I found Shraman's personal story video much more engaging! https://x.com/ShramanKar/status/1955404430943326239
I've been trying to eliminate multitasking as much as I can, but the day-to-day nature of startups means that even what looks like a single task when zoomed out often involves context switching. To investigate and fix a user-reported bug, say, I might have to toggle between VS Code, localhost in the browser plus the DOM inspector or console, our bug tracker, our support ticketing tool, Slack, and sometimes the Cody window in VS Code, ChatGPT, or Claude. My results:
RT in pure trials: 448ms
RT in mixed trials: 710ms
Mixing cost: 262ms
RT in task-repeat trials (in mixed blocks): 710ms
RT in task-switching trials (in mixed blocks): 975ms
Task-switch cost: 265ms
RT in pure trials: 490ms
RT in mixed trials: 825ms
Mixing cost: 335ms
RT in task-repeat trials (in mixed blocks): 825ms
RT in task-switching trials (in mixed blocks): 969ms
Task-switch cost: 144ms
Second time (while listening to music; I decided to do this since I've noticed it somehow decreases my latency in typing tests significantly):
RT in pure trials: 436ms
RT in mixed trials: 673ms
Mixing cost: 237ms
RT in task-repeat trials (in mixed blocks): 673ms
RT in task-switching trials (in mixed blocks): 746ms
Task-switch cost: 73ms
Edit: third time, also while listening to music:
RT in pure trials: 435ms
RT in mixed trials: 608ms
Mixing cost: 173ms
RT in task-repeat trials (in mixed blocks): 608ms
RT in task-switching trials (in mixed blocks): 700ms
Task-switch cost: 92ms
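For anyone puzzling over how the game derives these numbers: each "cost" appears to just be a difference of mean reaction times. A minimal sketch of that arithmetic (the function names are mine, not the tool's), using my second run's values:

```python
# Each reported cost looks like a simple difference of mean RTs (in ms).

def mixing_cost(pure_rt_ms: float, mixed_rt_ms: float) -> float:
    """Slowdown from holding multiple task sets at once,
    measured even on trials where the task doesn't change."""
    return mixed_rt_ms - pure_rt_ms

def task_switch_cost(repeat_rt_ms: float, switch_rt_ms: float) -> float:
    """Extra slowdown on trials where the task actually switches,
    relative to task-repeat trials in the same mixed block."""
    return switch_rt_ms - repeat_rt_ms

# Second run: pure 436ms, mixed 673ms, switch trials 746ms
print(mixing_cost(436, 673))       # 237, matching the reported mixing cost
print(task_switch_cost(673, 746))  # 73, matching the reported switch cost
```

With enough practice, switch trials can come out faster than repeat trials, which is how a negative task-switch cost falls out of the subtraction.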
I suspect this "game" is also amenable to practice, and find it at least a bit weirdly addictive in the same way as Flappy Bird.
I don't think this game is all that representative of context-switching overhead, as my 4th attempt gives evidence that this improves quickly with practice:
RT in pure trials: 422ms
RT in mixed trials: 611ms
Mixing cost: 189ms
RT in task-repeat trials (in mixed blocks): 611ms
RT in task-switching trials (in mixed blocks): 602ms
Task-switch cost: -9ms
The "mixed trials" are naturally slower because I'm having to recognise 4 patterns instead of 2, but only by ~50%.
Whoa, a negative task-switch cost! I didn't take it multiple times, but it makes sense that having practice at this specific task probably improves both your overall response times and maybe more specifically improves the different trials.
What I'm curious about is whether we also get specifically good at, say, task-switching between a code editor and Stack Overflow over time.
RT in pure trials: 463ms
RT in mixed trials: 833ms
Mixing cost: 369ms
RT in task-repeat trials (in mixed blocks): 833ms
RT in task-switching trials (in mixed blocks): 1040ms
Task-switch cost: 207ms
Awesome project! Reminds me of donothingfor2minutes.com from Calm, but with a different end goal of focus instead of calm.
Regarding mobile phones going to sleep, the Screen Wake Lock API [1] might help (make sure to request it in the context of the user hitting "start"), unless you can reduce the timer to 59s, since I believe 1m is the sleep threshold. On older mobile browsers [2], the best workaround I found was the NoSleep library [3].
We thought about it, but ultimately decided that owning a truck would detract from what we really needed to be doing, which was getting truck owners to do the luggs.