You don't give it your "prod email", you give it a secondary email you created specifically for it.
You don't give it your "prod PayPal", you create a secondary PayPal account (perhaps one registered with the same email as the secondary email you gave it).
You don't give it your "prod bank checking account", you spin up a new checking account with Discover.com (or any other online bank that takes <5 minutes to open one). With online banking it is fairly straightforward to set up fully sandboxed financial accounts. You can, for example, set up one-way flows from your "prod checking account" to your "bastion checking account": prod can push/pull cash to the bastion checking, but the bastion cannot push/pull from (or even see) the prod checking account. The "permissions" logic that supports this is handled by the Nacha rules (which govern how ACH transfers can flow). Banks cannot ignore those permissions -- they would quickly lose their ability to legally operate as a bank if they did.
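To make the one-way-flow idea concrete, here's a toy Python sketch of that permission model. It's purely illustrative -- the real enforcement lives in the Nacha/ACH rules between banks, not in anyone's application code, and every name here is hypothetical:

```python
class Account:
    """Toy stand-in for a checking account."""
    def __init__(self, name, balance=0):
        self.name = name
        self.balance = balance
        # Accounts this one is allowed to originate transfers with.
        self.allowed_counterparties = set()

def authorize_link(origin, target):
    """One-way link: origin may move money to/from target.

    The target never learns about (or gains rights over) the origin,
    mirroring how the prod account links the bastion but not vice versa.
    """
    origin.allowed_counterparties.add(target.name)

def transfer(origin, target, amount):
    """Move funds; only the linking side may originate the transfer."""
    if target.name not in origin.allowed_counterparties:
        raise PermissionError(
            f"{origin.name} may not originate transfers with {target.name}"
        )
    origin.balance -= amount
    target.balance += amount
```

With `authorize_link(prod, bastion)`, `transfer(prod, bastion, 100)` succeeds, while `transfer(bastion, prod, 50)` raises `PermissionError` -- the bastion side simply has no standing to reach back into prod.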
Now then, I'm not trying to handwave away the serious challenges associated with this technology. There's also the threat of reputational risks etc since it is operating as your agent -- heck potentially even legal risk if things get into the realm of "oops this thing accidentally committed financial fraud."
I'm simply saying that the principle of least privilege applies to online accounts as much as to everything else.
isn't the value proposition "it can read your email and then automatically do things"? if it can't read your email and then can't actually automatically do things... what's the point?
Yes -- definitely that's the value prop. But it's not binary all or nothing.
AI automation is about trust (honestly, same as human delegation).
You give it access to a little bit of data, just enough to do a basic useful thing or two, then you give it a bit of responsibility.
Then as you build confidence and trust, you give it a little more access, and allow it to take on a little more responsibility. Naturally, if it blows up in your face, you dial back access and responsibility quick.
As an analogy, folks drive their cars on the highway at 65-85+ MPH. Fatality rates climb steeply with speed, and a crash at 60+ mph is considerably more deadly than one at ~30 mph.
We're all so confident that a wheel won't randomly fall off because we've built so much trust in the quality of modern automobiles. But it does happen (I had a friend in high school whose wheel popped off on a 45 mph road -- naturally he was going 50-55, IIRC).
In the early 1900s people would have thought you had a death wish to drive this fast. 25-30mph was normal then -- the automobiles at the time just weren't developed enough to be trusted at higher speeds.
My previous comment was about the fact that it is possible to build this sandboxing/bastion layer with live web accounts that allows for fine grained control over how much data you want to expose to the ai.
The value proposition is it is an agent with (some) memory. There are lots of use cases that don't involve giving access to your personal stuff. Even a simple "Monitor these companies' career pages and notify me of an opening in my city" is useful.
Hi Kypro, this is a very interesting perspective. Can you reach out to me? I'd like to discuss what you're observing with you a bit in private, as it relates heavily to a project I'm currently working on. My contact info is on my profile. Pls shoot me a connection request and just say you're kypro from HN :)
Or is there a good way for me to contact you? Your profile doesn't list anything and your handle doesn't seem to have much of an online footprint.
Lastly, I promise I'm not some weirdo, I'm a realperson™ -- just check my HN comment history. A lot of people in the AI community have met me in person and can confirm (swyx etc).
Hi Swyx I always appreciate your insights, something you wrote really resonated with a personal theory I've been developing:
>"While I never use AI for personal writing (because I have a strong belief in writing to think)"
The optimal AI productivity process is starting to look like:
AI Generates > Human Validates > Loop
Yet cognitive generation is how humans learn and develop cognitive strength, as well as how they maintain such strength.
Similar to how physical activity is how muscles/bone density/etc grow, and how body tissues maintain.
Physical technology freed us from hard physical labor that kept our bodies in shape -- at a cost of physical atrophy.
AI seems to have a similar effect for our minds. AI will accelerate our cognitive productivity, and allow for cognitive convenience -- at a cost of cognitive atrophy.
At present we must be intentional about building/maintaining physical strength (dedicated strength training, cardio, etc).
Soon we will need to be intentional about building/maintaining cognitive strength.
I suspect the workday/week of the future will be split on AI-on-a-leash work for optimal productivity, with carve-outs for dedicated AI-enhanced-learning solely for building/maintaining cognitive health (where productivity is not the goal, building/maintaining cognition is). Similar to how we carve out time for working out.
What are your thoughts on this? Based on what you wrote above, it seems you have similar feelings?
Granted, that article refers to retrieval specifically being one major way we learn, and of course learning incorporates many dimensions. But it seems a bit self-evident that retrieval occurs heavily during active problem solving (i.e. "generation"), and less so during passive learning (i.e. just reading/consuming info).
From personal experience, I always noticed I learned much more by doing than by consuming documentation alone.
But yes, I admit this assumption and my own personal experience/bias is doing a lot of heavy lifting for me...
2) Regarding the "optimal AI productivity process" (AI Generates > Human Validates > Loop)
I'm using Karpathy's productivity loop described in his AI startup school talk last month here:
Does this help make it more concrete, Swyx? (Name-dropping you here since I'm pretty sure you've got a social listener set for your handle ;) I'd love to hear your thoughts straight from the hip based on your own personal experiences.
Full disclosure: I'm not trying to get too academic about this. In all honesty, I'm really trying to get to an informal theory that's useful and practical enough that it can be turned into a regular business process for rapid professional development.
Search tool calling is RAG. Maybe we should call it a "RAG Agent" to be more en vogue heh. But RAG is not just similarity search on embeddings in vector DBs. RAG is any type of a retrieval + context injection step prior to inference.
Heck, the RAG Agent could run cosine-similarity search on your vector DB in addition to grep, FTS queries, KB API calls, whatever, to do wide recall (candidate generation), then rerank (relevance prioritization) all the results.
You are probably correct that for most use cases search tool calling makes more practical sense than embeddings similarity search to power RAG.
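That wide-recall-then-rerank loop can be sketched in a few lines of Python. This is a toy: the retriever and reranker below are stand-ins (all names mine), where a real agent would plug in grep, FTS queries, and vector-store calls as the retrievers and a proper relevance model as the reranker:

```python
from dataclasses import dataclass

@dataclass
class Doc:
    id: str
    text: str

def keyword_retriever(corpus):
    """Toy grep-style retriever: returns docs containing any query term."""
    def retrieve(query):
        terms = query.lower().split()
        return [d for d in corpus if any(t in d.text.lower() for t in terms)]
    return retrieve

def overlap_rerank(query, doc):
    """Toy reranker: fraction of query terms present in the doc."""
    terms = set(query.lower().split())
    words = set(doc.text.lower().split())
    return len(terms & words) / max(len(terms), 1)

def rag_retrieve(query, retrievers, rerank, k=5):
    # Wide recall: union candidate docs from every retriever, dedup by id.
    candidates = {}
    for retrieve in retrievers:
        for doc in retrieve(query):
            candidates.setdefault(doc.id, doc)
    # Rerank: sort the merged pool by relevance, keep top-k.
    return sorted(candidates.values(),
                  key=lambda d: rerank(query, d), reverse=True)[:k]
```

The retrieved top-k docs then get injected into the prompt before inference -- that injection step is what makes it RAG, regardless of which retrievers fed the candidate pool.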
I built a distributed software engineering firm pre-covid, so all of our clients were onsite even though we were full-remote. My engineers plugged into the engineering teams of our clients, so it's not like we were building on the side and just handing over deliverables, we had to fully integrate into the client teams.
So we had to solve this problem pre-covid, and the solution remained the same during the pandemic when every org went full remote (at least temporarily).
There is no "one size fits all approach" because each engineer is different. We had dozens of engineers on our team, and you learn that people are very diverse in how they think/operate.
But we came up with a framework that was really successful.
1) Good faith is required: you mention personnel abusing time/trust -- that's a different issue entirely; no framework will be successful if people refuse to comply. This system only works if teammates trust the person. Terminate someone who can't be trusted.
2) "Know thyself": Many engineers wouldn't necessarily even know how THEY operated best (if they needed large chunks of focus time, or were fine multi-tasking, etc). We'd have them make a best guess when onboarding and then iterate and update as they figured out how they worked best.
3) Proactively Propagate Communication Standard: Most engineers would want large chunks of uninterrupted focus time, so we would tell them to EXPLICITLY tell their teammates or any other stakeholders WHEN they would be focusing and unresponsive (standardize it via schedule), and WHY (ie sell the idea). Bad feelings or optics are ALWAYS simply a matter of miscommunication so long as good faith exists. We'd also have them explain "escalation patterns", ie "if something is truly urgent, DM me on slack a few times and finally, call my phone."
4) Set comms status: Really this is just slack/teams. but basically as a soft reminder to stakeholders, set your slack status to "heads down building" or something so people remember that you aren't available due to focus time. It's really easy to sync slack status to calendar blocks to automate this.
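For the calendar-sync piece, here's a rough sketch of setting that focus status programmatically via Slack's `users.profile.set` Web API. It assumes a token with the `users.profile:write` scope; the helper names are mine, and a real setup would trigger this from your calendar tooling rather than by hand:

```python
import json
import urllib.request

def build_focus_status(text, emoji, expires_at):
    """Build the users.profile.set payload for a focus-time status.

    expires_at is a Unix timestamp; Slack auto-clears the status then,
    so the status can't be left on stale after the calendar block ends.
    """
    return {"profile": {
        "status_text": text,
        "status_emoji": emoji,
        "status_expiration": expires_at,
    }}

def set_slack_status(token, payload):
    """POST the payload to Slack's users.profile.set endpoint."""
    req = urllib.request.Request(
        "https://slack.com/api/users.profile.set",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json; charset=utf-8",
        },
    )
    return urllib.request.urlopen(req)
```

Hooking `build_focus_status("heads down building", ":no_bell:", block_end_ts)` up to calendar events gives stakeholders the soft reminder automatically, with zero ongoing effort from the engineer.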
We also found that breaking the day into async task time and sync task time really helped optimize. Async tasks are tasks that can get completed in small chunks of time, like code review, checking email, slack, etc. These might be large time sinks in aggregate, but generally you can break them into small time blocks and still be successful. We would have people set up their day so all the async tasks got done when they were already paying a context-switching cost -- e.g., around scheduled agile cadence meetings. If you're doing a standup meeting, you're already gonna be knocked out of flow, so might as well use this time to also do PR review, async comms, etc. Naturally we had people stack their meetings when possible instead of peppering them throughout the day (more on how this was accomplished below).
Anyways, sometimes when an engineer of ours joined a new team, there might be a political challenge in not fitting into the existing "mold" of how that team communicated (if that team's comm standard didn't jibe with our engineer's). This quickly resolved every single time once our engineer was proven to be much more productive/effective than the existing engineers (who were kneecapped by the terrible, distracting existing standard of meetings, constant slack interruptions, etc). We would even go as far as to tell stakeholders our engineers would not be attending less important meetings (not immediately -- once we had already proven ourselves a bit). The optics around this weren't great at first, but again, our engineers would start 1.5-2X'ing the productivity of the in-house engineers, and political issues melt away very quickly.
TL;DR - Operate in good faith, decide your own best communication standard, propagate the standard out to your stakeholders explicitly, deliver and people will respect you and also your comms standard.
I was lucky enough to have a few conversations with Scott a month or so ago, and he is doing some really compelling work around the AISDLC and creating a factory-line approach to building software. Seriously folks, I recommend following this guy closely.
There's another guy in this space I know who's doing similar incredible things but he doesn't really speak about it publicly so don't want to discuss w/o his permission. I'm happy to make an introduction for those interested just hmu (check my profile for how).
For those who don't, reading "Competing Against Luck" by Clayton Christensen will dramatically improve your ability to create successful products/services.
Hi Paul, been following the aider project for about a year now to develop an understanding of how to build SWE agents.
I was at the AI Engineering Summit in NYC last week and met an (extremely senior) staff ai engineer doing somewhat unbelievable things with aider. Shocking things tbh.
Is there a good way to share stories about real-world aider projects like this with you directly (if I can get approval from him)? Not sure posting on public forum is appropriate but I think you would be really interested to hear how people are using this tool at the edge.
Hey! this is exciting to see! Hi Earl and Oisin! (I've had the pleasure of meeting Earl and Oisin face to face a few times. Really friendly and smart guys, fwiw based on my convos they are very serious about building a compelling product, excited to see it on hn!)