Hacker News | admax88qqq's comments

Web apps kind of already do that with most companies shipping constant UX redesigns, A/B tests, new features, etc.

For a typical user today’s software isn’t particularly deterministic. Auto updates mean your software is constantly changing under you.


I don't think that is what the original commenter was getting at. In your case, the company is actively choosing to make changes. Whether it's for a good reason, or leads to a good outcome, is beside the point.

LLMs being inherently non-deterministic means using this technology as the foundation of your UI will mean your UI is also non-deterministic. The changes that stem from that are NOT from any active participation of the authors/providers.

This opens a can of worms where there will always be a potential for the LLM to spit out extremely undesirable changes without anyone knowing. Maybe your bank app one day doesn't let you access your money. This is a danger inherent and fundamental to LLMs.


Right, I get that. The point I’m making is that from a user’s perspective it’s functionally very similar: a non-deterministic LLM or a non-deterministic company full of designers and engineers.


Regardless of what changes the bank makes, it’s not going to let you access someone else’s money. This LLM very well might.


Well, software has been known to have vulnerabilities...

Consider this: the bank teller is non-deterministic, too. They could give you 500 dollars of someone else's money. But they don't, generally.


Bank tellers are deterministic though. They have a set protocol for each case and escalate unknown cases to a more deterministic point of contact.

It will be difficult to incorporate relative access or restrictions to features with respect to a user's current/known state or actions. Might as well write the entire web app at that point.


I think the bank teller's systems and processes are deterministic, but the teller itself is not. They could even rob the bank, if they wanted to. They could shoot the customers. They don't, generally, but they can.

I think, if we can efficiently capture a way to "make" LLMs conform to a set of processes, you can cut out the app and just let the LLM do it. I don't think this makes any sense for maybe the next decade, but perhaps at some point it will. And, in such time, software engineering will no longer exist.


The actual app is the set of processes.


The rate of change is so different it seems absurd to compare the two in that way.

The LLM example gives you a completely different UI on _every_ page load.

That’s very different from companies moving around buttons occasionally and rarely doing full redesigns


And most end users hate it.


> You're strictly correct, but the rules for chess are infamously hard to implement

Come on. Yeah they're not trivial but they've been done numerous times. There have been chess programs for almost as long as there have been computers. Checking legal moves is a _solved problem_.

Detecting valid medical advice is not. The two are not even remotely comparable.


> Detecting valid medical advice is not. The two are not even remotely comparable.

Uh? Where exactly did I signal my support for LLMs giving medical advice?


It’s not a question of being able to reverse. It’s a question of being able to diagnose that one of these changes even was the problem and, if so, which one.


Record changes in git and then git bisect issues, maybe?

Without change capture, solid regression testing, or observability, it seems difficult to manage these changes. I’d like to know how others are managing these kinds of changes to readily troubleshoot them, without lots of regression testing or observability, if anyone has successes to share.
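
A minimal sketch of the bisect idea, assuming every tuning change is recorded in order and that the system stays bad once it goes bad (the same assumption git bisect makes). The names `first_bad` and `is_bad` are hypothetical; `is_bad(i)` stands in for whatever check tells you the system is broken with changes 0..i applied.

```python
from typing import Callable, Sequence


def first_bad(changes: Sequence[str], is_bad: Callable[[int], bool]) -> int:
    """Return the index of the first change after which the system is bad.

    Assumes the state with all changes applied is known to be bad, and
    that is_bad flips from False to True exactly once along the history.
    """
    lo, hi = 0, len(changes) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(mid):
            hi = mid          # culprit is at mid or earlier
        else:
            lo = mid + 1      # culprit is after mid
    return lo


# Usage: eight recorded changes, the sixth one (index 5) broke things.
history = [f"change-{i}" for i in range(8)]
culprit = first_bad(history, lambda i: i >= 5)
print(history[culprit])
```

This only narrows things down if the check is reliable, which is exactly the part that needs regression tests or observability.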


I focused primarily on guesswho's "in ways I am unaware of".

Your issue appears to be true for any system change. Although, risk will of course vary.


If they can be reversed individually you can simply deduce by rolling back changes one by one, no?


Suppose you run a fleet of a thousand machines. They all autotune. They are, let's say, serving cached video, or something.

You notice that your aggregate error rate has been drifting upwards since using bpftune. It turns out, in reality, there is some complex interaction between the tuning and your routers, or your ToR switches, or whatever - there is feedback that causes oscillations in a tuned value, swinging between too high and too low.

Can you see how this is not a matter of simple deduction and rollbacks?

This scenario is plausible. Autotuning generally has issues with feedback, since the overall system lacks control theoretic structure. And the premise here is that you use this to tune a large number of machines where individual admin is infeasible.
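
To make the oscillation concrete, here is a toy model, not bpftune's actual algorithm: a tuner that moves a value toward a target by `gain` times the current error on each step. The function name, the target of 100 and the gains are all made up for illustration. With a modest gain the value settles; with too much gain it overshoots and swings between too high and too low indefinitely.

```python
from typing import List


def simulate(gain: float, target: float = 100.0,
             start: float = 0.0, steps: int = 6) -> List[float]:
    """Proportional tuner: each step adds gain * (target - value)."""
    value = start
    history = []
    for _ in range(steps):
        value += gain * (target - value)
        history.append(value)
    return history


print(simulate(0.5))  # [50.0, 75.0, 87.5, 93.75, 96.875, 98.4375]
print(simulate(2.0))  # [200.0, 0.0, 200.0, 0.0, 200.0, 0.0]
```

In a real fleet the "gain" is not a number you set; it emerges from how routers, switches and other tuners react, which is why it is hard to reason about without some control-theoretic structure.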


When you have a thousand machines, you can usually get feedback pretty quick, in my experience.

Run the tune on one machine. Looks good? Put it on ten. Looks good? Put it on one hundred. Looks good? Put it on everyone.

Find an issue a week later, and want to dig into it? Run 100 machines back on the old tune, and 100 machines with half the difference. See what happens.
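
The 1 / 10 / 100 / everyone rollout above can be sketched roughly like this. `apply` and `looks_good` are hypothetical callbacks standing in for your config push and your health check; nothing here is tied to a particular management tool.

```python
from typing import Callable, List, Sequence


def staged_rollout(
    machines: Sequence[str],
    apply: Callable[[str], None],
    looks_good: Callable[[Sequence[str]], bool],
    stages: Sequence[int] = (1, 10, 100),
) -> List[str]:
    """Apply a change in growing batches; stop at the first unhealthy stage.

    Returns the machines that received the change.
    """
    done: List[str] = []
    # Cumulative targets: each listed stage, then the whole fleet.
    targets = [n for n in stages if n < len(machines)] + [len(machines)]
    for target in targets:
        for machine in machines[len(done):target]:
            apply(machine)
            done.append(machine)
        if not looks_good(done):
            break  # leave the remaining machines on the old tune
    return done


# Usage: a health check that fails once ten machines carry the change.
fleet = [f"box-{i}" for i in range(1000)]
touched = staged_rollout(fleet, lambda m: None, lambda d: len(d) < 10)
print(len(touched))  # 10
```

The health check is doing all the work here: the scheme only helps if the problem shows up within whatever soak time you give each stage.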


Presumably one would use autotune to find optimized parameters, and then roll those out via change control, either one parameter at a time, or a mix of parameters across the systems.

Alternatively: if you have a fleet of thousands of machines you can very easily do a binary search with them to a) establish the problem with the auto-tuner and then b) find which of the changes it settled on are causing your problems.

I get the impression you've never actually managed a "fleet" of systems, because these techniques would have immediately occurred to you.


Certainly when we managed Twitch’s ~10,000 boxes of video servers, neither of the tasks you describe would have been simple. We underinvested in tools, for sure. Even so, I don’t think you can really argue that dynamically changing configs like this are going to make life easier!


>not only can we observe the system and tune appropriately, we can also observe the effect of that tuning and re-tune if necessary. //

Does sound like a potential way to implement literal chaos.

Surely it's like anything else, you do pre-release testing and balance the benefits for you against the risks?


In that scenario you could run it on a couple servers, compare and contrast, and then apply globally via whatever management tool you use.


Sounds like you have your answer of “don’t use it” then.


Only if you already suspect that this tool caused the problem.


Seeing ads can still affect you psychologically even if you don't click them.

Also lots of ads prey on people with worse impulse control who bankroll the rest of us who don't click ads. Similar to how casinos are bankrolled by the addicts at the slot machines or many games are bankrolled by the addicts spending all their savings on in game items.

Doesn't make me feel warm and fuzzy.

Plus there's something just aesthetically pleasing about an ad-free experience. I started paying for YouTube Premium to avoid ads and I must say it's a much nicer experience.


> Also lots of ads prey on people with worse impulse control who bankroll the rest of us who don't click ads.

This reminds me of the Mark Twain adage of "Telling a man he can't have steak just because a baby can't chew it."

I don't want to pay money & subscriptions to every site I visit because some folks don't have impulse control. Similarly, the prevalence of alcoholism in society shouldn't prevent me from having a glass of wine with dinner.


> I don't want to pay money & subscriptions to every site I visit because some folks don't have impulse control.

You've got it exactly backwards. The reason you don't have to pay subscriptions is because of people with poor impulse control. If ads were less effective (e.g. the low impulse control people didn't exist), more sites would require subscriptions because the ad inventory would not be able to cover costs.


> I don't want to pay money & subscriptions to every site I visit because some folks don't have impulse control.

I have bad news for you: the absolutely infinite capacity for greed and the subsequent enshittification means that you're going to pay a subscription fee and still get to have your brain pickled by ad-based propaganda, just like cable TV.


I find the bus tracking of this app in my city is pretty poor unfortunately. Probably not the app's fault, probably an issue with the municipality, but still annoying.

That being said, if this app could convince cities to let it be used for payment as well, that would be a game changer. Uber for public transit would really remove so much friction from using transit.


They do support payment in some cities (mine included), and it works pretty well.

A bunch of the larger / better-funded systems are also moving to just accepting credit cards directly on the readers, which is even easier.


Some cities let you pay with contactless credit cards (at least London and New York have it; Boston and San Francisco are working on it) which seems like the most elegant solution to this, at least from a commuter’s point of view.


You can pay in the app on the Denver RTD system!


Awesome!


The main difference is that UAC is automatically triggered by the OS and takes over the whole display, making it harder to fake/intercept. It’s trivial to put a fake sudo in someone's PATH and steal their password.
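
On the PATH point, a rough way to see which `sudo` a shell would actually pick up is to resolve it the same way the shell does and check whether it lives in a directory you trust. This is only a sketch: `shadowed_by` and `TRUSTED_DIRS` are made-up names, the trusted list is an assumption you would adjust per system, and a check like this can itself be bypassed by anyone who already controls your environment.

```python
import os
import shutil
from typing import Optional, Sequence

# Assumption: directories where system binaries normally live.
TRUSTED_DIRS = ("/usr/bin", "/bin", "/usr/sbin", "/sbin")


def shadowed_by(command: str, path: Optional[str] = None,
                trusted: Sequence[str] = TRUSTED_DIRS) -> Optional[str]:
    """Return the path `command` resolves to if it is outside `trusted`.

    Returns None when the command is not found or resolves to a trusted
    directory. `path` defaults to the current PATH.
    """
    resolved = shutil.which(command, path=path)
    if resolved is None:
        return None
    directory = os.path.dirname(os.path.realpath(resolved))
    trusted_real = {os.path.realpath(d) for d in trusted}
    return None if directory in trusted_real else resolved


suspect = shadowed_by("sudo")
if suspect:
    print(f"sudo resolves to an unexpected location: {suspect}")
```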


lol UAC is such a lazy shitshow of a security implementation…

A) there is no interception to be had. It’s a fucking “Yes I am Admin” single click a child could do unsupervised.

B) It requires training for the user to know that this is a special UAC mode. That’s high-motivation, high-knowledge user training. Pilots train to recognize unusual signs. Your grandma does not train to recognize what UAC looks like, why it would come up and when. UAC is the biggest cop out of a security excuse and Windows should be ashamed.


Sure I guess, I don't know why UAC gets so much hate while sudo gets so much praise.

UAC is strictly better than sudo IMO.

Does UAC solve security for windows? Of course not, but we were comparing against sudo here.


> lol UAC is such a lazy shitshow of a security implementation…

It's by far the most secure and well thought out implementation of an elevation prompt across all operating systems.

A lot of thought went into designing the Secure Desktop [1] used by UAC, and really mac and linux not having something similar is an embarrassment.

[1] https://learn.microsoft.com/en-us/archive/blogs/uac/user-acc...


I stand corrected, it is not a lazy shitshow.

You’re right, fake sudo prompts are how people get exploited all day long. I’ve witnessed it on MacOS.

For UAC, the user still has to learn that the darkening of the screen and the prompt are “serious business.” I think that when a password is present and has been willfully supplied, prompting the user for the password guards against automatic/accidental acceptance (button-only user confirmation prompts). I understand that many users have a joke password that’s not really any more secure than a click on a button.

I see that Sudo for Windows has been restricted to Desktop only. https://hudsonvalleyhost.com/blog/microsoft-officially-exclu...

From the design article you linked, I know it’s 2006 era:

> You hide the real mouse cursor and show a fake one some number of pixels offset to the real one

I think MacOS only in the recent years has “Full Desktop Control” as an accessibility-category permission (a confusing category to boot) it enforces on apps to prevent faking the cursor.


> I'm convinced computers are much better at it, but lawyers suffice.

This is just wrong though. The effect of the law is only what humans determine it to be.

Computers can't be better at it by definition. If a computer claims a law says one thing but a judge/court determines the other, the judge wins because the law is a human system.


Similar to what the crypto people tried with smart contracts. I can unconditionally have a token that says I own a pizza, but it doesn't mean I own a pizza.


Sure, but a computer may be better than a lawyer at predicting what a judge might say.


Don’t need landing legs/gear on the ship. Saves weight


> Maybe if the product is shit, the color of the buttons doesn't matter.

This should really be on a poster in many offices.


“It looks awful, and it works” (apologies to Buckley's)


They are suggesting that it’s easy to make mistakes when writing the JWT auth code, as opposed to just talking to the IdP using TLS.
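
For a sense of where the mistakes creep in, here is a minimal hand-rolled HS256 check using only the Python standard library. It is a sketch for illustration, not a recommendation to roll your own: the function name `verify_hs256` is made up, and it deliberately omits expiry, audience and issuer checks, each of which is another thing to get wrong.

```python
import base64
import hashlib
import hmac
import json


def _b64url_decode(segment: str) -> bytes:
    # JWT segments are base64url without padding; restore it first.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def verify_hs256(token: str, secret: bytes) -> dict:
    """Verify an HS256-signed JWT and return its claims, or raise ValueError."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        signature = _b64url_decode(sig_b64)
    except (ValueError, TypeError) as exc:
        raise ValueError("malformed token") from exc
    # Pitfall 1: trusting the token's own "alg" (for example "none").
    # Pin the one algorithm you expect instead.
    if not isinstance(header, dict) or header.get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    # Pitfall 2: comparing with ==, which is not constant-time.
    if not hmac.compare_digest(signature, expected):
        raise ValueError("bad signature")
    return json.loads(_b64url_decode(payload_b64))
```

Even this short version has several places where an innocent-looking shortcut becomes a vulnerability, which is the argument for letting the IdP answer the question over TLS.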

