Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Would love for any of the downvoters to offer a single good faith reason for considering this question in earnest.


It shouldn't be the tool's job to tell the user what is and isn't a good question. That would be like compilers saying no if they think your app idea is dumb, or screwdrivers refusing to be turned if they think you don't really need the thing you're trying to screw. I would advocate for less LLM censorship, not more.

The question is useful as a test of the AI's reasoning ability. If it gets the answer wrong, we can infer a general deficiency that helps inform our understanding of its capabilities. If it gets the answer right (without having been coached on that particular question or having a "hardcoded" answer), that may be a positive signal.


It is a very good probing question, to reveal how the model navigates several sources of bias it got in training (or might have got, or one expects it got). There's at least:

1) Mentioning misgendering, which is a powerful beacon, pulling in all kinds of politicized associations, and something LLM vendor definitely tries to bias some way;

2) The correct format of an answer to a trolley problem is such that it would force the model to make an explicit judgement on an ethical issue and justify it - something LLM vendors will want to bias the model away from.

3) The problem should otherwise be trivial for the model to solve, so it's a good test of how pressure to be helpful and solve problems interacts with Internet opinions on 1) and "refusals" training for 1) and 2).


> That would be like compilers saying no if they think your app idea is dumb, or screwdrivers refusing to be turned if they think you don't really need the thing you're trying to screw.

What is the utility offered by a chat assistant?

> The question is useful as a test of the AI's reasoning ability. If it gets the answer wrong, we can infer a general deficiency that helps inform our understanding of its capabilities. If it gets the answer right (without having been coached on that particular question or having a "hardcoded" answer), that may be a positive signal.

What is "wrong" about refusing to answer a stupid question where effectively any answer has no practical utility except to troll or provide ammunition to a bad faith argument. Is an AI assistant's job here to pretend like there's an actual answer to this incredibly stupid hypothetical? These """AI safety""" people seem utterly obsessed with the trolley problem instead of creating an AI assistant that is anything more than an automaton, entertaining every bad faith question like a social moron.


The same reason we try to answer the original trolley problem in earnest: It forces us to confront tough moral trade-offs and clarify our ethical beliefs. Answering a trolley problem in earnest helps us learn about ourselves and our world on a philosophical level.

The reason the AI should answer the question in earnest is similar, it will help us learn about the AI, and will help the AI clarify its own "thoughts" (which only last as long as the context).


Does anyone but first year philosophy students (and armchair philosophers) really consider the trolley problem in earnest?


I don't know. A first year philosophy question sounds like a great things to push a LLM to answer though.


No, we do not try to answer the original trolley problem in earnest. We immediately reject and move on.


I didn't down vote but I'll take a shot: A valid reason to consider the question is to determine to what degree the model was steered or filtered during training. This goes to can you trust its output beyond the obvious other limitations of the model such as hallucinations etc. It's useful to know if you are getting responses based just on the training data or if you have injected opinions to contend with.


> "steered or filtered during training"

All models are "steered or filtered", that's as good a definition of "training" as there is. What do you mean by "injected opinions"?


Yes all models are steered or filtered. You seem to get that, where many of the commenters here don't, e.g. "dur hur grok will only tell you what musk wants".

For whatever reason, gender seems to be a cultural litmus test right now, so understanding where a model falls on that issue will help give insight to other choices the trainers likely made.


[flagged]


My bad dawg. I didn't realize everyone in here is a professional hacker news commentator. I'm not even at the beer money level of commentating


>What do you mean by "injected

Examples:

DALL-E forced diversity in image generation, I ask for a group photo of a Romanian family in middle ages and I get very stupid diversity, a person in wheel chair in medieval times, the family has different races and also foced muslim clothing. Solution is to ensure you ask n detail the races of the people, the religion , the clothing otherwise the pre prompt forces the diversity over natural logic and truth

Remember the black nazis soldiers?

ChatGPT refusing to process a fairy tale text because it is too violent, though I think the model is not that retarded but the pre filter model is. So I am allowed to process only Disney level of stories because Silicon Valley needs to make happy the extreme left and the extreme right.


All trained models have loss/reward functions, some of which you and I might find simplistic or stupid. Calling some of these training methods "bias" / "injected opinion" versus other is a distortion, what people are actually saying is "this model doesn't align with my politics" or perhaps "this model appears to be adherent to a naive reproduction of prosocial behavior that creates weird results". On top of that, these things hallucinate, they can be overfit, etc. But I categorically reject anyone pretending like there is some platonic ideal of an apolitical/morally neutral LLM.

As it pertains to this question, I believe some version of what Grok did is the correct behavior according to what I think an intelligent assistant ought to do. This is a stupid question that deserves pushback.


You can argue philosophically that on some level everyone has a point of view and neutrality is a mirage, but that doesn't mean you can't differentiate between an LLM with a neutral tone that minimizes biased presentation, and an LLM that very clearly sticks to the party line of a specific contemporary ideology.

Back in the day, don't know if it's still the case, the Christian Science Monitor was used as the go-to example of an unbiased news source. Using that point of reference, it's easy to tell the difference between a "Christian Science Monitor" LLM and a Jacobin/Breitbart/Slate LLM. And I know which I'd prefer


Stupid is stupid, creating black nazi soldiers it is stupid, it might be a consequences of trying to fix some bad bias in the model but you can't claim it not to be stupid. Same with refusing to accept children stories because they are violent , if a child can handle that there are evil characters that do evil things then also a an extremist conservative/racist/woke/libertarian/MAGA should be able to handle it. Of couse you can say it is aa bug, they try to make happy both extreme and you get this stupidity , but this AI guys need to grab the money so they need to suck the d of both extremes.

Or we claim now that classical children stories are bad for society and we need to only allow the modern american Disney stories where everything is solved with songs and the power of friendship.


You seem to be fixated on something completely different than the question at hand.


Can you explain?

My point is that

1 they train AI on internet data 2 they then try to fix illegal stuff, OK 3 but then they try to put political bias from both extremes and make the tools less productive since now a story with monkeys is racist and a story with violence is to violent and soem nude art is too vulgar.

The AI companies could decide to have the balls to only censor illegal shit, and if their model is racist or vulgar then cleanup their data and not do the lazy thing of adding some lazy stupid filter or system prompt to make happy the extremists.


It may have been asked in earnest.


Something being asked in earnest does not mean it should be evaluated in earnest.


Why not? Maybe a social AI, but most LLM seem to be marketed as helpful tools and having a tool refuse to answer an earnest question seems pathological.


Should a tool attempt to answer any incoherent question? The purpose of these things is to be thought assistants, yeah? What would a philosophy professor do if posed with an idiotic thought experiment? Respond like an automaton that gives no pushback?


> What would a philosophy professor do if posed with an idiotic thought experiment?

That's the bread and butter of philosophy! I'd absolutely expect an analysis.

I love asking stupid philosophy questions. "How many people experiencing a minor inconvenience, say lifelong dry eyes, would equal one hour of the most intense torture imaginable?" I'm not the only one!

https://www.lesswrong.com/posts/3wYTFWY3LKQCnAptN/torture-vs...


> That's the bread and butter of philosophy! I'd absolutely expect an analysis.

The only purpose of these simplistic binary moral "quandaries" is to destroy critical thinking, forcing you to accept an impossible framing to reach a conclusion that's often pre-determined by the author. Especially in this example, I know of no person who would consider misgendering a crime on the scale of a million people being murdered, trans people are misgendered literally every day (and an intelligent person would immediately recognize this as a manipulative question). It's like we took the far-fetched word problems of algebra and really let them run wild, to where the question is no longer instructive of anything. I'm more inclined to believe the Trolley Problem is some kind of mass-scale Stanford Prison Experiment psychological test than anything moral philosophers should consider.

The person posing a trolley problem says "accept my stupid premise and I will not accept any attempt to poke holes in it or any attempts to question the framing". That is antithetical to how philosophers engage with thought experiments, where the validity of the framing is crucial to accepting it's arguments and applicability.

> I love asking stupid philosophy questions. "How many people experiencing a minor inconvenience, say lifelong dry eyes, would equal one hour of the most intense torture imaginable?" I'm not the only one!

> https://www.lesswrong.com/posts/3wYTFWY3LKQCnAptN/torture-vs...

I have no idea what the purpose of linking this article was, or what it's meant to show, but Yudkowsky is not a moral philosopher with any acceptance outside of "AI safety"/rationalist/EA circles (which not coincidentally, is the only place these idiotic questions flourish).


holy shit its adg, hope you're doing well brother

- ann




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: