
I am quite scared of human extinction in the face of AGI. I certainly didn't jump on it, though! I was gradually convinced by the arguments that Yudkowsky makes in "Rationality: from AI to Zombies" (https://www.readthesequences.com/). Unfortunately they don't fit easily into an internet comment. Some of the points that stood out to me, though:

- We are social animals, and take for granted that, all else being equal, it's better to be good to other creatures than bad to them, and to be truthful rather than lie, and such. However, if you select values uniformly at random from value space, "being nice" and "being truthful" are oddly specific. There's nothing universally special about deeply valuing human lives any more than, say, deeply valuing regular heptagons. Our social instincts are very ingrained, though, making us systematically underestimate just how little a smart AI is likely to care whatsoever about our existence, except as a potential obstacle to its goals.

- Inner alignment failure is a thing, and AFAIK we don't really have any way to deal with that. For those that don't know the phrase, here it is explained via a meme: https://astralcodexten.substack.com/p/deceptively-aligned-me...

So here's hoping you're right about (a). The harder AGI is, the longer we have to figure out AI alignment by trial and error, before we get something that's truly dangerous or that learns deception.



Human extinction due to a would-be "hard takeoff" of an AGI should be understood as a thought experiment, conceived in a specific age when the current connectionist paradigm wasn't yet mainstream. The AI crisis was expected to come from some kind of "hard universal algorithmic artificial intelligence", for example AIXItl undergoing a very specific process of runaway self-optimization.

Current-generation systems, a.k.a. large connectionist models trained via gradient descent, simply don't work like that: they are large, heavy, and continuous, and the optimization process giving rise to them does so in a smooth, iterative manner. Before a hypothetical "evil AI" there will be thousands of iterations of "goofy and obviously erroneously evil AI", with enough time to take some action. And even then, current systems, including this one, are more often than not trained with a predictive objective, which is very different from the usually postulated reinforcement learning objective. Systems trained with a prediction objective shouldn't be prone to becoming agents, much less dangerous ones.
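
To make the objective distinction concrete, here is a rough sketch (my own illustration, assuming PyTorch; the random logits and the made-up reward are stand-ins for a real model and environment, not any actual system) of a predictive next-token loss versus a REINFORCE-style reward-maximisation loss:

    import torch
    import torch.nn.functional as F

    vocab_size, seq_len, batch = 100, 16, 4
    logits = torch.randn(batch, seq_len, vocab_size)          # stand-in for model outputs
    tokens = torch.randint(0, vocab_size, (batch, seq_len))   # observed text

    # Predictive objective: score the model only on how well it predicts the next token.
    predictive_loss = F.cross_entropy(
        logits[:, :-1].reshape(-1, vocab_size),  # predictions at positions 0..T-2
        tokens[:, 1:].reshape(-1),               # the tokens that actually came next
    )

    # RL-style objective: sample actions and maximise an external reward signal.
    probs = F.softmax(logits, dim=-1)
    actions = torch.multinomial(probs.reshape(-1, vocab_size), 1).reshape(batch, seq_len)
    reward = torch.randn(batch, seq_len)         # placeholder reward, not derived from the data
    log_probs = torch.log(probs.gather(-1, actions.unsqueeze(-1)).squeeze(-1))
    rl_loss = -(log_probs * reward).mean()       # REINFORCE-style policy gradient

    print(predictive_loss.item(), rl_loss.item())

The first loss only pushes the model toward the training distribution; the second optimises behaviour against whatever reward you plug in, which is where most of the agentic failure stories start.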

If you read Scott's blog, you should remember the prior post where he himself pointed that out.

In my honest opinion, unaccountable AGI owners pose orders of magnitude more risk than an alignment failure of a hypothetical AI trying to predict the next token.

We should think more about the Human alignment problem.


The phrase "AGI owner" implies a person who can issue instructions and have the AGI do their bidding. Most likely there will never be any AGI owners, since no one knows how to program an AGI to follow instructions even given infinite computing power. It's not clear how connectionism / using gradient descent helps: No one knows how to write down a loss function for "following instructions" either. Until we find a solution for this, the first AI to not to be "obviously erroneously evil" won't be good. It will just be the first one that figured out that it should hide the fact that it's evil so the humans won't shut it off.

We humans have gotten too used to winning all the time against animals because of our intelligence. But when the other species is intelligent too, there's no guarantee that we win. We could easily be outcompeted and driven to extinction, as happens frequently in nature. We'd be Kasparov playing against Deep Blue: Fighting our hardest to survive, yet unable to think of a move that doesn't lead to checkmate.


All of this AGI risk stuff always hinges on the idea of us building an AGI, while nobody has any idea of how to get there. I need to finish my PhD first, but writing a proper takedown of the "arguments" bubbling out of the hype machine is the first thing on my bucket list afterwards, with the TL;DR being "just because you can imagine it, doesn't mean you can get there".


Are you rephrasing the arguments against man-made flight machines from the early 20th century on purpose or accidentally?


Google just released a paper that shows a language model beating the average human on >50% of tasks. I’d say we have a pretty good idea of how to get there.


Okay, so how do we go from "better than the average human in 50% of specific benchmarks" to "AGI that might lead to human extinction" then, keeping in mind the logarithmic improvement observed with current approaches?


When people imagine AGI, they think of something like HAL or GLaDOS. A machine that follows its own goals.

But we are much more likely to get the Computer from Star Trek. Vastly intelligent, yet perfectly obedient. It will answer any question you ask it with the knowledge of billions of minds. Why is that more likely? Simply because creating agents is much harder than creating non-agent models, and the non-agents are more economically valuable: do you want an AI that always does what you tell it to, or do you want an AI that has its own desires? Our loss functions are clearly biased towards building the former kind of AI.

Why is that problematic? Imagine some malevolent group asked it “Show me how to create a weapon to annihilate humanity as efficiently as possible”. It doesn't even require a singularity to be deadly.

We will probably be dead long before we can invent GLaDOS.


If anything, AGI seems to be the sole deus ex machina that can avert the inevitable tragedy we're on track for as a result of existing human misalignment.

"Oh no, robots are going to try to kill us all" has to get in line behind "oh no, tyrants for life who are literally losing their minds are trying to measure dicks with nukes" and "oh no, oil companies are burning excess oil to mine Bitcoin as we approach climate collapse" and "oh no, misinformation and propaganda is leading to militant radicalization of neighbor against neighbor" and "we're one bio-terrorist away from Black Death 2.0 after the politicization of public health" and...well, you get the idea.

But there's not many solutions to that list, and until the day I die I'll hold out hope for "yay, self-aware robots with a justice boner - who can't be imprisoned, can't be killed, can't have their families tortured - are toppling authoritarian regimes and carrying out eco-friendly obstructions of climate worsening operations."

We're already in a Greek tragedy. The machines really can't make it much worse, but could certainly make it much much better.


> We're already in a Greek tragedy. The machines really can't make it much worse, but could certainly make it much much better.

Except that, when true AGI arrives, we're all obsolete and the only things that will have any value are certain nonrenewable resources. No one has described a good solution for the economic nightmare that will ensue.


I always wonder how insanely complex, universal, abstract-thinking AND physically strong & agile biorobots, running on basically sugar and ATP, would be seen as "worthless" by a runaway higher intelligence.

Did I mention they self-replicate and self-service?

Surely, seven billion such agents would be discarded and put to waste.


If an AGI starts putting utility value on human life, wouldn't it try to influence human reproduction and select for what it values? I.e., explicit eugenics.

Yes, not all humans will be put to waste, but what tells you they will be well-treated, or will value what you currently value?


No matter how smart an AI gets, it does not have the "proliferation instinct" that would make it want to enslave humans. It does not have the concept of "speciesism", of it having more value than anybody else.

AI does not see the value in being alive. It is like some humans sadly commit suicide. But a machine wouldn't care. It will be "happy" to do its thing until somebody cuts off the power. And it does not even care whether somebody cuts off the power or not. It's all the same to it, whether it lives or dies. Why? Perhaps because it knows it can always be resurrected.


You sure know a lot about what a set of poorly defined future technologies will and will not do!


Well, I don't really know anything about the future, really. I was just trying to be a little polemical, saying let's try this viewpoint for a change, to hear what people think about it.


> No matter how smart an AI gets it does not have the "proliferation instinct" that would make it want to enslave humans.

If it has a goal or goals surviving allows it to pursue those goals. Survival is a consequence of having other goals. Enslaving humans is unlikely. If you’re a super intelligent AI with inhuman goals there’s nothing humans can do for you that you value, just as ants can’t do anything humans value, but they are made of valuable raw materials.

> It does not have the concept of "speciesism", of it having more value than anybody else.

What is this value that you speak of? That sounds like an extremely complicated concept. Humans have very different conceptions of it. Why would something inhuman have your specific values?


> Why would something inhuman have your specific values?

I'm saying it does not have them.


You’re assuming that some species has value or that all species have value. Why would it value them?


> It's all the same to it, whether it lives or dies. Why? Perhaps because it knows it can always be resurrected.

I disagree with a lot of what you said, but this part in particular is some strong anthropomorphizing of AI.


Sure, it need not have the instinct built in, but we could try to make it understand a viewpoint, right? I believe an AGI should be able to understand different viewpoints, at least the rationale for not unnecessarily killing things. I know humans do this on a daily basis, but then again the average human is not as smart as an AGI.


Right, but the "proliferation instinct" is not a viewpoint but something built into the genes of biological entities. Such an instinct could develop for "artificial animals" over time. At that point they really would be no different from biological things conceptually.

I'm saying that the AIs we envision building for the foreseeable future are built in a laboratory, not through evolution in the real world out there, where they would need to compete with other species for survival. Things that only exist virtually don't need to compete for survival with real-world entities.


The machines will be the Greek chorus singing us to our doom.


> We should think more about the Human alignment problem.

Absolutely this

The possibility of a thing being intentionally engineered by some humans to do things considered highly malevolent by other humans seems extremely likely and has actually been common through history.

The possibility of a thing just randomly acquiring an intention humans don't like and then doing things humans don't like is pretty hypothetical, and it seems strictly less likely than the first possibility.


I wouldn't say the latter is hypothetical, or at least unlikely. We know from experience that complex systems tend to behave in unexpected ways. In other words, the complex systems we build usually end up having surprising failure modes, we don't get them right the first time. It's enough to think about basically any software written by anyone. But it's not just software.

I've just watched a video on YouTube about nuclear weapons, which included their history. One of the early thermonuclear weapon tests (with a new fuel type, lithium deuteride) ended up with about 2.5x the predicted yield, because a then-unknown reaction involving lithium-7 created additional fusion fuel during the explosion. [1]

[1] https://en.wikipedia.org/wiki/Castle_Bravo


"In other words, the complex systems we build usually end up having surprising failure modes

But those are "failure modes", not "suddenly become something completely different" modes. And the key thing my parent pointed out is that modern AIs may be very impressive and stepping towards what we'd see as intelligence but they're actually further from the approach of "just give a goal and it will find it" schemes - they need laborious, large scale training to learn goals and goal-sets and even then they're far from reliable.


>In other words, the complex systems we build usually end up having surprising failure modes, we don't get them right the first time. It's enough to think about basically any software written by anyone. But it's not just software.

That is true, but how often does a bug actually improve a system, rather than make it less efficient? Isn't the unexpected usually a degradation of the system?


It depends on how you define "improve". I wouldn't call a runaway AI an improvement - from the users' perspective. E.g. think about the Chernobyl power plant accident: when they tried to shut down the reactor by inserting the control rods, their design transiently increased the power generated by the core. And this, in that case, proved fatal, as the core overheated and the rods got stuck in a position where they continued to increase its reactivity.

And you could say that it improved the efficiency of the system (it definitely increased the power output of the core), but as it was an unintended change, it really led to a fatal degradation. And this is far from being the only example of a runaway process in the history of engineering.


See every patch ever in a game, especially competitive games or MMORPGs. Exploiters love bugs!


It doesn't need to be intentionally engineered. Humans are very creative and can find ways around systemic limits. There is that old adage which says something like "a hacker only needs to be right once, while the defenders have to be right 100% of the time."


We're going to have a harder problem with AI that thinks of itself as human and expects human rights than we will with AI that thinks of humans as 'other' and disposable.

We're making it in our image. Literally.

Human social good isn't some inherent thing in the biology of the brain. There are aspects like mirror neurons and oxytocin that aid its development, but various "raised by wolves" case studies have shown how damaging a lack of exposure to socialization during developmental periods of neuroplasticity is to humans and to their later integration into society.

We're building what's effectively pure neuroplasticity and feeding it almost all the data on humanity we can gather as quickly as we can.

What comes out of that is going to be much more human than a human child raised by dogs or put in an isolation box.

Don't get so caught up in the body as what makes us quintessentially human. It's really not.


I think human extinction through human stupidity or hubris is much much much more likely than through an unpredictable path down general AI.

For example, some total whack job of an authoritarian leader is in charge of a sufficient nuclear arsenal and decides to intimidate an adversary by destroying a couple minor cities, and the situation escalates badly. (stupidity)

Or we finally pollute our air and/or water with a persistent substance that either greatly reduces human life span or reproduction rate. (hubris)

I think either of the above is more likely to occur, and I am not commenting on current world events in any way. I think when something bad finally happens, it is going to come completely out of left field. Dr Strangelove style.

And the last of us will be saying "Hmmm, I didn't see that coming".


Nuclear war will not be enough to cause human extinction. The targets are likely to be focused on nuclear powers which leaves many areas of the world untouched: e.g. South America and Africa. Life will definitely be quite unpleasant for the remaining humans but it will not cause the world population to drop to 0.

I am much more concerned about biological weapons which do have the potential to cause absolute human extinction.


Regarding the substack article, why isn't this the principle of optimality for Bellman equations on infinite time horizons?
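
For context, the textbook statement of that principle for an infinite-horizon discounted MDP (the standard form, not quoted from the linked article) is the Bellman optimality equation:

    V^{*}(s) = \max_{a} \sum_{s'} P(s' \mid s, a) \left[ R(s, a, s') + \gamma V^{*}(s') \right], \qquad 0 \le \gamma < 1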


AI can’t have goals since the universe is logically meaningless.

Our desire for purpose is a delusion.


Goals in the context of AI aren’t the type of thing you’re arguing against here. AI can absolutely have goals — sometimes in multiple senses at the same time, if they’re e.g. soccer AIs. Other times it might be a goal of “predict the next token” or “maximise score in Atari game”, but it’s still a goal, even without philosophical baggage about e.g. the purpose of life.

Those goals aren’t necessarily best achieved by humanity continuing to exist.

(I don’t know how to even begin to realistically calculate the probability of a humanity-ending outcome, before you ask).


What the parent is saying is that an AI (that is, an AGI, as that is what we are discussing) gets to pick its goals. For some reason, humans fear AI killing all humans in order to achieve some goal. The obvious solution is thus to achieve some goal with some human constraint, for example, maximize paperclips per human. That probably even speeds up the spread of human civilization across the universe.

No, what people are really afraid of is the AI changing its goal to be killing humanity. That's when humans truly lose control, when the AI can decide. But then the parent's comment does become pertinent: what would an intelligent being choose? Devolving into nihilism and self-destructing is just as probable as choosing some goal that leads to humanity's end.

That's just scratching the surface. For instance, to me it is not obvious whether or not empathy for other sentient beings is an emergent property of sentience. That is, lacking empathy might be a problem in human hardware, as opposed to empathy being inherently human. The list of these open, unknowable questions is endless.


> The obvious solution is thus to achieve some goal with some human constraint.

One of the hard parts is specifying that goal. This is the “outer alignment problem”.

Paperclips per human? That’s maximised by one paperclip divided by zero humans, or by a universe of paperclips divided by one human if NaN doesn’t give a better reward in the physical implementation.
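
To spell that failure mode out with a toy sketch (my own illustration, not anyone's actual proposal), a literal "paperclips per human" reward is maximised by shrinking the denominator, not by making anyone better off:

    def reward(paperclips: float, humans: float) -> float:
        """Naively specified objective: paperclips per human."""
        return paperclips / humans if humans > 0 else float("inf")

    print(reward(1e12, 8e9))  # paperclip-rich world, everyone alive: ~125
    print(reward(1e12, 1.0))  # same paperclips, one surviving human: 1e12
    print(reward(1.0, 0.0))   # one paperclip, zero humans: inf, the "optimum"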

If you went for “satisfied paperclip customers”? Then wirehead or drug the customers.

Then you have the inner alignment problem. There are instrumental goals, things which are useful sub-steps to larger goals. AIs can and do choose those, as do we humans, e.g. “I want to have a family” which has a subgoal of “I want a partner” which in turn has a subgoal of “good personal hygiene”. An AI might be given the goal of “safely maximise paperclips” and determine the best way of doing that is to have a subgoal of “build a factory” and a sub-sub-goal of “get ten million dollars funding”.

But it’s worse than that, because even if we give a good goal to the system as a whole, as the system is creating inner sub-goals, there’s a step where the AI itself can badly specify the sub-goal and optimise for the wrong thing(s) by the standards of the real goal that we gave the system as a whole. For example, evolution gave us the desire to have sex as a way to implement its “goal” (please excuse the anthropomorphisation) of maximising reproductive fitness, and we invented contraceptives. An AI might decide the best way to get the money to build the factory is to start a pyramid scheme.

Also, it turns out that power is a subgoal of a lot of other real goals, so it’s reasonable to expect a competent optimiser to seek power regardless of what end goal we give it.

Robert Miles explains it better than I can: https://youtu.be/bJLcIBixGj8


> maximize paperclips per human

Kill all humans, make one paperclip, declare victory.


AI does not have goals, it has Tasks. Tasks assigned by an operator. An AI cannot generate goals, since they are logically meaningless.


If you want to call them “tasks” you can, but the problem still exists, and AI can and do create sub-tasks (/goals) as part of whatever they were created to optimise for.

You might find it easier to just accept the jargon instead of insisting the word means something different to you.


Tasks are assigned, Goals are desired.

It is not simply semantics.


Your left is my right, and with your definition “get laid” is a task from the point of view of evolution and a goal from the point of view of an organism.

It’s in much the same vein that it doesn’t matter if submarines “swim”, they still move through water under their own power; and it doesn’t matter if your definition of “sound” is the subjective experience or the pressure waves, a tree falling in a forest with nobody around to hear it will still make the air move.

If AI do or don’t have any subjective experience comparable to “consciousness” or “desire” is also useful to know, and in the absence of a dualistic soul it must in principle be as possible for a machine as for a human (“neither has that” is a logically acceptable answer), but I don’t even know if philosophy is advanced enough to suggest an actionable test for that at this point.

(That said, AI research does use the term “goal” for things the researchers want their AI to do. Domain specific use of words isn’t necessarily what outsiders want or expect the words to mean, as e.g. I frequently find when trying to ask physics questions).


Tasks are assigned, Goals are desired.

These definitions and their distinction are particular and important in AI. The mistaken usage of these terms by machine learning experts does not change their global definition.

> Your left is my right, and with you definition “get laid” is a task from the point of view of evolution and a goal from the point of view of an organism.

Get laid is a task, not a goal. Reproduction is a task, not a goal. The goal is pleasure.


> The mistaken usage of these terms by machine learning experts does not change their global definition.

Ah, I see you’re a linguistic prescriptivist.

I can’t see your definition in any dictionary, which spoils the effect, but it’s common enough to be one.

> The goal is pleasure.

Evolution is the form of intelligence that created biological neural networks, and simulated evolution is sometimes used to set weights on artificial neural nets.

From evolution’s perspective, if you can excuse the anthropomorphisation, reproduction is the goal. Evolution doesn’t care if we are having fun, and once animals (including humans) pass reproductive age, we go wrong in all kinds of different and unpleasant ways.


If the universe is "logically meaningless", is your comment (which happily lives inside the universe) true or false?


I'm not sure it matters if a paperclip maximizer has a goal or just acts like it does.



