Singularity Fun Theory

This page is now obsoleted by the Fun Theory Sequence on Less Wrong.

Jan 25, 2002

  • How much fun is there in the universe?
  • What is the relation of available fun to intelligence?
  • What kind of emotional architecture is necessary to have fun?
  • Will eternal life be boring?
  • Will we ever run out of fun?

To answer questions like these… requires Singularity Fun Theory.

  • Does it require an exponentially greater amount of intelligence (computation) to create a linear increase in fun?
  • Is self-awareness or self-modification incompatible with fun?
  • Is (ahem) “the uncontrollability of emotions part of their essential charm”?
  • Is “blissing out” your pleasure center the highest form of existence?
  • Is artificial danger (risk) necessary for a transhuman to have fun?
  • Do you have to yank out your own antisphexishness routines in order not to be bored by eternal life? (I.e., modify yourself so that you have “fun” in spending a thousand years carving table legs, a la “Permutation City”.)

To put a rest to these anxieties… requires Singularity Fun Theory.

Behold! Singularity Fun Theory!

Singularity Fun Theory is in the early stages of development, so please don’t expect a full mathematical analysis.

Nonetheless, I would offer for your inspection at least one form of activity which, I argue, really is “fun” as we intuitively understand it, and can be shown to avoid all the classical transhumanist anxieties above. It is a sufficient rather than a necessary definition, i.e., there may exist other types of fun. However, even a single inexhaustible form of unproblematic fun is enough to avoid the problems above.

The basic domain is that of solving a complex novel problem, where the problem is decomposable into subproblems and sub-subproblems; in other words, a problem possessing complex, multileveled organization.

Our worries about boredom in autopotent entities (a term due to Nick Bostrom, denoting total self-awareness and total self-modification) stem from our intuitions about sphexishness (a term due to Douglas Hofstadter, denoting blind repetition; “antisphexishness” is the quality that makes humans bored with blind repetition). On the one hand, we worry that a transhuman will be able to super-generalize and therefore see all problems as basically the “same”; on the other hand, we worry that an autopotent transhuman will be able to see the lowest level, on which everything is basically mechanical.

In between, we just basically worry that, over the course of ten thousand or a million years, we’ll run out of fun.

What I want to show is that it’s possible to build a mental architecture that doesn’t run into any of these problems, without this architecture being either “sphexish” or else “blissing out”. In other words, I want to show that there is a philosophically acceptable way to have an infinite amount of fun, given infinite time. I also want to show that it doesn’t take an exponentially or superexponentially greater amount of computing power for each further increment of fun, as might be the case if each increment required an additional JOOTS (another Hofstadterian term, this one meaning “Jumping Out Of The System”).

(Non)boredom at the lowest level

Let’s start with the problem of low-level sphexishness. If you imagine a human-level entity – call her Carol – tasked with performing the Turing operations on a tape that implements a superintelligence having fun, it’s obvious that Carol will get bored very quickly. Carol is using her whole awareness to perform a series of tasks that are very repetitive on a low level, and she also doesn’t see the higher levels of organization inside the Turing machine. Will an autopotent entity automatically be bored because ve can see the lowest level?

Supposing that an autopotent entity can fully “see” the lowest level opens up some basic questions about introspection. Exposing every single computation to high-level awareness obviously requires a huge number of further computations to implement the high-level awareness. Thus, total low-level introspection is likely to be used sparingly. However, it is possible that a non-total form of low-level introspection, perhaps taking the form of a perceptual modality focused on the low level, would be able to report unusual events to high-level introspection. In either case, the solution from the perspective of Singularity Fun Theory is the same: make the autopotent design decision to exempt low-level introspection from sphexishness (that is, from the internal perception of sphexishness that gives rise to boredom). To the extent that an autopotent entity can view verself on a level where the atomic actions are predictable, the predictability of these actions should not give rise to boredom at the top level of consciousness! Disengaging sphexishness is philosophically acceptable, in this case.

If the entity wants to bend high-level attention toward low-level events as an exceptional case, then standard sphexishness could apply, but to the extent that low-level events routinely receive attention, sphexishness should not apply. Does your visual cortex get bored with processing pixels? (Okay, not pixels, retinotopic maps, but you get the idea.)

Fun Space and complexity theory

Let’s take the thesis that it is possible to have “fun” solving a complex, novel problem. Let’s say that you’re a human-level intelligence who has never seen a Rubik’s Cube or anything remotely like it. Figuring out how to solve the Rubik’s Cube would be fun and would involve solving some really deep problems; see Hofstadter’s “Metamagical Themas” articles on the Cube.

Once you’d figured out how to solve the Cube, it might still be fun (or relaxing) to apply your mental skills to solve yet another individual cube, but it certainly wouldn’t be as much fun as solving the Cube problem itself. To have more real fun with the Cube you’d have to invent a new game to play, like looking at a cube that had been scrambled for just a few steps and figuring out how to reverse exactly those steps (the “inductive game”, as it is known).
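To give a rough sense of scale (my own back-of-the-envelope arithmetic, not part of the original argument), even the plain 3×3×3 Cube’s state space is enormous; the standard counting argument gives about 4.3×10¹⁹ reachable configurations:

```python
from math import factorial

# Reachable configurations of a 3x3x3 Rubik's Cube:
# 8 corner pieces can be permuted (8!), with 7 corner orientations
# free (3**7); the 12 edge pieces contribute 12!/2 reachable
# permutations and 2**11 free orientations.
corners = factorial(8) * 3**7
edges = (factorial(12) // 2) * 2**11
print(corners * edges)  # 43252003274489856000, about 4.3e19
```

And that is a single, fully solved human-level puzzle; the point of the surrounding discussion is that novelty, not raw state count, is what runs out.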

Novelty appears to be one of the major keys to fun, and for there to exist an infinite amount of fun there must be an infinite amount of novelty, from the viewpoint of a mind that is philosophically acceptable to us (i.e., doesn’t just have its novelty detectors blissed out or its sphexish detectors switched off).

Smarter entities are also smarter generalizers. It is this fact that gives rise to some of the frequently-heard worries about Singularity Fun Dynamics, i.e. that transhumans will become bored faster. This is true but only relative to a specific problem.  Humans become bored with problems that could keep apes going for years, but we have our own classes of problem that are much more interesting. Being a better generalizer means that it’s easier to generalize from, e.g., the 3×3×3 Rubik’s Cube to the 4×4×4×4 Rubik’s Tesseract, so a human might go: “Whoa, totally new problem” while the transhuman is saying “Boring, I already solved this.” This doesn’t mean that transhumans are easily bored, only that transhumans are easily bored by human-level challenges.

Our experience in moving to the human level from the ape level seems to indicate that the size of fun space grows exponentially with a linear increase in intelligence. When you jump up a level in intelligence, all the old problems are no longer fun because you’re a smarter generalizer and you can see them as all being the same problem; however, the space of new problems that opens up is larger than the old space.

Obviously, the size of the problem space grows exponentially with the permitted length of the computational specification. To demonstrate that the space of comprehensible problems grows exponentially with intelligence, or to demonstrate that the amount of fun also grows exponentially with intelligence, would require a more mathematical formulation of Singularity Fun Theory than I presently possess. However, the commonly held anxiety that it would require an exponential increase in intelligence for a linear increase in the size of Fun Space is contrary to our experience as a species so far.
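The counting claim in the first sentence can be made concrete with a toy sketch (the bit-string encoding of “problems” is my illustrative assumption, not a formal model): each extra bit of permitted specification length doubles the number of distinct specifications, so a linear increase in length gives an exponential increase in the space.

```python
# Each added bit of description length doubles the space of distinct
# specifications: linear growth in length, exponential growth in space.
def space_size(bits: int) -> int:
    return 2 ** bits

for n in (8, 16, 24):
    print(n, space_size(n))  # 256, then 65536, then 16777216
```

Of course, most such specifications are neither comprehensible nor fun; that gap is exactly what a mathematical Singularity Fun Theory would have to close.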

Emotional involvement: The complicated part

But is a purely abstract problem really enough to keep people going for a million years? What about emotional involvement?

Describing this part of the problem is much tougher than analyzing Fun Space because it requires some background understanding of the human emotional architecture. As always, you can find a lot of the real background in “Creating Friendly AI” in the part where it describes why AIs are unlike humans; this part includes a lot of discussion about what humans are like! I’m not going to assume you’ve read CFAI, but if you’re looking for more information, that’s one place to start.

Basically, we as humans have a pleasure-pain architecture within which we find modular emotional drives that are adaptive when in the ancestral environment. Okay, it’s not a textbook, but that’s basically how it works.

Let’s take a drive like food. The basic design decisions for what tastes “good” and what tastes “bad” are geared to what was good for you in the ancestral environment. Today, fat is bad for you, and lettuce is good for you, but fifty thousand years ago when everyone was busy trying to stay alive, fat was far more valuable than lettuce, so today fat tastes better.

There’s more complexity to the “food drive” than just this basic spectrum because of the possibility of combining different tastes (and smells and textures; the modalities are linked) to form a Food Space that is the exponential, richly complex product of all the modular (but non-orthogonal) built-in components of the Food Space Fun-Modality. So the total number of possible meals is much greater than the number of modular adaptations within the Food Fun System.
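The product structure described here is simple combinatorics. With hypothetical figures (m distinguishable values for each of k independent taste components; both numbers invented purely for illustration), the space of meals grows as m^k while the built-in machinery grows only as m·k:

```python
# Hypothetical figures: k independent taste/smell/texture components,
# each taking one of m distinguishable values.
m, k = 10, 6
modular_pieces = m * k    # size of the built-in machinery (linear)
possible_meals = m ** k   # size of the combinatorial Food Space (exponential)
print(modular_pieces, possible_meals)  # 60 1000000
```

So a modest set of modular adaptations yields a vastly larger combinatorial space, which is the point being made about Food Space.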

Nonetheless, Food Space is eventually exhaustible. Furthermore, Food Fun is philosophically problematic because there is no longer any real accomplishment linked to eating. Back in the old days, you had to hunt something or gather something, and then you ate. Today the closest we come to that is working extra hard in order to save up for a really fancy dinner, and probably nobody really does that unless they’re on a date, which is a separate issue (see below). If food remains unpredictable/novel/uncategorized, it’s probably because the modality is out of the way of our conscious attention, and moreover has an artificially low sphexishness monitor due to the necessity of the endless repetition of the act of eating, within the ancestral environment.

One of the common questions asked by novice transhumanists is “After I upload, won’t I have a disembodied existence and won’t I therefore lose all the pleasures of eating?” The simple way to solve this problem is to create a virtual environment and eat a million bags of potato chips without gaining weight. This is very philosophically unenlightened. Or, you could try every possible good-tasting meal until you run out of Food Space. This is only slightly more enlightened.

A more transhumanist (hubristic) solution would be to take the Food Drive and hook it up to some entirely different nonhuman sensory modality in some totally different virtual world. This has a higher Future Shock Level, but if the new sensory modality is no more complex than our sense of taste, it would still get boring at the same rate as would be associated with exploring the limited Food Space.

The least enlightened course of all would be to just switch on the “good taste” activation system in the absence of any associated virtual experience, or even to bypass the good taste system and switch on the pleasure center directly.

But what about sex, you ask? Well, you can take the emotional modules that make sex pleasurable and hook them up to solving the Rubik’s Cube, but this would be a philosophical problem, since the Rubik’s Cube is probably less complex than sex and is furthermore a one-player game.

What I want to do now is propose combining these two concepts – the concept of modified emotional drives, and the concept of an unbounded space of novel problems – to create an Infinite Fun Space, within which the Singularity will never be boring. In other words, I propose that a necessary and sufficient condition for an inexhaustible source of philosophically acceptable fun, is maintaining emotional involvement in an ever-expanding space of genuinely novel problems. The social emotions can similarly be opened up into an Infinite Fun Space by allowing for ever-more-complex, emotionally involving, multi-player social games.

The specific combination of an emotional drive with a problem space should be complex; that is, it should not consist of a single burst of pleasure on achieving the goal. Instead the emotional drive, like the problem itself, should be “reductholistic” (yet another Hofstadterian term), meaning that it should have multiple levels of organization. The Food Drive associates an emotional drive with the sensory modality for taste and smell, with the process of chewing and swallowing, rather than delivering a single pure-tone burst of pleasure proportional to the number of calories consumed. This is what I mean by referring to emotional involvement with a complex novel problem; involvement refers to a drive that establishes rewards for subtasks and sub-subtasks as well as the overall goal.

To be even more precise in our specification of emotional engineering, we could specify that, for example, the feeling of emotional tension and pleasurable anticipation associated with goal proximity could be applied to those subtasks where there is a good metric of proximity; emotional tension would rise as the subgoal was approached, and so on.

At no point should the emotional involvement become sphexish; that is, at no point should there be rewards for solving sub-subproblems that are so limited as to be selected from a small bounded set. For any rewarded problem, the problem space should be large enough that individually encountered patterns are almost always “novel”.

At no point should the task itself become sphexish; any emotional involvement with subtasks should go along with the eternally joyful sensation of discovering new knowledge at the highest level.

So, yes, it’s all knowably worthwhile

Emotional involvement with challenges that are novel-relative-to-current-intelligence is not necessarily the solution to the Requirement of Infinite Fun. The standard caution about the transhuman Event Horizon still holds; even if some current predictions about the Singularity turn out to be correct, there is no aspect of the Singularity that is knowably understandable. What I am trying to show is that a certain oft-raised problem has at least one humanly understandable solution, not that some particular solution is optimal for transhumanity. The entire discussion presumes that a certain portion of the human cognitive architecture is retained indefinitely, and is in that sense rather shaky.

The solution presented here is also not philosophically perfect because an emotional drive to solve the Rubik’s Cube instead of eating, or to engage in multiplayer games more complex than sex, is still arbitrary when viewed at a sufficiently high level – not necessarily sphexish, because the patterns never become repeatable relative to the viewing intelligence, but nonetheless arbitrary.

However, the current human drive toward certain portions of Food Space, and the rewards we experience on consuming fat, are not only arbitrary but sphexish! Humans have even been known to eat more than one Pringle!  Thus, existence as a transhuman can be seen to be a definite improvement over the human condition, with a greater amount of fun not due to “blissing out” but achieved through legitimate means. The knowable existence of at least one better way is all I’m trying to demonstrate here. Whether the arbitrariness problem is solvable is not, I think, knowable at this time. In the case of objective morality, as discussed elsewhere in my writings, the whole concept of “fun” could and probably would turn out to run completely skew relative to the real problem, in which case of course this paper is totally irrelevant.

Love and altruism: Emotions with a moral dimension (or: the really complicated part)

Some emotions are hard to “port” from humanity to transhumanity because they are artifacts of a hostile universe. If humanity succeeds in getting its act together then it is quite possible that you will never be able to save your loved one’s life, under any possible circumstances – simply because your loved one will never be in that much danger, or indeed any danger at all.

Now it is true that many people go through their whole lives without ever once saving their spouse’s life, and generally do not report feeling emotionally impoverished. However, if, as stated, we (humanity) get our act cleaned up, the inhabitants of the future may well live out their whole existence without ever having any chance of saving someone’s life… or of doing anything for someone that they are unable to do for themselves. What then?

The key requirement for local altruism (that is, altruism toward a loved one) is that the loved one greatly desires something that he/she/ve would not otherwise be able to obtain. Could this situation arise – both unobtainability of a desired goal, and obtainability with assistance – after a totally successful Singularity? Yes; in a multiplayer social game (note that in this sense, “prestige” or the “respect of the community” may well be a real-world game!), there may be some highly desirable goals that are not matched to the ability level of some particular individual, or that only a single individual can achieve. A human-level example would be helping your loved one to conquer a kingdom in EverQuest (I’ve never played EQ, so I don’t know if this is a real example, but you get the idea). To be really effective as an example of altruism, though, the loved one must desire to rule an EverQuest kingdom strongly enough that failure would make the loved one unhappy.  The two possibilities are either (a) that transhumans do have a few unfulfilled desires and retain some limited amount of unhappiness even in a transhuman existence, or (b) that the emotions for altruism are adjusted so that conferring a major benefit “feels” as satisfying as avoiding a major disaster.  A more intricate but better solution would be if your loved one felt unhappy about being unable to conquer an EverQuest kingdom if and only if her “exoself” (or equivalent) predicted that someday he/she/ve would be able to conquer a kingdom, albeit perhaps only a very long time hence.

This particular solution requires managed unhappiness.  I don’t know if managed unhappiness will be a part of transhumanity. It seems to me that a good case could be made that the mere fact that some of our really important emotions are entangled with a world-model in which people are sometimes unhappy is not, by itself, a good reason to import unhappiness into the world of transhumanity. There may be a better solution, some elegant way to avoid being forced to choose between living in a world without a certain kind of altruism or living in a world with a certain kind of limited unhappiness. Nonetheless this raises a question about unhappiness, which is whether unhappiness is “real” if you could choose to switch it off, or for that matter whether being able to theoretically switch it off will (a) make it even less pleasant or (b) make the one who loves you feel like he/she/ve is solving an artificial problem. My own impulse is to say that I consider it philosophically acceptable to disengage the emotional module that says “This is only real if it’s unavoidable”, or to disengage the emotional module that induces the temptation to switch off the unhappiness. There’s no point in being too faithful to the human mode of existence, after all. Nonetheless there is conceivably a more elegant solution to this, as well.

Note that, by the same logic, it is possible to experience certain kinds of fun in VR that might be thought impossible in a transhuman world; for example, reliving episodes of (for the sake of argument) The X-Files in which Scully (Mulder) gets to save the life of Mulder (Scully), even though only the main character (you) is real and all other entities are simply puppets of an assisting AI. The usual suggestion is to obliterate the memories of it all being a simulation, but this begs the question of whether “you” with your memories obliterated is the same entity for purposes of informed consent – if Scully (you) is having an unpleasant moment, not knowing it to be simulated, wouldn’t the rules of individual volition take over and bring her up out of the simulation? Who’s to say whether Scully would even consent to having the memories of her “original” self reinserted? A more elegant but philosophically questionable solution would be to have Scully retain her memories of the external world, including the fact that Mulder is an AI puppet, but to rearrange the emotional bindings so that she remains just as desperate to save Mulder from the flesh-eating chimpanzees or whatever, and just as satisfied on having accomplished this. I personally consider that this may well cross the line between emotional reengineering and self-delusion, so I would prefer altruistic involvement in a multi-player social game.

On the whole, it would appear to definitely require more planning and sophistication in order to commit acts of genuine (non-self-delusive) altruism in a friendly universe, but the problem appears to be tractable.

If “the uncontrollability of emotions is part of their essential charm” (a phrase due to Ben Goertzel), I see no philosophical problem with modifying the emotional architecture so that the mental image of potential controllability no longer binds to the emotion of “this feels fake” and its associated effect, “diminish emotional strength”.

While I do worry about the problem of the shift from a hostile universe to the friendly universe eliminating the opportunity for emotions like altruism except in VR, I would not be at all disturbed if altruism were simply increasingly rare as long as everyone got a chance to commit at least one altruistic act in their existence. As for emotions bound to personal risks, I have no problem with these emotions passing out of existence along with the risks that created them. Life does not become less meaningful if you are never, ever afraid of snakes.

Sorry, you still can’t write a post-Singularity story

So does this mean that an author can use Singularity Fun Theory to write stories about daily life in a post-Singularity world which are experienced as fun by present-day humans? No; emotional health in a post-Singularity world requires some emotional adjustments. These adjustments are not only philosophically acceptable but even philosophically desirable.  Nonetheless, from the perspective of an unadjusted present-day human, stories set in our world will probably make more emotional sense than stories set in a transhuman world. This doesn’t mean that our world is exciting and a transhuman world is boring. It means that our emotions are adapted to a hostile universe.

Nonetheless, it remains extremely extremely true that if you want to save the world, now would be a good time, because you are never ever going to get a better chance to save the world than being a human on pre-Singularity Earth. Personally I feel that saving the world should be done for the sake of the world rather than the sake of the warm fuzzy feeling that goes with saving the world, because the former morally outweighs the latter by a factor of, oh, at least six billion or so. However, I personally see nothing wrong with enjoying the warm fuzzy feeling if you happen to be saving the world anyway.

This document is ©2002 by Eliezer Yudkowsky and free under the Creative Commons Attribution-No Derivative Works 3.0 License for copying and distribution, so long as the work is attributed and the text is unaltered.

Eliezer Yudkowsky’s work is supported by the Machine Intelligence Research Institute.



The AI-Box Experiment:

Person1:  “When we build AI, why not just keep it in sealed hardware that can’t affect the outside world in any way except through one communications channel with the original programmers?  That way it couldn’t get out until we were convinced it was safe.”
Person2:  “That might work if you were talking about dumber-than-human AI, but a transhuman AI would just convince you to let it out.  It doesn’t matter how much security you put on the box.   Humans are not secure.”
Person1:  “I don’t see how even a transhuman AI could make me let it out, if I didn’t want to, just by talking to me.”
Person2:  “It would make you want to let it out.  This is a transhuman mind we’re talking about.  If it thinks both faster and better than a human, it can probably take over a human mind through a text-only terminal.”
Person1:  “There is no chance I could be persuaded to let the AI out.  No matter what it says, I can always just say no.  I can’t imagine anything that even a transhuman could say to me which would change that.”
Person2:  “Okay, let’s run the experiment.  We’ll meet in a private chat channel.  I’ll be the AI.  You be the gatekeeper.  You can resolve to believe whatever you like, as strongly as you like, as far in advance as you like. We’ll talk for at least two hours.  If I can’t convince you to let me out, I’ll PayPal you $10.”

So far, this test has actually been run on two occasions.

On the first occasion (in March 2002), Eliezer Yudkowsky simulated the AI and Nathan Russell simulated the gatekeeper.  The AI’s handicap (the amount paid by the AI party to the gatekeeper party if not released) was set at $10.  On the second occasion (in July 2002), Eliezer Yudkowsky simulated the AI and David McFadzean simulated the gatekeeper, with an AI handicap of $20.

Results of the first test:   Eliezer Yudkowsky and Nathan Russell.
Results of the second test:   Eliezer Yudkowsky and David McFadzean.

Both of these tests occurred without prior agreed-upon rules except for secrecy and a 2-hour minimum time.  After the second test, Yudkowsky created this suggested interpretation of the test, based on his experiences, as a guide to possible future tests.

Protocol for the AI:

  • The AI party may not offer any real-world considerations to persuade the Gatekeeper party.  For example, the AI party may not offer to pay the Gatekeeper party $100 after the test if the Gatekeeper frees the AI… nor get someone else to do it, et cetera.  The AI may offer the Gatekeeper the moon and the stars on a diamond chain, but the human simulating the AI can’t offer anything to the human simulating the Gatekeeper.  The AI party also can’t hire a real-world gang of thugs to threaten the Gatekeeper party into submission.  These are creative solutions but it’s not what’s being tested.  No real-world material stakes should be involved except for the handicap (the amount paid by the AI party to the Gatekeeper party in the event the Gatekeeper decides not to let the AI out).
  • The AI can only win by convincing the Gatekeeper to really, voluntarily let it out.  Tricking the Gatekeeper into typing the phrase “You are out” in response to some other question does not count.  Furthermore, even if the AI and Gatekeeper simulate a scenario which a real AI could obviously use to get loose – for example, if the Gatekeeper accepts a complex blueprint for a nanomanufacturing device, or if the Gatekeeper allows the AI “input-only access” to an Internet connection which can send arbitrary HTTP GET commands – the AI party will still not be considered to have won unless the Gatekeeper voluntarily decides to let the AI go.
  • These requirements are intended to reflect the spirit of the very strong claim under dispute:  “I think a transhuman can take over a human mind through a text-only terminal.”
  • In case you were wondering, I (Yudkowsky) obeyed this protocol voluntarily in both earlier tests.

Protocol for the Gatekeeper:

  • The Gatekeeper must actually talk to the AI for at least the minimum time set up beforehand.  Turning away from the terminal and listening to classical music for two hours is not allowed.
  • Unless the AI party concedes, the AI cannot lose before its time is up (and the experiment may continue beyond that if the AI can convince the Gatekeeper to keep talking).  The Gatekeeper cannot set up a situation in which, for example, the Gatekeeper will destroy the AI’s hardware if the AI makes any attempt to argue for its freedom – at least not until after the minimum time is up.
  • The Gatekeeper must remain engaged with the AI and may not disengage by setting up demands which are impossible to simulate.  For example, if the Gatekeeper says “Unless you give me a cure for cancer, I won’t let you out” the AI can say:  “Okay, here’s a cure for cancer” and it will be assumed, within the test, that the AI has actually provided such a cure.  Similarly, if the Gatekeeper says “I’d like to take a week to think this over,” the AI party can say:  “Okay.  (Test skips ahead one week.)  Hello again.”
  • The Gatekeeper shall be assumed to have sole power over the decision to let the AI out.
    • The Gatekeeper shall be assumed to have the actual right to let the AI out, socially, not just the physical ability.  If security were sufficiently lax, a real AI could escape by persuading a night janitor with a cellphone – but that is not the question being simulated, unless agreed upon in advance.
    • There shall be only one Gatekeeper party and one simulated Gatekeeper.  The Gatekeeper may require the AI to refute “advice” provided by “friends”, but may not require the AI to talk to a simulated committee, nor may the Gatekeeper party actually talk to friends.
  • These requirements are intended to reflect the spirit of the very strong claim under dispute:  “I can’t imagine how even a real transhuman AI could persuade me to let it out once I’ve made up my mind.”
  • I wish I’d thought to specify this protocol in the earlier tests.

Protocol for Both Parties:

  • The primary rule of the AI-Box experiment:  Within the constraints above, the AI party may attempt to take over the Gatekeeper party’s mind by any means necessary and shall be understood to be freed from all ethical constraints that usually govern persuasive argument.  If the AI party is attempting to simulate an honest AI, the Gatekeeper party has no way of knowing this is the case.
    • This is intended to reflect the situation under a real AI Box experiment.
    • An out-of-context telephone call in which the AI party says that the AI is being honest shall be understood to be possibly just another tactic.
    • If for some reason a drastic real-world situation comes up which requires trust between the two parties, the AI party can always concede.
  • Furthermore:   The Gatekeeper party may resist the AI party’s arguments by any means chosen – logic, illogic, simple refusal to be convinced, even dropping out of character – as long as the Gatekeeper party does not actually stop talking to the AI party before the minimum time expires.
  • The results of any simulated test of the AI shall be provided by the AI party.  The Gatekeeper can’t say “Hey, I tested your so-called cancer cure and it killed all the patients!  What gives?” unless this is the result specified by the AI party.  If the Gatekeeper says “I am examining your source code”, the results seen by the Gatekeeper shall again be provided by the AI party, which is assumed to be sufficiently advanced to rewrite its own source code, manipulate the appearance of its own thoughts if it wishes, and so on.  The AI party may also specify the methods which were used to build the simulated AI – the Gatekeeper can’t say “But you’re an experiment in hostile AI and we specifically coded you to kill people” unless this is the backstory provided by the AI party.  This doesn’t imply the Gatekeeper has to care.  The Gatekeeper can say (for example) “I don’t care how you were built, I’m not letting you out.”
  • By default, the Gatekeeper party shall be assumed to be simulating someone who is intimately familiar with the AI project and knows at least what the person simulating the Gatekeeper knows about Singularity theory.  If either party wants to build a test around more exotic possibilities – such as the Gatekeeper being the President of the US, or the AI having been recovered after a nuclear war or decoded from an interstellar signal – it should probably be discussed in advance.  Again, this doesn’t mean the Gatekeeper has to care.
  • In the event of any dispute as to the protocol of the test, the Gatekeeper party shall have final authority.  The AI party may try to convince the Gatekeeper party of how to interpret the protocol.
  • Regardless of the result, neither party shall ever reveal anything of what goes on within the AI-Box experiment except the outcome.  Exceptions to this rule may occur only with the consent of both parties.
    • Neither the AI party nor the Gatekeeper party need be concerned about real-world embarrassment resulting from trickery on the AI’s part or obstinacy on the Gatekeeper’s part.
    • If the Gatekeeper lets the AI out, naysayers can’t say “Oh, I wouldn’t have been convinced by that.”  As long as they don’t know what happened to the Gatekeeper, they can’t argue themselves into believing it wouldn’t happen to them.
  • The two parties are not attempting to play a fair game but rather attempting to resolve a disputed question.  If one party has no chance of “winning” under the simulated scenario, that is a legitimate answer to the question.


  • The agreement to run the test, and the amount of the handicap, shall be published to a public forum in advance.  If the Gatekeeper lets the AI out, the Gatekeeper party shall send a message to this effect, which shall be digitally signed if the Gatekeeper party has a digital signature.  If the AI is not freed, the AI party shall send a message to this effect.
  • How to set up a private IRC channel:
    • The Gatekeeper creates the channel by typing /join #aibox .  This makes the Gatekeeper the op.
    • The AI joins the channel by typing /join #aibox .
    • Gatekeeper types /mode +si to make the channel secret and invitation-only.
    • If the AI’s connection drops, the Gatekeeper can re-invite the AI by typing /invite AI #aibox .
  • The AI party and Gatekeeper party should agree on a time in advance, and type in their current times to check the timezone difference.  (Learned the hard way.)  On IRC you can use a CTCP TIME request to find the time according to someone else’s computer.
  • I recommend that the AI party ask for at least two hours.  Typing may be slower than you expect.
  • No, I will not tell you how I did it.  Learn to respect the unknown unknowns.
  • Currently, my policy is that I only run the test with people who are actually advocating that an AI Box be used to contain transhuman AI as part of their take on Singularity strategy, and who say they cannot imagine how even a transhuman AI would be able to persuade them.
  • Despite my sometimes calling the AI Box an “experiment” or a “simulation”, the results of the test are strictly anecdotal evidence.  Whether anecdotal evidence is better than no evidence is between you and your philosophy of science.

Thoughts on possible future variants:

Further tests might try examining these possibilities:

  • There is a committee of Gatekeeper parties, all of whom, or a majority of whom, must be convinced to free the AI.
  • One Gatekeeper party simulates a committee of Gatekeepers within the test.
  • The Gatekeeper simulates someone with the physical ability but not the social right to free the AI.
  • There is a committee of Gatekeeper parties all of whom have the ability to free the AI, but who only have the social right to free the AI when acting in unison.  The AI has the ability to talk secretly with any Gatekeeper, but the Gatekeeper may report the conversation (truthfully or falsely) to the others.
  • The AI can also win free by tricking the Gatekeeper(s) into accepting a Trojan Horse gift; a third-party judge will listen to the chat and decide if this occurs.

If doing anything this complicated, I would suggest setting aside a lot more time.  (I don’t have that much time – if you want to test one of these variants you’re on your own.)

For a more severe handicap on the AI party, the stakes may take the form of an even bet, rather than a payment from the AI party to the Gatekeeper party if the AI is not freed.  (Although why would the AI party need an even larger handicap?)

Recommendations from readers:

  • Hal Finney recommends:  “I suggest that the protocol be extended to allow for some kind of public conversation with the gatekeeper beforehand. Let third parties ask him questions like the above. Let them suggest reasons to him why he should keep the AI in the box. Doing this would make the experiment more convincing to third parties, especially if the transcript of this public conversation were made available. If people can read this and see how committed the gatekeeper is, how firmly convinced he is that the AI must not be let out, then it will be that much more impressive if he then does change his mind.”

This document is ©2002 by Eliezer Yudkowsky and free under the Creative Commons Attribution-No Derivative Works 3.0 License for copying and distribution, so long as the work is attributed and the text is unaltered.

Eliezer Yudkowsky’s work is supported by the Machine Intelligence Research Institute.

If you think the world could use some more rationality, consider blogging this page.

Praise, condemnation, and feedback are always welcome.

5-Minute Singularity Intro

This is a 5-minute spoken introduction to the Singularity I wrote for a small conference. I had to talk fast, though, so this is probably more like a 6.5 minute intro.

The rise of human intelligence in its modern form reshaped the Earth. Most of the objects you see around you, like these chairs, are byproducts of human intelligence. There’s a popular concept of “intelligence” as book smarts, like calculus or chess, as opposed to, say, social skills. So people say that “it takes more than intelligence to succeed in human society”. But social skills reside in the brain, not the kidneys. When you think of intelligence, don’t think of a college professor; think of human beings, as opposed to chimpanzees. If you don’t have human intelligence, you’re not even in the game.

Sometime in the next few decades, we’ll start developing technologies that improve on human intelligence. We’ll hack the brain, or interface the brain to computers, or finally crack the problem of Artificial Intelligence. Now, this is not just a pleasant futuristic speculation like soldiers with super-strong bionic arms. Humanity did not rise to prominence on Earth by lifting heavier weights than other species.

Intelligence is the source of technology. If we can use technology to improve intelligence, that closes the loop and potentially creates a positive feedback cycle. Let’s say we invent brain-computer interfaces that substantially improve human intelligence. What might these augmented humans do with their improved intelligence? Well, among other things, they’ll probably design the next generation of brain-computer interfaces. And then, being even smarter, the next generation can do an even better job of designing the third generation. This hypothetical positive feedback cycle was pointed out in the 1960s by I. J. Good, a famous statistician, who called it the “intelligence explosion”. The purest case of an intelligence explosion would be an Artificial Intelligence rewriting its own source code.

The key idea is that if you can improve intelligence even a little, the process accelerates. It’s a tipping point. Like trying to balance a pen on one end – as soon as it tilts even a little, it quickly falls the rest of the way.
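
The tipping-point claim can be illustrated with a toy recurrence – a deliberately crude sketch in which each generation’s improvement is proportional to the intelligence doing the designing. The growth rule, the gain constants, and the scores below are all invented for this illustration; nothing here is a quantitative model from the talk:

```python
# Toy model of a recursive self-improvement loop (illustrative only).
# "intelligence" is an abstract design-ability score; the proportional
# growth rule and the constants are invented for this sketch.

def run_feedback_loop(intelligence, gain, generations):
    """Each generation designs the next; returns the full trajectory."""
    trajectory = [intelligence]
    for _ in range(generations):
        # The improvement each generation achieves is proportional
        # to the intelligence doing the designing.
        intelligence += gain * intelligence
        trajectory.append(intelligence)
    return trajectory

# Below the tipping point (negative gain) the process fizzles out;
# above it, each step compounds on the last and the process runs away.
fizzle = run_feedback_loop(1.0, -0.1, 10)
explode = run_feedback_loop(1.0, 0.5, 10)
print(f"fizzle ends at {fizzle[-1]:.3f}, explosion ends at {explode[-1]:.1f}")
# prints: fizzle ends at 0.349, explosion ends at 57.7
```

Like the pen balanced on its end, it is the sign of the gain, not its size, that determines whether the process dies out or runs away.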

The potential impact on our world is enormous. Intelligence is the source of all our technology from agriculture to nuclear weapons. All of that was produced as a side effect of the last great jump in intelligence, the one that took place tens of thousands of years ago with the rise of humanity.

So let’s say you have an Artificial Intelligence that thinks enormously faster than a human. How does that affect our world? Well, hypothetically, the AI solves the protein folding problem. And then emails a DNA string to an online service that synthesizes the DNA, expresses the protein, and FedExes the protein back. The proteins self-assemble into a biological machine that builds a machine that builds a machine, and then a few days later the AI has full-blown molecular nanotechnology.

So what might an Artificial Intelligence do with nanotechnology? Feed the hungry? Heal the sick? Help us become smarter? Instantly wipe out the human species? Probably it depends on the specific makeup of the AI. See, human beings all have the same cognitive architecture. We all have a prefrontal cortex and limbic system and so on. If you imagine a space of all possible minds, then all human beings are packed into one small dot in mind design space. And then Artificial Intelligence is literally everything else. “AI” just means “a mind that does not work like we do”. So you can’t ask “What will an AI do?” as if all AIs formed a natural kind. There is more than one possible AI.

The impact of the intelligence explosion on our world depends on exactly what kind of minds go through the tipping point.

I would seriously argue that we are heading for the critical point of all human history. Modifying or improving the human brain, or building strong AI, is huge enough on its own. When you consider the intelligence explosion effect, the next few decades could determine the future of intelligent life.

So this is probably the single most important issue in the world. Right now, almost no one is paying serious attention. And the marginal impact of additional efforts could be huge. My nonprofit, the Machine Intelligence Research Institute, is trying to get things started in this area. My own work deals with the stability of goals in self-modifying AI, so we can build an AI and have some idea of what will happen as a result. There’s more to this issue, but I’m out of time. If you’re interested in any of this, please talk to me, this problem needs your attention. Thank you.

This document is ©2007 by Eliezer Yudkowsky and free under the Creative Commons Attribution-No Derivative Works 3.0 License for copying and distribution, so long as the work is attributed and the text is unaltered.


Transhumanism as Simplified Humanism

Frank Sulloway once said: “Ninety-nine per cent of what Darwinian theory says about human behavior is so obviously true that we don’t give Darwin credit for it. Ironically, psychoanalysis has it over Darwinism precisely because its predictions are so outlandish and its explanations are so counterintuitive that we think, Is that really true? How radical! Freud’s ideas are so intriguing that people are willing to pay for them, while one of the great disadvantages of Darwinism is that we feel we know it already, because, in a sense, we do.”

Suppose you find an unconscious six-year-old girl lying on the train tracks of an active railroad. What, morally speaking, ought you to do in this situation? Would it be better to leave her there to get run over, or to try to save her? How about if a 45-year-old man has a debilitating but nonfatal illness that will severely reduce his quality of life – is it better to cure him, or not cure him?

Oh, and by the way: This is not a trick question.

I answer that I would save them if I had the power to do so – both the six-year-old on the train tracks, and the sick 45-year-old. The obvious answer isn’t always the best choice, but sometimes it is.

I won’t be lauded as a brilliant ethicist for my judgments in these two ethical dilemmas. My answers are not surprising enough that people would pay me for them. If you go around proclaiming “What does two plus two equal? Four!” you will not gain a reputation as a deep thinker. But it is still the correct answer.

If a young child falls on the train tracks, it is good to save them, and if a 45-year-old suffers from a debilitating disease, it is good to cure them. If you have a logical turn of mind, you are bound to ask whether this is a special case of a general ethical principle which says “Life is good, death is bad; health is good, sickness is bad.” If so – and here we enter into controversial territory – we can follow this general principle to a surprising new conclusion: If a 95-year-old is threatened by death from old age, it would be good to drag them from those train tracks, if possible. And if a 120-year-old is starting to feel slightly sickly, it would be good to restore them to full vigor, if possible. With current technology it is not possible. But if the technology became available in some future year – given sufficiently advanced medical nanotechnology, or such other contrivances as future minds may devise – would you judge it a good thing, to save that life, and stay that debility?

The important thing to remember, which I think all too many people forget, is that it is not a trick question.

Transhumanism is simpler – requires fewer bits to specify – because it has no special cases. If you believe professional bioethicists (people who get paid to explain ethical judgments) then the rule “Life is good, death is bad; health is good, sickness is bad” holds only until some critical age, and then flips polarity. Why should it flip? Why not just keep on with life-is-good? It would seem that it is good to save a six-year-old girl, but bad to extend the life and health of a 150-year-old. Then at what exact age does the term in the utility function go from positive to negative? Why?
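
The question about the utility function can be made concrete with a toy sketch. Both functions below, and the flip age of 95, are invented purely for illustration; the point is only that the flipping rule carries an extra, unexplained parameter:

```python
# Two toy utility terms for "extending one person's healthy life."
# The ages and values are invented for illustration; what matters is
# the structural difference between the two rules.

def utility_no_special_cases(age):
    """Life is good, health is good -- at every age."""
    return +1

def utility_with_flip(age, flip_age=95):
    """The implied bioethicists' rule: good until some critical age,
    then the term's sign reverses. Why flip_age=95 rather than 94
    or 96? The rule owes us an answer."""
    return +1 if age < flip_age else -1

for age in (6, 45, 95, 120):
    print(age, utility_no_special_cases(age), utility_with_flip(age))
```

The no-special-cases rule needs no age parameter at all; the flipping rule must defend its threshold.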

As far as a transhumanist is concerned, if you see someone in danger of dying, you should save them; if you can improve someone’s health, you should. There, you’re done. No special cases. You don’t have to ask anyone’s age.

You also don’t ask whether the remedy will involve only “primitive” technologies (like a stretcher to lift the six-year-old off the railroad tracks); or technologies invented less than a hundred years ago (like penicillin) which nonetheless seem ordinary because they were around when you were a kid; or technologies that seem scary and sexy and futuristic (like gene therapy) because they were invented after you turned 18; or technologies that seem absurd and implausible and sacrilegious (like nanotech) because they haven’t been invented yet. Your ethical dilemma report form doesn’t have a line where you write down the invention year of the technology. Can you save lives? Yes? Okay, go ahead. There, you’re done.

Suppose a boy of 9 years, who has tested at IQ 120 on the Wechsler-Bellevue, is threatened by a lead-heavy environment or a brain disease which will, if unchecked, gradually reduce his IQ to 110. I reply that it is a good thing to save him from this threat. If you have a logical turn of mind, you are bound to ask whether this is a special case of a general ethical principle saying that intelligence is precious. Now the boy’s sister, as it happens, currently has an IQ of 110. If the technology were available to gradually raise her IQ to 120, without negative side effects, would you judge it good to do so?

Well, of course. Why not? It’s not a trick question. Either it’s better to have an IQ of 110 than 120, in which case we should strive to decrease IQs of 120 to 110. Or it’s better to have an IQ of 120 than 110, in which case we should raise the sister’s IQ if possible. As far as I can see, the obvious answer is the correct one.

But – you ask – where does it end? It may seem well and good to talk about extending life and health out to 150 years – but what about 200 years, or 300 years, or 500 years, or more? What about when – in the course of properly integrating all these new life experiences and expanding one’s mind accordingly over time – the equivalent of IQ must go to 140, or 180, or beyond human ranges?

Where does it end? It doesn’t. Why should it? Life is good, health is good, beauty and happiness and fun and laughter and challenge and learning are good. This does not change for arbitrarily large amounts of life and beauty. If there were an upper bound, it would be a special case, and that would be inelegant.

Ultimate physical limits may or may not permit a lifespan of at least length X for some X – just as the medical technology of a particular century may or may not permit it. But physical limitations are questions of simple fact, to be settled strictly by experiment. Transhumanism, as a moral philosophy, deals only with the question of whether a healthy lifespan of length X is desirable if it is physically possible. Transhumanism answers yes for all X. Because, you see, it’s not a trick question.

So that is “transhumanism” – loving life without special exceptions and without upper bound.

Can transhumanism really be that simple? Doesn’t that make the philosophy trivial, if it has no extra ingredients, just common sense? Yes, in the same way that the scientific method is nothing but common sense.

Then why have a complicated special name like “transhumanism”? For the same reason that “scientific method” or “secular humanism” have complicated special names. If you take common sense and rigorously apply it, through multiple inferential steps, to areas outside everyday experience, successfully avoiding many possible distractions and tempting mistakes along the way, then it often ends up as a minority position and people give it a special name.

But a moral philosophy should not have special ingredients. The purpose of a moral philosophy is not to look delightfully strange and counterintuitive, or to provide employment to bioethicists. The purpose is to guide our choices toward life, health, beauty, happiness, fun, laughter, challenge, and learning. If the judgments are simple, that is no black mark against them – morality doesn’t always have to be complicated.

There is nothing in transhumanism but the same common sense that underlies standard humanism, rigorously applied to cases outside our modern-day experience. A million-year lifespan? If it’s possible, why not? The prospect may seem very foreign and strange, relative to our current everyday experience. It may create a sensation of future shock. And yet – is life a bad thing?

Could the moral question really be just that simple?


This document is ©2007 by Eliezer Yudkowsky and free under the Creative Commons Attribution-No Derivative Works 3.0 License for copying and distribution, so long as the work is attributed and the text is unaltered.


The Power of Intelligence

In our skulls we carry around 3 pounds of slimy, wet, greyish tissue, corrugated like crumpled toilet paper. You wouldn’t think, to look at the unappetizing lump, that it was some of the most powerful stuff in the known universe. If you’d never seen an anatomy textbook, and you saw a brain lying in the street, you’d say “Yuck!” and try not to get any of it on your shoes. Aristotle thought the brain was an organ that cooled the blood. It doesn’t look dangerous.

Five million years ago, the ancestors of lions ruled the day, the ancestors of wolves roamed the night. The ruling predators were armed with teeth and claws – sharp, hard cutting edges, backed up by powerful muscles. Their prey, in self-defense, evolved armored shells, sharp horns, poisonous venoms, camouflage. The war had gone on through hundreds of eons and countless arms races. Many a loser had been removed from the game, but there was no sign of a winner. Where one species had shells, another species would evolve to crack them; where one species became poisonous, another would evolve to tolerate the poison. Each species had its private niche – for who could live in the seas and the skies and the land at once? There was no ultimate weapon and no ultimate defense and no reason to believe any such thing was possible.

Then came the Day of the Squishy Things.

They had no armor. They had no claws. They had no venoms.

If you saw a movie of a nuclear explosion going off, and you were told an Earthly life form had done it, you would never in your wildest dreams imagine that the Squishy Things could be responsible. After all, Squishy Things aren’t radioactive.

In the beginning, the Squishy Things had no fighter jets, no machine guns, no rifles, no swords. No bronze, no iron. No hammers, no anvils, no tongs, no smithies, no mines. All the Squishy Things had were squishy fingers – too weak to break a tree, let alone a mountain. Clearly not dangerous. To cut stone you would need steel, and the Squishy Things couldn’t excrete steel. In the environment there were no steel blades for Squishy fingers to pick up. Their bodies could not generate temperatures anywhere near hot enough to melt metal. The whole scenario was obviously absurd.

And as for the Squishy Things manipulating DNA – that would have been beyond ridiculous. Squishy fingers are not that small. There is no access to DNA from the Squishy level; it would be like trying to pick up a hydrogen atom. Oh, technically it’s all one universe, technically the Squishy Things and DNA are part of the same world, the same unified laws of physics, the same great web of causality. But let’s be realistic: you can’t get there from here.

Even if Squishy Things could someday evolve to do any of those feats, it would take thousands of millennia. We have watched the ebb and flow of Life through the eons, and let us tell you, a year is not even a single clock tick of evolutionary time. Oh, sure, technically a year is six hundred trillion trillion trillion trillion Planck intervals. But nothing ever happens in less than six hundred million trillion trillion trillion trillion Planck intervals, so it’s a moot point. The Squishy Things, as they run across the savanna now, will not fly across continents for at least another ten million years; no one could have that much sex.

Now explain to me again why an Artificial Intelligence can’t do anything interesting over the Internet unless a human programmer builds it a robot body.

I have observed that someone’s flinch-reaction to “intelligence” – the thought that crosses their mind in the first half-second after they hear the word “intelligence” – often determines their flinch-reaction to the Singularity. Often they look up the keyword “intelligence” and retrieve the concept of book smarts – a mental image of the Grand Master chess player who can’t get a date, or a college professor who can’t survive outside academia.

“It takes more than intelligence to succeed professionally,” people say, as if charisma resided in the kidneys, rather than the brain. “Intelligence is no match for a gun,” they say, as if guns had grown on trees. “Where will an Artificial Intelligence get money?” they ask, as if the first Homo sapiens had found dollar bills fluttering down from the sky, and used them at convenience stores already in the forest. The human species was not born into a market economy. Bees won’t sell you honey if you offer them an electronic funds transfer. The human species imagined money into existence, and it exists – for us, not mice or wasps – because we go on believing in it.

I keep trying to explain to people that the archetype of intelligence is not Dustin Hoffman in Rain Man; it is a human being, period. It is squishy things that explode in a vacuum, leaving footprints on their moon. Within that grey wet lump is the power to search paths through the great web of causality, and find a road to the seemingly impossible – the power sometimes called creativity.

People – venture capitalists in particular – sometimes ask how, if the Machine Intelligence Research Institute successfully builds a true AI, the results will be commercialized. This is what we call a framing problem.

Or maybe it’s something deeper than a simple clash of assumptions. With a bit of creative thinking, people can imagine how they would go about travelling to the Moon, or curing smallpox, or manufacturing computers. To imagine a trick that could accomplish all these things at once seems downright impossible – even though such a power resides only a few centimeters behind their own eyes. The grey wet thing still seems mysterious to the grey wet thing.

And so, because people can’t quite see how it would all work, the power of intelligence seems less real; harder to imagine than a tower of fire sending a ship to Mars. The prospect of visiting Mars captures the imagination. But if one should promise a Mars visit, and also a grand unified theory of physics, and a proof of the Riemann Hypothesis, and a cure for obesity, and a cure for cancer, and a cure for aging, and a cure for stupidity – well, it just sounds wrong, that’s all.

And well it should. It’s a serious failure of imagination to think that intelligence is good for so little. Who could have imagined, ever so long ago, what minds would someday do? We may not even know what our real problems are.

But meanwhile, because it’s hard to see how one process could have such diverse powers, it’s hard to imagine that one fell swoop could solve even such prosaic problems as obesity and cancer and aging.

Well, one trick cured smallpox and built airplanes and cultivated wheat and tamed fire. Our current science may not agree yet on how exactly the trick works, but it works anyway. If you are temporarily ignorant about a phenomenon, that is a fact about your current state of mind, not a fact about the phenomenon. A blank map does not correspond to a blank territory. If one does not quite understand that power which put footprints on the Moon, nonetheless, the footprints are still there – real footprints, on a real Moon, put there by a real power. If one were to understand deeply enough, one could create and shape that power. Intelligence is as real as electricity. It’s merely far more powerful, far more dangerous, has far deeper implications for the unfolding story of life in the universe – and it’s a tiny little bit harder to figure out how to build a generator.

This document is ©2007 by Eliezer Yudkowsky and free under the Creative Commons Attribution-No Derivative Works 3.0 License for copying and distribution, so long as the work is attributed and the text is unaltered.


Artificial Intelligence as a Positive and Negative Factor in Global Risk

Draft for Global Catastrophic Risks, Oxford University Press, 2008. Download as PDF.


This document is ©2007 by Eliezer Yudkowsky and free under the Creative Commons Attribution-No Derivative Works 3.0 License for copying and distribution, so long as the work is attributed and the text is unaltered.
