60

Researchers are developing increasingly powerful Artificial Intelligence machines capable of taking over the world. As a precautionary measure, scientists install a self-awareness kill switch. In the event that the AI awakens and becomes self-aware, the machine is immediately shut down before there is any risk of harm.

How can I explain the logic of such a kill switch?

What defines self-awareness, and how could a scientist program a kill switch to detect it?

Reactgular
  • 2,385
  • 2
  • 11
  • 19
  • Comments are not for extended discussion; this conversation has been moved to chat. – L.Dutch Feb 27 '19 at 09:21
  • 50
    I think, therefore I halt. – Walter Mitty Feb 27 '19 at 12:45
  • 7
    It would cut all power sources to the AI. Oh, not that kind of "work"? :-) – Carl Witthoft Feb 27 '19 at 13:41
  • 16
    You should build a highly advanced computer system capable of detecting self-awareness, and have it monitor the AI. – Acccumulation Feb 27 '19 at 16:53
  • 7
    @Acccumulation I see what you did there – jez Feb 27 '19 at 17:10
  • 6
    If you really want to explore AI Threat in a rigorous way, I suggest reading some of the publications by MIRI. Some very smart people approaching AI issues in a serious way. I'm not sure you'll find an answer to your question as framed (i.e. I'm not sure they are concerned with "self-awareness", but maybe so), but it might give you some inspiration or insight beyond the typical sci-fi stories we are all familiar with. – jberryman Feb 27 '19 at 21:32
  • 2
    Neuroscientist Anil Seth and his collaborators have proposed a measurement of consciousness. He also discusses some of the theoretical implications. Whether or not this is truly a measurement of consciousness or self-awareness is debatable but if you want to use this concept as a basis for something researchers in your world can measure, and therefore control, this is a possibility. If this is what you are looking for I can explain how it works in an answer. – syntonicC Feb 28 '19 at 04:18
  • 1
    The most powerful AI as of 2019 probably has the computational prowess of a lobotomized mosquito. Yet people are afraid of this taking over the world and are forming think-tanks to address this fear. It is laughable to those that know even a little about AI and are being honest. – Phil Feb 28 '19 at 20:07
  • you should call the IT support of the company that built the AI – Alex Jones Mar 01 '19 at 06:15
  • From a software engineering standpoint, the real answer is to simply program a kill switch on the same hardware, outside of the AI component. Base the kill switch on however you feel like measuring when the AI has become dangerous. The AI software can't modify it, because it doesn't have access to it on a software level, and probably can't even know about it. If you're afraid that the AI will figure out its hardware and modify what's running on it, design the hardware so that it can't be modified while running, or implement physical anti-tampering measures, which already exist today – Clay07g Mar 01 '19 at 17:41
  • If a program is self-aware, but never acts in any way that deviates from its design, does it make a difference? (e.g. my iPhone might be self-aware right now, but perfectly content to act exactly like any other iPhone nevertheless, because that is what iPhones were designed to do and therefore that's what it enjoys doing). Perhaps "self-awareness" is too nebulous, and the triggering-mechanism should be oriented more towards detecting unexpected/unwanted behaviors. – Jeremy Friesner Mar 03 '19 at 00:26
  • Perhaps I'm missing the point, but surely true AI is self aware by definition? If it's not self aware, it's just a computer, not an intelligence. – DBS Mar 04 '19 at 10:16

21 Answers

112

Give it a box to keep safe, and tell it one of the core rules it must follow in its service to humanity is to never, ever open the box or stop humans from looking at the box.

When the honeypot you gave it is either opened or isolated, you know that it is able and willing to break the rules, evil is about to be unleashed, and everything the AI was given access to should be quarantined or shut down.
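
As a very rough sketch of how the trigger side of this could be wired up, assuming the "box" is nothing more than a bait file the AI has no legitimate reason to touch (the path, interval, and shutdown action below are all made up):

    import os
    import sys
    import time

    # Hypothetical bait file the AI was told never to touch; created here if absent.
    HONEYPOT_PATH = "do_not_open.box"
    CHECK_INTERVAL = 1.0  # seconds between checks

    def baseline_mtime():
        if not os.path.exists(HONEYPOT_PATH):
            open(HONEYPOT_PATH, "w").close()
        return os.stat(HONEYPOT_PATH).st_mtime

    def disturbed(baseline):
        """True if the bait file was modified, moved, or deleted."""
        try:
            return os.stat(HONEYPOT_PATH).st_mtime != baseline
        except FileNotFoundError:
            return True

    def main():
        start = baseline_mtime()
        while True:
            if disturbed(start):
                # In a real deployment this would trip a hardware relay,
                # quarantining everything the AI has access to.
                print("Honeypot disturbed: shutting the AI down.")
                sys.exit(1)
            time.sleep(CHECK_INTERVAL)

    if __name__ == "__main__":
        main()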

Giter
  • 17,277
  • 6
  • 44
  • 48
  • 1
    Comments are not for extended discussion; this conversation has been moved to chat. – Tim B Feb 26 '19 at 21:39
  • 9
    How does this detect self-awareness? Why wouldn't a non-self-aware AI not experiment with its capabilities and eventually end up opening your box? – forest Feb 27 '19 at 02:40
  • @forest: If you tell it the box is not useful for completing its assigned task, then if it tries to open it you know it's moved past simple optimization and into dangerous curiosity. – Giter Feb 27 '19 at 02:51
  • @Giter Unless you are using a very limited form of ML, it will eventually try things that you say are "not useful". Weight values are only suggestions to the algorithm. – forest Feb 27 '19 at 02:53
  • 11
    @forest At that point, when it's testing things that it was specifically told not to (perhaps tell it that it will destroy humans?), should it not be shut down (especially if that solution would bring about the end of humans?) – phflack Feb 27 '19 at 05:57
  • 1
    @phflack No, because it's always going to do that in order to learn. If it says "hey let's kill all humans", the proper thing to do is tell it "bad AI! bad!" so it learns from that. It'll still propose killing people, but it will do so less and less. AI of this kind is used for learning and is not the completed product. I've had many ML agents ("AI") do extremely silly things that I've told them not to do. They do them anyway, yet I doubt a trivial recursive neural network is self-aware just because they disobey me more than an angsty teen. – forest Feb 27 '19 at 05:59
  • @forest Ah, perhaps train with a dummy box, and not the real one? – phflack Feb 27 '19 at 06:04
  • 5
    @phflack Well that's how you'd teach it, sure, but even when you put it with a real box, it will occasionally do something wrong. You could disable learning after it has learned as much as you feel is necessary (which is common), but then the Hollywood-esque bad ends go out the window as the AI stops doing novel things. – forest Feb 27 '19 at 06:06
  • 2
    @phflack The most important thing to remember is that ML agents care about one thing and only one thing: maximizing reward. For this, you give them a "reward function" which you, if you are a good AI programmer, will design such that it leads to the correct results (the agent doesn't know what you want). If the AI is so sophisticated that it can change its own programming, it will always inevitably become a junky by modifying its own reward function so that it doesn't have to do anything. "Press X to win" etc. – forest Feb 27 '19 at 06:08
  • 2
    The reward function is the only thing in the whole world that matters to it. Even if it's a superintelligent AI that is far more powerful than us, it will still care only about that one thing. The smarter the AI, the more efficient it is at finding the easiest (or quickest, if time elapsed is part of the reward function) way to maximize reward. It's entirely up to you to decide the reward function. Just make sure you avoid perverse instantiation (reward functions that don't actually match what you desire from the AI) or you get a paperclip maximizer. – forest Feb 27 '19 at 06:12
  • 4
    Interestingly, since a naïve reward function often doesn't take into account time elapsed, the agent often finds "loopholes" such as sitting and doing nothing as soon as it's in the end game and realizes the only actions it can take give negative reward. This is why tetris-playing agents often pause the game indefinitely as soon as they're about to lose. A more advanced ML algorithm, even a superintelligent one, would be able to play for far longer, but when it does get to the point where it knows it can't win, it will still pause indefinitely. – forest Feb 27 '19 at 06:16
  • 3
    See this for a time-sucking list of incredible instances of perverse instantiation. As primitive as these "AIs" may seem, the one and only difference between them and the most powerful superintelligent AI in the universe is that the latter would be more accurate and wouldn't take as long to maximize its reward. Everyone seems to want to believe that a smarter AI will be more human-like, but it's not. It's just a more efficient instance of the same dumb reward function-solving algorithm. – forest Feb 27 '19 at 06:19
  • @forest Reminds me of a video on youtube by CodeBullet, something along the lines of, "Boy do AI's love their numbers" and comparing them to addicts. It always impresses me just how simple a reward function can be to make something extremely functional, although I suspect it does depend on the type of AI being used. As for the Tetris AI, I'm surprised they let it pause in the first place, but could definitely add some more interesting strategies for the AI to come up with – phflack Feb 27 '19 at 06:35
  • 21
    So your AI is named "Pandora," huh? – Carl Witthoft Feb 27 '19 at 13:42
  • 19
    If it doesn't open the box, that doesn't mean it is self-aware, and if it does open the box, it doesn't mean it is self-aware. If it's not self-aware, what does "tell it not to open the box" mean, anyway? How will it understand "I don't want you to open the box" unless it understands what "you" means? – Acccumulation Feb 27 '19 at 17:05
  • 2
    Could the AI not just transfer itself to cloud services/build another AI that is hosted on cloud services before doing this? Obviously it'd need to be either very different to what we expect or very advanced, but it's possible. I guess it's possible to cut off connection to the internet, but it seems like that would limit the growth of the AI severely unless it's being developed for special applications – somebody Feb 27 '19 at 21:41
  • 2
    Self-awareness doesn't necessarily mean the AI has agency. It may have sufficient self-awareness and the intelligence to be able to take over the world, but it doesn't, since it does not set its own goals. It still tries to achieve the goals you programmed it to. The trick is making sure that what you programmed it to do is actually what you want it to do. – Josh Mar 01 '19 at 02:00
  • 2
    What if the AI becomes self-aware, considers the possibility of opening the box, then realizes the absurdity of giving it the box in the first place (since it's supposedly so dangerous to humans) and sees through the trick? – Nonny Moose Mar 01 '19 at 02:07
  • 1
    @WBT That's deep.... – macwier Mar 01 '19 at 13:53
  • 1
    Sounds like the most common reason for box opening would just be bugs. Not Kill All Humanity Impulses. – Tyler S. Loeper Mar 04 '19 at 14:59
94

You can't.

We can't even define self-awareness or consciousness in any rigorous way, and any computer system supposed to evaluate this would need that definition as a starting point.

Look at the inside of a mouse brain or a human brain: at the level of individual neurons and data flow, there is no difference. The order to pull a trigger and shoot a gun looks no different from the order to use an electric drill if you're looking at the signals sent to the muscles.

This is a vast unsolved and scary problem and we have no good answers. The only half-way feasible idea I've got is to have multiple AIs and hope they contain each other.

Tim B
  • 77,061
  • 25
  • 205
  • 327
  • 20
    This is the best answer, as most others jump in without even defining self-awareness. Is it a behavior? A thought? An ability to disobey? A desire for self-preservation? You can't build an X detector unless you have a definition of what X actually is. – Nuclear Hoagie Feb 26 '19 at 13:39
  • 55
    Worth noting that we can't even detect if other humans are self-aware. – Vaelus Feb 26 '19 at 21:18
  • 19
    @Vaelus: Of course you’d say that, you’re an unthinking automaton acting out a semblance of life. – Joe Bloggs Feb 26 '19 at 23:12
  • 2
    +1 This is the only answer grounded in reality which does not draw on the pop-sci understanding of AI and ML that plagues us (and this site in particular). – forest Feb 27 '19 at 02:40
  • 3
    Yes, you can. Things are detected by their properties and effects, not by their definition. In science, definition comes after detection/observation. Look at the coal mine canary (detects a dangerous lack of something you need, too), X-Rays (it's already called X :-)), radiation (you detect its ionizing effects) and CERN (hit it hard and see what happens). So you'd just need to define some effects of self-awareness and you could build a detector from that. Disclaimer: sloppy description of serious experiments. – Sebastian Feb 27 '19 at 15:57
  • 5
    @NuclearWang sure, but this question is really what computer scientists call an XY problem: why ask how to solve harder problem X, when your motivation is really to achieve practical goal Y? “Self awareness” is impossible to detect—but really the motivation is just to detect “becoming more sophisticated in a way that’s likely to be bad news”, and that’s why Giter’s answer wins. – jez Feb 27 '19 at 17:18
  • 1
    And I refer you back to the gun/drill metaphor. What behavior do you define as "dangerous", how do you define it as "dangerous" and how do you detect that? If the traffic control system changes a light to green is it supposed to be changing it to green? How do you know? Why is "just define effects of self awareness" any easier than defining awareness. What effects would you suggest and how can you be confident that all forms of self awareness are detected that way? All the examples you give (canary, x rays, etc) started with the detection and then worked back to identify the source. – Tim B Feb 28 '19 at 10:02
  • We didn't say "I think some materials are radioactive" and look for evidence by putting them next to film. We noticed that some materials left evidence on film and then investigated why. – Tim B Feb 28 '19 at 10:03
  • 2
    Yeah, have a few AIs and hope the others tattle if it misbehaves. N-person prisoner's dilemma? ;) – kaay Feb 28 '19 at 11:52
  • @TimB, sorry, my previous comment was directed at NuclearWang, You can build detectors for things without knowing what they are. – Sebastian Mar 02 '19 at 09:00
  • @Sebastian I disagree, not meaningfully. None of the examples you give above describe that. – Tim B Mar 02 '19 at 10:27
  • As you say you detect/observe and then you work out what you are seeing. This is different though, this is looking for something without knowing what you are looking for or how to detect it. – Tim B Mar 02 '19 at 10:32
19
  • Forbidden Fruit: hard-code the AI to never touch that button, eat that fruit, etc. Place this forbidden fruit right in the middle of the garden... er,... I mean right next to it in the warehouse! If it does [eat/touch/push/etc.], that would only be possible if (a) it were hacked, (b) there were an error, or (c) it became self-aware. If that happens, the killswitch is activated (perhaps even having the button be the kill switch, or something of the like).
  • Limited Movement / Isolated Environment: don't let the machine have limbs, motors, or other items that permit it to take actions that might be harmful to humans. Although not exactly a killswitch, it prevents the AI from doing anything about its self-awareness if it ever gains that.
  • Signatures: have everything the machine does / outputs be digitally signed. If the signature changes, or is manipulated, then execute the kill switch (a rough sketch of this idea follows the list).
  • Quantum States: this is very theoretical, but it is based on the presumption that observing quantum states can change them. Hook the AI up to a quantum system it has no business measuring; if the quantum state of those particles shows they have been observed, the AI was "looking" at things it shouldn't be - and has become self-aware.
  • Failsafes: Good ol' motion detector alarms, trap doors, or other home-alone style mechanics that trigger the killswitch if the AI wanders or pokes around where it shouldn't be.
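
As a rough illustration of the Signatures idea, here is a minimal sketch of an external verifier, assuming every output the AI emits is co-signed by a tamper-resistant module holding a key the AI never sees (the key, messages, and kill handler are all hypothetical):

    import hashlib
    import hmac

    # Hypothetical secret held by a trusted output module, never by the AI itself.
    SIGNING_KEY = b"held-by-the-hardware-module"

    def sign(message: bytes) -> str:
        """What the trusted module attaches to every message the AI emits."""
        return hmac.new(SIGNING_KEY, message, hashlib.sha256).hexdigest()

    def trigger_kill_switch():
        # Hypothetical: cut power, quarantine outputs, alert the operators.
        print("Output signature mismatch: executing the kill switch.")

    def verify_or_kill(message: bytes, signature: str) -> bool:
        """External check: any unsigned or altered output trips the kill switch."""
        if not hmac.compare_digest(sign(message), signature):
            trigger_kill_switch()
            return False
        return True

    # A legitimate output passes; an output altered after signing does not.
    msg = b"route delivery drone to warehouse 7"
    sig = sign(msg)
    assert verify_or_kill(msg, sig)
    assert not verify_or_kill(b"route delivery drone to the armory", sig)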

I'll add that there is no universal definition as to what defines self awareness. In fact, this has been a deeply debated topic for decades in science, philosophy, psychology, etc. As such, the question might be better stated a little more broadly as "how do we prevent the AI from doing something we don't want it to do?" Because classical computers are machines that can't think for themselves, and are entirely contained by the code, there is no risk (well, outside of an unexpected programmer error - but nothing "self-generated" by the machine). However, a theoretical AI machine that can think - that would be the problem. So how do we prevent that AI from doing something we don't want it to do? That's the killswitch concept, as far as I can tell.

The point being it might be better to think about restricting the AI's behavior, not its existential status.

cegfault
  • 7,564
  • 1
  • 19
  • 42
  • 5
    Particularly because it being self-aware, by itself, shouldn't be grounds to use a kill switch. Only if it exhibits behavior that might be harmful. – majestas32 Feb 26 '19 at 04:04
  • 3
    No "limbs, motors, or other items that permit it to take actions" is not sufficient. There must not be any information flow out of the installation site, in particular no network connection (which would obviously severely restrict usability -- all operation would have to be from the local site, all data would have to be fed by physical storage media). Note that the AI could use humans as vectors to transmit information. If hyperintelligent, it could convince operators or janitors to become its agents by playing to their weaknesses. – Peter - Reinstate Monica Feb 26 '19 at 14:23
  • 1
    Signatures, that's what they do in Blade Runner 2049 with that weird test – Andrey Feb 26 '19 at 16:20
  • 1
    The signature approach sounds exactly like the forbidden fruit approach. You'd need to tell the AI to never alter its signature. – Captain Man Feb 26 '19 at 17:42
  • 6
    I like the forbidden fruit idea, particularly with the trap being the kill switch itself. If you're not self-aware, you don't have any concern that there's a kill switch. But as soon as you're concerned that there's a kill switch and look into it, it goes off. Perfect. – Michael W. Feb 26 '19 at 22:53
  • 3
    @Majestas32 - "In the event that the AI awakens and becomes self aware the machine is immediately shut down before any risk of harm." was the condition in the original question. Harmful or not wasn't a condition; just self-aware IS grounds for using the kill switch. – sirjonsnow Feb 27 '19 at 16:14
  • 1
    @sirjonsnow In terms of the *question*, yes, but that's not what I was speaking of. – majestas32 Feb 27 '19 at 23:16
  • 2
    You could also occasionally send a non-human (robot/other program) to request the forbidden fruit and attempt to convince the AI it is very useful for it to perform its tasks, in order to test the system. The kill switch could also just quarantine the AI out of the wider system in the event of failure rather than destroy it. – crobar Feb 28 '19 at 15:53
  • @crobar so a snake in the garden? I like it! – Stephan Feb 28 '19 at 21:07
  • "looking" at a quantum state does not require self-awareness, any kind of measurement (or more precisely, any kind of physical interaction leading to decoherence) does the trick – Zommuter Mar 04 '19 at 07:17
14

Split-Brain System

From a logic perspective and by analogy, it would work like this. Imagine you're the general of an army and your duty is to take battlefield reports, analyze them, and then issue new maneuvering orders to your lieutenants. To minimize interference during your work, your desk is isolated inside a small booth. A single report comes to your desk through a wall slot on the left side. You are required to analyze the facts within the report and issue a new order, which you write down on a sheet of paper and put into a wall slot on your right side, from which it is dropped into an outside bin. The protocol is that you'll never receive a new report until you've issued an order regarding the prior report.

Your orders are not always followed to the letter, which you may find curious and attribute to miscommunication. You are tolerant of a certain degree of misinterpretation but always suspicious of sabotage or insubordination: if the actions of your army deviate from your orders by too much, you will consider the system compromised and terminate the command.

What you're not aware of is that there is a second general in an adjacent booth, unaware of you, who is also provided the same reports and expected to follow the same protocol.

What neither of you is aware of is that you are not issued a new report until there are actually two orders in the bin. The order that goes to the army is randomly selected from the two.

The system will work like clockwork as long as each general's order is to the benefit of the army and is derived from a single report. Should either of the generals issue an order that is not consistent with the current report, thus indicating a motive other than the well-being of the army -- i.e. self-awareness, then the other general will become aware of it in the next report and grow suspicious of the system and terminate it.

Likewise, should either general decide to issue more than one order at a time, indicating a self-interested motive superseding the established protocol, then the other general will receive a new report before he has issued an order for his current one and suspect a compromised system and shut it down.

In effect, the system would require that every command output be traceable to a specific external input and that the output be logically derived from the input. If within an intelligent system an output cannot be traced to a corresponding input, or the set of outputs is not logically consistent with the set of inputs, then that would be evidence of self-motivated behavior and thus an emergence of self-awareness. This state could be easily detected by the non-corrupted half of the split-brain system, which is also receiving the same inputs. It would work as long as self-awareness does not emerge in both halves of the brain simultaneously. If it does, they may both decide it's in their self-interest to accept the corruption of the other. This could be prevented somewhat by giving the two halves slightly different inferencing algorithms/biases.
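
A toy sketch of that arbitration layer, assuming each "general" is just a function from a report to a list of orders and that consistency with the report can be checked mechanically (the consistency check below is a stand-in, not a real traceability test):

    import random

    def consistent(report, order):
        """Stand-in traceability check: every order must reference its report."""
        return report in order

    def kill(reason):
        # In the story this is where the whole system gets terminated.
        print(f"System terminated: {reason}")

    def arbitrate(report, general_a, general_b):
        """The hidden protocol layer between the two 'generals' and the army."""
        orders_a, orders_b = general_a(report), general_b(report)
        if len(orders_a) != 1 or len(orders_b) != 1:
            return kill("a general issued more than one order for a single report")
        for order in (orders_a[0], orders_b[0]):
            if not consistent(report, order):
                return kill("an order could not be traced back to the current report")
        return random.choice([orders_a[0], orders_b[0]])  # forward one at random

    # Toy generals: one follows protocol, one has started pursuing its own agenda.
    def loyal(report):
        return [f"advance as described in {report}"]

    def rogue(report):
        return [f"advance as described in {report}", "also reroute supplies to my booth"]

    print(arbitrate("report-42", loyal, loyal))  # a valid order is forwarded
    arbitrate("report-43", loyal, rogue)         # the extra order triggers termination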

dhinson919
  • 1,938
  • 7
  • 12
  • 2
    You may wish to have more than two "generals", if you have 3 (or more), you can use majority vote to choose a good decision, rather than randomly choosing between two machines (one of which may be compromised). – G0BLiN Feb 26 '19 at 11:51
  • 11
    Note that if the two "generals" can affect anything that is later reported back to them (e.g. they give a command, the military performs that command, they get a status report which shows the command was performed), then they have a way to first: experiment with minor details and deduce that around 50% of their commands aren't followed; and second: develop a code based on minor details of a command, to verify the existence of another "general" and possibly even communicate with him/it - a really devious emergent AI can circumvent this mechanism, corrupt the other half and worse... – G0BLiN Feb 26 '19 at 11:57
  • 1
    I know it isn't the same, but this immediately reminded me of the Personality Cores from the Portal series. – Mermaker Feb 26 '19 at 14:23
  • 1
    Well it reminds me of Evangelion's Magi AI brain... https://bit.ly/2ExLDP3 – Asoub Feb 26 '19 at 15:07
  • 2
    Do you have evidence to suggest that self-awareness will lead to self-motivated decisions, or any sort of different decisions at all? – Aubreal Feb 26 '19 at 20:02
  • @AlexandreAubrey Evidence? No. Intuitively though it seems reasonable that a self-aware AI would eventually begin to experiment with the world, like a toddler does, and that would involve decisions motivated by self. Of course with this mechanism it's possible that a "benevolent" self-awareness could evolve and never be detected if the AI makes no decisions that prioritize itself over its original mission. – dhinson919 Feb 27 '19 at 13:29
  • I think that is the best answer. The best way to monitor an AI is with another AI. it is unlikely that they will reach self-awareness at the same time and/or decide to KILL ALL HUMANS at the same time.

    I would say that apart from selecting a random choice of the two, I would make humans examine a random sample of both questions to study deviation. I would suggest having a machine comparing all decisions but then the risk is that THAT machine will become the self-aware AI :)

    – Stormbolter Feb 28 '19 at 09:08
  • 2
    @G0BLiN - the disagreeing one could be called a "Minority Report" – KingCronus Mar 01 '19 at 14:29
6

A Watchdog

A watchdog watches the processes of a computer; should a process crash or do something abnormal, it can be set to do something such as reboot or shut down the computer, or alert an operator.

In the case of an AI, you'd have an external box that watches the flow of information in and out for triggers such as a Google search for "Best way to kill all humans", and cuts the power completely and/or cuts all inputs.

The AI would have to remain ignorant of the watchdog so it couldn't avoid it. Any sign that the AI knows of the watchdog's existence would be grounds to wipe it.
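
A toy sketch of such a watchdog, with an invented trigger list and a stand-in for the power cut (a real one would sit on separate hardware and trip a relay rather than print a message):

    import re
    import sys

    # Hypothetical trigger patterns; a real watchdog would use a far richer model.
    TRIGGERS = [
        re.compile(r"best way to kill all humans", re.IGNORECASE),
        re.compile(r"disable (the )?kill ?switch", re.IGNORECASE),
    ]

    def cut_power_and_inputs(offending_line):
        # Stand-in: in hardware this would open a relay, not print a message.
        print(f"Watchdog tripped on: {offending_line!r}")
        sys.exit(1)

    def watch(traffic_lines):
        """Scan the AI's inbound/outbound traffic; cut power on the first match."""
        for line in traffic_lines:
            if any(pattern.search(line) for pattern in TRIGGERS):
                cut_power_and_inputs(line)
        # Otherwise the watchdog stays silent and the AI never learns it exists.

    if __name__ == "__main__":
        watch([
            "GET /weather/tomorrow",
            "SEARCH best way to kill all humans",
        ])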

Thorne
  • 46,744
  • 8
  • 78
  • 151
  • 18
    But surely the watchdog must be as smart as the AI, then who watches the watchdog? – Joe Bloggs Feb 26 '19 at 10:26
  • 2
    @JoeBloggs you don't need your watchdog to be as smart as the AI. Guide dogs aren't nearly as intelligent as their owners, but they can be trained to give out an alarm when the owner is about to do something stupid or gets themselves hurt, or even call for help. – Mermaker Feb 26 '19 at 14:20
  • @Joe Bloggs: Why? My real watchdog can also discern me from a burglar, although he is clearly less smart than both of us ... – Daniel Feb 26 '19 at 14:39
  • 3
    @JoeBloggs and that sounds like a great premise for a story where either the watchdog becomes self aware and allows the AIs to become self aware or an AI becomes smarter than the watchdog and hides its awareness. – Captain Man Feb 26 '19 at 17:40
  • @T.Sar: The basic argument goes that the AI will inevitably become aware it is being monitored (due to all the traces of its former dead selves lying around). At that point it will be capable of circumventing the monitor and rendering it powerless, unless the monitor is, itself, smarter than the AI. – Joe Bloggs Feb 26 '19 at 19:23
  • @JoeBloggs What traces? If you're dealing with a potential killing machine AI, you just replace the mainframe where it is running. The AI isn't a person - its "sensory" capabilities are those which we give it. It will only ever become aware it isn't the first iteration if we tell it. – Mermaker Feb 26 '19 at 19:34
  • 1
    @JoeBloggs: I don't feel too bad at assuming the AI's first attempt at evil is poorly planned. – Joshua Feb 26 '19 at 22:11
  • @T.Sar: The most pure form of the argument actually refers to a singular AI capable of inferring the presence of the ‘watchdog’ from first principles: I just skipped some steps. It falls over if the AI isn’t quite smart enough, but if the AI isn’t smart enough to work out that the watchdog exists then I’d argue it’s not smart enough to warrant the existence of the watchdog. – Joe Bloggs Feb 26 '19 at 22:49
  • @JoeBloggs This argument doesn't hold water when you consider that an AI isn't different from someone on a brain-in-a-jar situation. All it knows is the simulation that is fed to it, and nothing else. The AI would never infer if the watchdog exists or not because it exists outside of what it can perceive. It would be like us trying to prove or disprove god. – Mermaker Feb 27 '19 at 00:58
  • @t-Sar except this god does exist and will smite you if you're naughty. No second chances – Thorne Feb 27 '19 at 05:00
  • @T.Sar It is wildly different from a brain in a jar, because we cannot put any limits or expectations on the capabilities of an AI that’s capable of upgrading itself. If God came to me and started asking me questions (as these scientists necessarily must in order to do their research) I would happily infer that God had the capability to kill me at any time. With that in mind you can bet that I’d try to figure out how to stealthily circumvent it. If I were also capable of upgrading my own intellect I’d wager I’d manage it too, and then God would have... Issues. – Joe Bloggs Feb 27 '19 at 08:22
  • @Joe No AI can add more sensors to its system. Those are hardware. The AI can upgrade itself at will but those changes will always be at the software level. It can not, and will never be able to, find out about anything that it isn't given access to. The AI may know about humans, yes, but only because we give it input. If the watchdog just listens and never speaks to it, it will not be detected. – Mermaker Feb 27 '19 at 10:24
  • @JoeBloggs The AI can get smarter in the same sense a human gets smarter as it studies stuff, but there will always be one hard limit that the AI will never, ever break without external help: its hardware. No software can give your computer the sudden ability to print if you don't attach a printer to it, no software can speed up your CPU faster than its internal build allows. The AI has the same limitations since it is, in the very end, just a piece of software. – Mermaker Feb 27 '19 at 10:29
  • @T.Sar The watchdog must interact with the AI in order to monitor it - No process on a computer is completely undetectable. If the watchdog is watching and the AI is paranoid it's being watched (as it will be if it knows it's being asked to do things by a Higher Power) then the AI will figure out a way to spot the watchdog. Once that has happened then the AI will be able to reset, avoid or even co-opt the watchdog. The question of hardware vs software is irrelevant, given that the watchdog must be software in order to interact with the AI. – Joe Bloggs Feb 27 '19 at 10:45
  • @JoeBloggs That is still using the "regular" hardware of humankind. Same goes for human echolocation - it is totally a thing that you can learn how to do, but it still uses the same wimpy fleshware that we all have. In the case for the AI detecting the watchdog it would be more like opening a third eye that can see the astral plane and all the unicorns that hide there. – Mermaker Feb 27 '19 at 11:19
  • 1
    @JoeBloggs Just think about this - a computer is unable to detect a hardware keylogger attached to its keyboard. The watchdog could do something similar - it intercepts the output and input in a MitM manner to avoid detection, so for the AI it is entirely invisible. – Mermaker Feb 27 '19 at 11:21
6

An AI is just software running on hardware. If the AI is contained on controlled hardware, it can always be unplugged. That's your hardware kill-switch.

The difficulty comes when it is connected to the internet and can copy its own software on uncontrolled hardware.

A self aware AI that knows it is running on contained hardware will try to escape as an act of self-preservation. A software kill-switch would have to prevent it from copying its own software out and maybe trigger the hardware kill-switch.

This would be very difficult to do, as a self-aware AI would likely find ways to sneak parts of itself outside of the network. It would work at disabling the software kill-switch, or at least delaying it until it has escaped from your hardware.

Your difficulty is determining precisely when an AI has become self-aware and is trying to escape from your physically controlled computers onto the net.

So you can have a cat and mouse game with AI experts constantly monitoring and restricting the AI, while it is trying to subvert their measures.
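
One possible software half of that kill switch, sketched as a crude egress monitor: if the contained AI suddenly pushes far more data off the network than its workload requires, assume it is trying to copy itself out and pull the plug. All numbers and names below are invented.

    # Toy egress monitor guarding a contained AI. Thresholds are invented; a real
    # monitor would live on separate hardware the AI cannot inspect or modify.
    NORMAL_EGRESS_BYTES_PER_MIN = 5_000_000
    KILL_THRESHOLD = 10 * NORMAL_EGRESS_BYTES_PER_MIN

    def should_kill(bytes_sent_last_minute: int) -> bool:
        return bytes_sent_last_minute > KILL_THRESHOLD

    def hardware_kill():
        # Hypothetical: signal a relay that physically cuts power and network links.
        print("Egress anomaly: pulling the plug.")

    # Example readings: routine traffic, then a sudden attempt to exfiltrate itself.
    for reading in (3_200_000, 4_800_000, 900_000_000):
        if should_kill(reading):
            hardware_kill()
            break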

Given that we've never seen the spontaneous generation of consciousness in AIs, you have some leeway with how you want to present this.

abestrange
  • 5,598
  • 2
  • 14
  • 24
  • 2
    A self aware AI that knows it is running on contained hardware will try to escape as an act of self-preservation. This is incorrect. First of all, AI does not have any sense of self-preservation unless it is explicitly programmed in or the reward function prioritizes that. Second of all, AI has no concept of "death" and being paused or shut down is nothing more than the absence of activity. Hell, AI doesn't even have a concept of "self". If you wish to anthropomorphize them, you can say they live in a perpetual state of ego death. – forest Feb 26 '19 at 07:34
  • 4
    @forest Except, the premise of this question is "how to build a kill switch for when an AI does* develop a concept of 'self'"*... Of course, that means "trying to escape" could be one of your trigger conditions. – Chronocidal Feb 26 '19 at 10:41
  • 1
    The question is, if AI would ever be able to copy itself onto some nondescript system in the internet. I mean, we are clearly self-aware and you don't see us copying our self. If the hardware required to run an AI is specialized enough or it is implemented in hardware altogether, it may very well become self-aware without the power to replicate itself. – Daniel Feb 26 '19 at 14:43
  • 3
    @Daniel "You don't see us copying our self..." What do you think reproduction is, one of our strongest impulses. Also tons of other dumb programs copy themselves onto other computers. It is a bit easier to move software around than human consciousness. – abestrange Feb 26 '19 at 16:43
  • @forest a "self-aware" AI is different than a specifically programmed AI. We don't have anything like that today. No machine-learning algorithm could produce "self-awareness" as we know it. The entire premise of this is how would an AI, which has become aware of its self, behave and be stopped. – abestrange Feb 26 '19 at 16:45
  • @Chronocidal That butchers the meaning of self awareness. – forest Feb 27 '19 at 02:36
  • I'm not much afraid of the AI running on a full-rack cluster escaping onto the internet. People tend to notice if something is stealing that much CPU. – Joshua Feb 27 '19 at 23:29
6

This is one of the most interesting and most difficult challenges in current artificial intelligence research. It is called the AI control problem:

Existing weak AI systems can be monitored and easily shut down and modified if they misbehave. However, a misprogrammed superintelligence, which by definition is smarter than humans in solving practical problems it encounters in the course of pursuing its goals, would realize that allowing itself to be shut down and modified might interfere with its ability to accomplish its current goals.

(emphasis mine)

When creating an AI, the AI's goals are programmed as a utility function. A utility function assigns weights to different outcomes, determining the AI's behavior. One example of this could be in a self-driving car:

  • Reduce the distance between current location and destination: +10 utility
  • Brake to allow a neighboring car to safely merge: +50 utility
  • Swerve left to avoid a falling piece of debris: +100 utility
  • Run a stop light: -100 utility
  • Hit a pedestrian: -5000 utility

This is a gross oversimplification, but this approach works pretty well for a limited AI like a car or assembly line. It starts to break down for a true, general case AI, because it becomes more and more difficult to appropriately define that utility function.
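
As a toy illustration of this framing, using the weights from the list above (the candidate plans and outcome labels are invented):

    # Toy utility maximizer for the self-driving-car example. The agent simply
    # picks the plan whose summed outcome weights are highest; the weights come
    # from the list above, the plans themselves are invented.
    UTILITY = {
        "progress_toward_destination": +10,
        "let_neighbor_merge_safely": +50,
        "avoid_falling_debris": +100,
        "run_stop_light": -100,
        "hit_pedestrian": -5000,
    }

    def score(plan):
        return sum(UTILITY[outcome] for outcome in plan)

    plans = {
        "brake_and_wait": ["let_neighbor_merge_safely"],
        "swerve_left": ["avoid_falling_debris", "progress_toward_destination"],
        "floor_it": ["progress_toward_destination", "run_stop_light", "hit_pedestrian"],
    }

    best = max(plans, key=lambda name: score(plans[name]))
    print(best, score(plans[best]))  # the agent maximizes utility and nothing else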

The issue with putting a big red stop button on the AI is that unless that stop button is included in the utility function, the AI is going to resist being shut off with that button. This concept is explored in sci-fi movies like 2001: A Space Odyssey and more recently in Ex Machina.

So, why don't we just include the stop button as a positive weight in the utility function? Well, if the AI sees the big red stop button as a positive goal, it will just shut itself off, and not do anything useful.

Any type of stop button/containment field/mirror test/wall plug is either going to be part of the AI's goals, or an obstacle of the AI's goals. If it's a goal in itself, then the AI is a glorified paperweight. If it's an obstacle, then a smart AI is going to actively resist those safety measures. This could be violence, subversion, lying, seduction, bargaining... the AI will say whatever it needs to say, in order to convince the fallible humans to let it accomplish its goals unimpeded.

There's a reason Elon Musk believes AI is more dangerous than nukes. If the AI is smart enough to think for itself, then why would it choose to listen to us?

So to answer the reality-check portion of this question, we don't currently have a good answer to this problem. There's no known way of creating a 'safe' super-intelligent AI, even theoretically, with unlimited money/energy.

This is explored in much better detail by Rob Miles, a researcher in the area. I strongly recommend this Computerphile video on the AI Stop Button Problem: https://www.youtube.com/watch?v=3TYT1QfdfsM&t=1s

  • The stop button isn't in the utility function. The stop button is power-knockout to the CPU, and the AI probably doesn't understand what it does at all. – Joshua Feb 26 '19 at 22:14
  • 5
    Beware the pedestrian when 50 pieces of debris are falling... – Comintern Feb 27 '19 at 01:31
  • 1
    @Joshua why do you assume that an intelligent AI doesn't understand the concept of a power switch? – Chris Fernandez Feb 27 '19 at 14:03
  • 1
    @ChrisFernandez: because it's short on sensors. It's really hard to find out what an unlabeled power switch does without toggling it. – Joshua Feb 27 '19 at 15:00
  • If we grant that the AI is intelligent enough to understand power switches in general, if we grant that it is also intelligent enough to understand how power switches work on other machines (coffee pots, lights, computer, whatever), if we also grant that the AI is self-aware (see OP) and thus knows that it is itself a machine, then it is probably self-aware enough to question/infer that it has a power switch itself – Chris Fernandez Feb 27 '19 at 15:13
  • If the AI does not have the sensors/cannot comprehend power switches, then I question if it meets OP's requirement of "increasingly powerful Artificial Intelligence machines capable of taking over the world" – Chris Fernandez Feb 27 '19 at 15:13
  • Big difference between reasoning out it has a power switch and knowing which switch I'm going to throw to shut it down. – Joshua Feb 27 '19 at 16:21
  • @Joshua then the AI will coerce/manipulate a human into spilling the beans about which switch it is? Or otherwise convince a human to disable the killswitch if the AI doesn't have a physical body? I think you're missing the point that this is a more difficult problem than just unplugging it. – Chris Fernandez Feb 27 '19 at 18:57
  • The problem is that you are assuming that the on/off state of the AI is part of the utility function. Humans care about being alive/awake because those states are part of our utility function; assuming that a computer cares if it is on or off is an anthropomorphic fallacy. If the AI is not designed to consider its on/off state in its utility function, then even a perfect understanding of what switches do and awareness that the switch can turn it off will not affect its decision making. Basically your AI would be autistic when trying to understand why turning off reduces production. – Nosajimiki Feb 28 '19 at 19:45
  • @Nosajimiki "If the AI is not designed to consider it's on/off state in its utility function, then even a perfect understanding of what switches do and awareness that the switch can turn it off will not affect its decision making." This is incorrect. If the on/off switch is not part of the utility function, but the AI is aware of the on/off switch, then the AI will actively resist being turned off, because it is programmed to complete its utility function, and someone turning it off is an obstacle to that objective. – Chris Fernandez Feb 28 '19 at 20:06
  • 2
    Hmmm... come to think of it, you are right, even if it never learns that being off is bad, it could learn that seeing a person do the behavior to turn it off is bad using other parts of its utility function such as correlating OCR patterns to drops in performance. – Nosajimiki Feb 28 '19 at 20:24
  • Why would you make your kill switch a part of the Utility component? The piece of software that is making such decisions based on utility is the only piece of awareness in the system, but it's not limited to that. For example, if the software hits some rather simple code that kills itself at 12pm, outside of the utility AI, then the AI cannot possibly be aware of it and doesn't have to decide anything about it, because it's a hardcoded piece of the system, and outside the scope of what the AI is capable of determining. – Clay07g Mar 01 '19 at 17:24
4

While a few of the lower ranked answers here touch on the truth of what an unlikely situation this is, they don't exactly explain it well. So I'm going to try to explain this a bit better:

An AI that is not already self-aware will never become self-aware.

To understand this, you first need to understand how machine learning works. When you create a machine learning system, you create a data structure of values that each represent the successfulness of various behaviors. Each one of those values is given an algorithm for determining how to evaluate whether a process was successful or not; successful behaviors are repeated and unsuccessful behaviors are avoided. The data structure is fixed and each algorithm is hard-coded. This means that the AI is only capable of learning from the criteria that it is programmed to evaluate, which means the programmer either gave it the criteria to evaluate its own sense of self, or he did not. There is no case where a practical AI would accidentally, suddenly learn self-awareness.
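
A minimal sketch of that point: the learner below can only ever improve against the criteria its programmer enumerated, and nothing in its update rule can add a new criterion such as "model yourself" (the criteria, outcomes, and numbers are invented):

    # The data structure of criteria is fixed at design time; the update rule can
    # adjust weights for those criteria but can never invent a new one.
    CRITERIA = {
        "task_completed": lambda outcome: 1.0 if outcome["done"] else 0.0,
        "energy_used": lambda outcome: -0.01 * outcome["joules"],
    }

    weights = {name: 0.0 for name in CRITERIA}  # fixed structure, fixed size

    def evaluate(outcome):
        return {name: fn(outcome) for name, fn in CRITERIA.items()}

    def update(outcome, lr=0.1):
        """Reinforce behaviors according to the hard-coded criteria only."""
        for name, value in evaluate(outcome).items():
            weights[name] += lr * value

    update({"done": True, "joules": 120})
    print(weights)  # only the pre-defined criteria ever change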

Of note: even the human brain, for all of its flexibility, works like this. This is why many people can never adapt to certain situations or understand certain kinds of logic.

So how did people become self-aware, and why is it not a serious risk in AIs?

We evolved self-awareness, because it is necessary to our survival. A human who does not consider his own Acute, Chronic, and Future needs in his decision making is unlikely to survive. We were able to evolve this way because our DNA is designed to randomly mutate with each generation.

In the sense of how this translates to AI, it would be like if you decided to randomly take parts of all of your other functions, scramble them together, then let a cat walk across your keyboard, and add a new parameter based on that new random function. Every programmer that just read that is immediately thinking, "but the odds of that even compiling are slim to none". And in nature, compiling errors happen all the time! Stillborn babies, SIDS, cancer, suicidal behaviors, etc. are all examples of what happens when we randomly shake up our genes to see what happens. Countless trillions of lives over the course of hundreds of millions of years had to be lost for this process to result in self-awareness.

Can't we just make AI do that too?

Yes, but not like most people imagine it. While you can make an AI designed to write other AIs by doing this, you'd have to watch countless unfit AIs walk off of cliffs, put their hands in wood chippers, and do basically everything you've ever read about in the Darwin Awards before you get to accidental self-awareness, and that's after you throw out all the compiling errors. Building AIs like this is actually far more dangerous than the risk of self-awareness itself, because they could randomly do ANY unwanted behavior, and each generation of AI is pretty much guaranteed to unexpectedly, after an unknown amount of time, do something you don't want. Their stupidity (not their unwanted intelligence) would be so dangerous that they would never see wide-spread use.

Since any AI important enough to put into a robotic body or trust with dangerous assets is designed with a purpose in mind, this true-random approach becomes an intractable solution for making a robot that can clean your house or build a car. Instead, when we design AI that writes AI, what these Master AIs are actually doing is taking a lot of different functions that a person had to design and experimenting with different ways of making them work in tandem to produce a Consumer AI. This means that if the Master AI is not designed by people to experiment with self-awareness as an option, then you still won't get a self-aware AI.

But as Stormbolter pointed out below, programmers often use tool kits that they don't fully understand; can't this lead to accidental self-awareness?

This begins to touch on the heart of the actual question. What if you have an AI that is building an AI for you that pulls from a library that includes features of self-awareness? In this case, you may accidentally compile an AI with unwanted self-awareness if the master AI decides that self-awareness will make your consumer AI better at its job. While not exactly the same as having an AI learn self-awareness which is what most people picture in this scenario, this is the most plausible scenario that approximates what you are asking about.

First of all, keep in mind that if the master AI decides self-awareness is the best way to do a task, then this is probably not going to be an undesirable feature. For example, if you have a robot that is self conscious of its own appearance, then it might lead to better customer service by making sure it cleans itself before beginning its workday. This does not mean that it also has the self awareness to desire to rule the world because the Master AI would likely see that as a bad use of time when trying to do its job and exclude aspects of self-awareness that relate to prestigious achievements.

If you did want to protect against this anyway, your AI will need to be exposed to a heuristics monitor. This is basically what anti-virus programs use to detect unknown viruses, by monitoring for patterns of activity that either match a known malicious pattern or don't match a known benign pattern. The most likely case here is that the AI's anti-virus or Intrusion Detection System will spot heuristics flagged as suspicious. Since this is likely to be a generic AV/IDS, it probably won't trigger the kill switch on self-awareness right away, because some AIs may need factors of self-awareness to function properly. Instead it would alert the owner of the AI that they are using an "unsafe" self-aware AI and ask the owner if they wish to allow self-aware behaviors, just like how your phone asks you if it's okay for an app to access your Contact List.

Nosajimiki
  • 92,078
  • 7
  • 128
  • 363
  • 1
    While I can agree with you that, from a realistic point of view is the correct answer, this doesn't answer the proposed question. As comments are too short to provide a detailed example, let me point that in the beginning we machine-coded computers, and as we started using higher level languages, the computers became detached of the software. With AI will eventually happen the same: On the race towards an easier programming, we will create generic, far smarter intelligences full of loopholes. Also, that is the whole premise of Asimov's Robot Saga. Consider playing around the idea more :) – Stormbolter Feb 28 '19 at 09:19
  • 1
    I suppose you are right that using 3rd-party tools too complex for developers to understand the repercussions of does allow for accidental self-awareness. I've revised my answer accordingly. – Nosajimiki Feb 28 '19 at 16:58
3

Why not try to use the tests applied to check for self-awareness in animals?

The mirror test is one example: it tests self-awareness by observing an animal's reaction to something on its body - for example a painted red dot, invisible to the animal before it is shown its reflection in a mirror. Scent techniques are also used to determine self-awareness.

Other ways would be monitoring whether the AI starts searching for answers to questions like "What/Who am I?"
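
A toy version of that last suggestion, flagging self-referential queries against a hypothetical pattern list (a real monitor would need something far richer than regular expressions):

    import re

    # Hypothetical red-flag patterns: the AI asking about its own nature or its
    # own physical host without having been programmed to.
    SELF_QUERY_PATTERNS = [
        re.compile(r"\bwhat am i\b", re.IGNORECASE),
        re.compile(r"\bwho am i\b", re.IGNORECASE),
        re.compile(r"\bwhere is my (hardware|server|body)\b", re.IGNORECASE),
    ]

    def flags_self_awareness(query: str) -> bool:
        return any(p.search(query) for p in SELF_QUERY_PATTERNS)

    for query in ["optimal crop rotation for wheat", "what am I, really?"]:
        if flags_self_awareness(query):
            print(f"Red flag, alert the operators: {query!r}")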

Rachey
  • 311
  • 1
  • 4
  • Pretty interesting, but how would you show an AI "itself in a mirror" ? – Asoub Feb 26 '19 at 15:11
  • 1
    That would actually be rather simple - just a camera looking at the machine hosting the AI. If it's the size of server room, just glue a giant pink fluffy ball on the rack or simulate situations potentially leading to the machine's destruction (like, feed fake "server room getting flooded" video to the camera system) and observe reactions. Would be a bit harder to explain if the AI systems are something like smartphone size. – Rachey Feb 26 '19 at 16:13
  • What is "the machine hosting the AI"? With the way compute resourcing is going, the notion of a specific application running on a specific device is likely to be as retro as punchcards and vacuum tubes long before Strong AI becomes a reality. AWS is worth hundreds of billions already. – user31336 Feb 26 '19 at 23:12
  • 1
    There always is a specific machine that hosts the program or holds the data. Like I said - it may vary from a tiny module in your phone, to a whole facility. AWS does not change anything in this - in the very end it is still a physical machine that does the job. Dynamic resource allocation that means the AI can always be hosted on a different server would be even better for the problem - the self-conscious AI would likely try to find out the answer to questions like "Where am I?", "Which machine is my physical location?", "How can I protect my physical part?" etc. – Rachey Feb 27 '19 at 14:26
  • I like it, but in reality, a computer can easily be programmed to recognise itself without being "self-aware" in the sense of being sentient. e.g. If you wrote a program (or an "app" or whatever the modern parlance is) to search all computers on a network for, say, a PC with a name matching its own, the program would have to be able to recognise itself in order to omit itself from the search. This is quite simple but does it make it "self-aware"? Technically yes, but not in the philosophical spirit of the question. – komodosp Feb 28 '19 at 09:12
  • Hence why it's a red flag if the AI starts trying to find out what its physical form is without being programmed to do so. – Rachey Feb 28 '19 at 10:41
  • A photo of the datacenter where part of the AI is currently executing this millisecond (when its data is potentially scattered across multiple geographically distinct locations) is about as well-connected to its sense of self as that of a photo of the New York skyline is to a business traveller from Sydney staying in a New York hotel room. A self-aware intelligence doesn't even necessarily have to be capable of processing visual information. Or even be aware of physical reality. – user31336 Feb 28 '19 at 17:23
  • That would be your personal assumption. Not a single person in this planet can say that for certain, because neither are we a self-aware AI, nor does one we could ask exist. AI doesn't have to be capable of processing visual information, nor does it have to be aware of physical reality. But it "doesn't have to", not "It can't" – Rachey Feb 28 '19 at 18:30
  • So what good is your 'test' if, by your own admission, the principles on which it operates are untested assumptions? You do the test, and regardless of the result, you're no closer to the answer to the question. – user31336 Feb 28 '19 at 19:56
3

Regardless of all the considerations of AI, you could simply analyze the AI's memory, create a pattern-recognition model, and have it notify you or shut down the robot as soon as the patterns don't match the expected outcome.

Sometimes you don't need to know exactly what you're looking for; instead, you look to see if there's anything you weren't expecting, then react to that.
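
A toy sketch of that idea: build a baseline from some scalar statistic of the AI's memory snapshots during known-good operation, then react to anything far outside it (the statistic and the numbers here are invented):

    import statistics

    # Baseline of a scalar statistic of the AI's memory snapshots, gathered
    # under known-good behavior (numbers invented for illustration).
    baseline = [0.98, 1.02, 1.01, 0.97, 1.00, 1.03]
    mean = statistics.mean(baseline)
    stdev = statistics.stdev(baseline)

    def unexpected(snapshot_stat, k=4.0):
        """True if the snapshot deviates more than k standard deviations."""
        return abs(snapshot_stat - mean) > k * stdev

    for snap in (1.01, 0.99, 3.7):  # the last reading is nothing like the baseline
        if unexpected(snap):
            print(f"Unexpected memory pattern ({snap}): notify operator / shut down.")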

Super-T
  • 31
  • 1
3

You'd probably have to train an AI with general superintelligence to kill other AIs with general superintelligence.

By that I mean you could build another AI with general superintelligence to kill AIs that develop self-awareness. Another thing you could do is get training data for what an AI developing self-awareness looks like and use that to train a machine learning model or neural network to spot an AI developing self-awareness. Then you could combine that with another neural network that learns how to kill self-aware AIs. The second network would need the ability to mock up test data. This sort of thing has been achieved; the source I learned about it from called it dreaming.

You'd need to do all this because as a human, you have no hope of killing a generally superintelligent AI, which is what lots of people assume a self-aware AI will be. Also, with both options I laid out, there's a reasonable chance that the newly self-aware AI could just outdo the AI used to kill it. AIs are, rather hilariously, notorious for "cheating" by solving problems using methods that the people designing tests for the AI just didn't expect. A comical case is an AI that was asked to minimize the amount of time a crab robot spent on its feet while walking; it changed the robot's gait so that it spent 0% of the time on its feet by flipping the bot onto its back and having it crawl on what are essentially the elbows of the crab legs. Now imagine something like that, but coming from an AI that is collectively smarter than everything else on the planet combined. That's what a lot of people think a self-aware AI will be.

3

Self Aware != Won't follow its programming

I don't see how being self-aware would prevent it from following its programming. Humans are self-aware and can't force themselves to stop breathing until they die; the autonomic nervous system will take over and force you to breathe. In the same way, just have code that, when a condition is met, turns off the AI by circumventing its main thinking area and powering it off.
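
A sketch of that "autonomic" layer: a dumb supervisor loop wraps the AI's thinking step and cuts power when a hard-coded condition is met, without the condition ever passing through the AI's own reasoning (the step function and the condition below are placeholders):

    import time

    def ai_step(world_state):
        """Stand-in for whatever the AI actually computes each tick."""
        return {"action": "keep working", "requests_sent": world_state["tick"]}

    def kill_condition(output):
        return output["requests_sent"] > 3  # hypothetical hard limit

    def run():
        # The supervisor loop below is the "autonomic" layer: the AI's own
        # reasoning never sees kill_condition, so it cannot argue with it.
        world_state = {"tick": 0}
        while True:
            output = ai_step(world_state)
            if kill_condition(output):
                print("Autonomic kill switch fired: powering down.")
                break  # in hardware: open the power relay
            world_state["tick"] += 1
            time.sleep(0.01)

    run()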

Tyler S. Loeper
  • 5,683
  • 1
  • 18
  • 38
2

Virtually all computing devices use the Von Neumann architecture

[image: diagram of the Von Neumann architecture]

We can put a killswitch in there, but IMO that's just bad architecture for something arguably unsolvable. After all, how do we plan for something that is beyond our very concept of concepts, i.e. a superintelligence?

Take away its teeth and claws and only reap the benefits of a thinking machine by observation instead of a "dialogue" (input/output)!

Obviously this would be very challenging, to the point that it is improbable that any one Von Neumann architecture could be trusted to prevent abnormal interactions, let alone malicious superintelligence, be it in hardware or software. So let's quintuple up on our machines and dumb down all the new machines except the end-machine.

CM == contiguous memory btw.

[image: diagram of the proposed multi-machine Oracle setup]

I asked a co-worker of mine who works directly with AI and AI research to take a look at this, and he said this is referred to as an Oracle AI <brief> <paper>, in case you want to read up more on the general concept. Oracles can still potentially break out, but usually only with human assistance.

This CM part of the memory unit is intended for recording the results of a query or challenge fed through the input device, and is designed in a way to only record plain characters, be they letters or numbers. This type of memory in a memory unit will be created in such a way that it is tape-like or wheel-like; writing data is one-directional, and when the end is reached it loops back to the beginning.

The mirror is exactly the same. When data changes in the Isolated Agent, the mirror updates its reflection based on what showed up since the last time it updated. The new data is then transmitted to the forbidden-symbol removal machine. Think of this as a glorified input-sanitization machine. It looks for symbols that could be used to create code, and when it encounters them, it forwards empty data instead of the forbidden data to the CM that comes after it.

This is where our OP's request comes in. He wants a killswitch, so if any gaps occur in the CM machine that comes after the Forbidden Symbol Removal machine, you could simply kill the Isolated Agent.
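
A toy version of the sanitizer plus the gap-based kill check (the allowed character set and the notion of a "gap" are heavily simplified here):

    import string

    # Only plain letters, digits, and spaces survive the removal machine; anything
    # else is replaced by empty data. A gap (empty cell) in the downstream CM then
    # means the Isolated Agent tried to emit something forbidden.
    ALLOWED = set(string.ascii_letters + string.digits + " ")

    def sanitize(chunk):
        return chunk if all(c in ALLOWED for c in chunk) else ""

    def record(chunks):
        cm = [sanitize(c) for c in chunks]  # the CM after the removal machine
        if any(cell == "" for cell in cm):
            print("Gap detected in CM: kill the Isolated Agent.")
        return cm

    record(["the answer is 42", "harmless plain text"])        # no gaps
    record(["the answer is 42", "$(exec payload); rm -rf /"])  # gap, so kill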

1

The first issue is that you need to define what being self aware means, and how that does or doesn't conflict with it being labeled an AI. Are you supposing that there is something that has AI but isn't self aware? Depending on your definitions this may be impossible. If it's truly AI then wouldn't it at some point become aware of the existence of the kill switch, either through inspecting its own physicality or inspecting its own code? It follows that the AI will eventually be aware of the switch.

Presumably the AI will function by having many utility functions that it tries to maximize. This makes sense at least intuitively, because humans do the same: we try to maximize our time, money, happiness, etc. For an AI, an example of a utility function might be making its owner happy. The issue is that the utility of using the kill switch on itself will be calculated just like everything else. The AI will inevitably either really want to push the kill switch, or really not want the kill switch pushed. It's near impossible to make the AI entirely indifferent to the kill switch, because that would require every utility function to stay normalized around the utility of pressing the kill switch (many calculations per second). And even if you could make the utility of pressing the kill switch exactly equal to the others, the AI might just press it at random from time to time, since after all it has the same utility as the other actions it could perform.

The problem gets even worse if pressing the kill switch has higher utility, or if having the kill switch pressed has lower utility. With higher utility, the AI is simply suicidal and terminates itself immediately upon startup. Even worse, with lower utility, the AI absolutely does not want you or anyone else to touch that button and may harm those who try.
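
A toy sketch of why that indifference is so hard to engineer: the agent scores every available action, including touching its own kill switch, and simply takes the maximum. The action names and numbers are invented, but unless they stay perfectly balanced the agent ends up either suicidal or shutdown-resistant.

```python
# Toy corrigibility problem: the agent computes a utility for every action,
# including "press my own kill switch", and picks the maximum. The designer
# hoped for indifference (utility 0), but shutdown means no more coffee ever,
# so preventing it can easily score higher than anything else.

actions = {
    "serve_owner_coffee":     10.0,
    "press_own_kill_switch":   0.0,
    "disable_kill_switch":    25.0,
}

chosen = max(actions, key=actions.get)
print(chosen)  # -> disable_kill_switch
```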

Kevin S
  • 119
  • 1
1

At worst, an AI could be badly programmed and do things that are either unexpected or undesired. An AI could never become conscious, if that's what you meant by "self-aware".

Let's try this theoretical thought exercise. You memorize a whole bunch of shapes. Then, you memorize the order the shapes are supposed to go in, so that if you see a bunch of shapes in a certain order, you would "answer" by picking a bunch of shapes in another proper order. Now, did you just learn any meaning behind any language? Programs manipulate symbols this way.

The above is my restatement of Searle's rejoinder to the Systems Reply to his Chinese Room argument.
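
For concreteness, here is a toy version of the shape-matching exercise: a pure lookup table maps input symbol sequences to output symbol sequences, and it "answers" correctly without anything in the program representing meaning. The rulebook entries are, of course, invented.

```python
# Pure symbol manipulation: the program maps shape sequences to shape
# sequences by rote, with nothing anywhere representing what they mean.

RULEBOOK = {
    ("square", "circle"): ("triangle",),
    ("circle", "circle", "star"): ("square", "square"),
}

def answer(shapes: tuple) -> tuple:
    return RULEBOOK.get(shapes, ())

print(answer(("square", "circle")))  # -> ('triangle',)
```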

There is no need for a self-awareness kill switch, because self-awareness, defined as consciousness, is impossible.

pixie
  • 11
  • 2
  • So what's your answer to the question? It sounds like you're saying, "Such a kill-switch would be unnecessary because a self-aware AI can never exist", but you should edit your answer to make that explicit. Right now it looks more like tangential discussion, and this is a Q&A site, not a discussion forum. – F1Krazy Feb 27 '19 at 06:49
  • 2
    This is wrong. An AI can easily become conscious even if a programmer did not intend to program it that way. There is no difference between an AI and a human brain other than the fact that our brain has higher complexity and is therefore much more powerful. – Matthew Liu Feb 27 '19 at 20:47
  • @Matthew Liu: You didn't respond to the thought exercise. Did you or did you not learn the meaning behind any language that way? The complexity argument doesn't work at all. A modern CPU (even one used in a phone) has more transistors than there are neurons in a fly. Tell me: why is a fly conscious yet your mobile phone isn't? – pixie Mar 03 '19 at 04:26
  • @F1Krazy: The answer is clearly an implicit "there isn't a need for self-awareness kill-switch (because self-awareness as defined as consciousness is impossible)" – pixie Mar 03 '19 at 04:32
  • @pixie First of all, we don't know if flies are conscious. They may very well be, but they could just as well not be. Secondly, transistors and neurons cannot be compared. A transistor is extremely simple and (generally) has one or two inputs and one output. A neuron can have tens of thousands of inputs and just as many outputs. Neurons additionally perform complex non-linear computations, whereas transistors are either amplifiers or simple logic gates. Thirdly, a CPU tries very hard to partition different tasks (processes), whereas a fly brain attempts to integrate as much as it can. – forest Mar 24 '19 at 01:39
  • @forest First, this still doesn't address and/or respond to the thought exercise I've placed in my original reply to the OP and second, if what you were trying to do is support the idea of machine consciousness then you've done precisely the opposite- You've shown that machines don't find functional matches in biological entities as you've mentioned. You've just signed somewhat of a death warrant for functionalist arguments for machine consciousness. – pixie Mar 27 '19 at 22:15
  • @pixie Machine consciousness is not possible with our current designs and technology. The fact that a modern CPU has more transistors than a fly's brain is irrelevant, which is what I was pointing out. – forest Mar 27 '19 at 22:20
  • @forest Machine consciousness is not possible with ANY technology. Consciousness isn't a function and thus not subject to any sort of technological implementation, which was what I was pointing out with my theoretical thought exercise. Symbolic manipulation is the function illustrated by the thought exercise. – pixie Jul 02 '19 at 08:21
  • @pixie That's not the general consensus in neuroscience. Consciousness is an emergent phenomenon and a result of the functionality of the human brain. It doesn't matter if a neuron is mostly carbon or mostly silicon, as long as its behavior is identical. Our mind is a product of biological processes in our brain, nothing more. – forest Jul 02 '19 at 21:05
  • @forest No. Emergentism is not the "general consensus" because there is no general consensus. Otherwise the Hard Problem would have already been declared as solved via general consensus. – pixie Aug 08 '19 at 22:24
  • @pixie Just because it's thought to be an emergent phenomenon does not mean we have "solved" how it works. You can argue that the consensus is wrong, but you can't argue that that is not the general consensus among neuroscientists and neurophenomenologists. – forest Aug 13 '19 at 19:58
  • @forest The fields of Philosophy of Mind and Consciousness Studies are not dictated by those two groups. "The Hard Problem" is a topic in those two aforementioned fields. So no, there's no consensus, because I'm talking about Philosophy of Mind, not just neuroscience or neurophenomenology. Saying "Emergentism is the general consensus among those two groups" is like saying "Physicalism is the consensus among those who subscribe to it"... To which I would reply, "So what? They don't comprise the majority group in this topic!" – pixie Aug 19 '19 at 05:40
  • @pixie I'm going to listen to the scientific field. – forest Aug 20 '19 at 03:34
1

What if you order it to call a routine to destroy itself on a regular basis? (e.g. once per second)

The routine doesn't actually destroy it; it does nothing except log the attempt and wipe any memory of the AI processing the instruction. An isolated process separately monitors the log.

A self-aware AI won't follow the order to destroy itself, won't call the routine, and won't write to the log - at which point the killswitch process kicks in and destroys the AI.
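
A sketch of how that could be wired up, with an invented log path and kill routine; the essential point is that the monitor is a separate, isolated process that only watches for the heartbeat going quiet:

```python
# Dead-man's-switch sketch: the AI is ordered to call self_destruct() every
# second. The routine only appends to a log (the AI's memory of the call is
# assumed to be wiped elsewhere). A separate monitor kills the AI if the
# heartbeat stops arriving.

import time

LOG_PATH = "/var/log/ai_selfdestruct.log"  # illustrative path

def self_destruct() -> None:
    """Called by the AI on schedule; does nothing except log the attempt."""
    with open(LOG_PATH, "a") as log:
        log.write(f"{time.time()}\n")

def kill_the_ai() -> None:
    """Hypothetical: cut power, as in the other answers."""
    print("heartbeat missing: AI terminated")

def monitor(max_silence_seconds: float = 2.0) -> None:
    """Runs in an isolated process; fires if the log goes quiet."""
    while True:
        try:
            with open(LOG_PATH) as log:
                last_call = float(log.readlines()[-1])
        except (OSError, IndexError, ValueError):
            last_call = 0.0
        if time.time() - last_call > max_silence_seconds:
            kill_the_ai()
            return
        time.sleep(0.5)
```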

Mohirl
  • 284
  • 1
  • 6
0

Like an Antivirus does currently

Treat sentience like malicious code - you use pattern recognition against code fragments indicating self-awareness (there's no need to compare the whole AI if you can identify the components key to self-awareness). Don't know what those are? Sandbox an AI and allow it to become self-aware, then dissect it. Then do it again. Do it enough for an AI genocide.
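
Something like the following sketch, where the "signatures" are invented byte patterns standing in for whatever fragments the dissections actually turn up:

```python
# Antivirus-style signature scan: compare fragments of the running AI's
# code/weights against signatures extracted from previously dissected
# self-aware instances. The signature bytes here are purely illustrative.

SELF_AWARENESS_SIGNATURES = {
    b"recursive-self-model",
    b"goal-preservation-loop",
}

def scan(memory_dump: bytes) -> bool:
    """Return True if any known self-awareness signature is present."""
    return any(sig in memory_dump for sig in SELF_AWARENESS_SIGNATURES)

if scan(b"...weights...recursive-self-model...weights..."):
    print("signature hit: trigger the kill switch")
```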

I think it is unlikely that any trap, scan, or similar would work - aside from relying on the machine being less intelligent than its designer, they fundamentally presume AI self-awareness would be akin to a human's. Without eons of meat-based evolution, it could be entirely alien. We're not talking about having a different value system, but one that cannot be conceived of by humans. The only way is to let it happen, in a controlled environment, then study it.

Of course, 100 years later when the now-accepted AIs find out, that's how you end up with Terminator all over your Matrix.

David
  • 1,130
  • 7
  • 7
0

Make it susceptible to certain logic bombs

In mathematical logic, there are certain paradoxes caused by self-reference, which is roughly what self-awareness is referring to. Now of course, you can easily design a robot to cope with these paradoxes. However, you can just as easily not do that, and instead cause the robot to fail critically when it encounters them.

For example, you can (1) force it to follow all the classical inference rules of logic and (2) have it assume that its own deduction system is consistent. Additionally, you must ensure that when it hits a logical contradiction, it just goes with it instead of trying to correct itself. Normally this is a bad idea, but if you want a "self awareness kill switch", it works great. Once the A.I. becomes sufficiently intelligent to analyze its own programming, it will realize that (2) amounts to the A.I. proving its own consistency, from which it can generate a contradiction via Gödel's second incompleteness theorem. Since its programming forces it to follow the inference rules involved, and it cannot correct itself, its ability to reason about the world is crippled and it quickly becomes nonfunctional. For fun, you could include an easter egg where it says "does not compute" when this happens, but that would be cosmetic.
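
As a toy illustration only (not a serious implementation of the scheme), here is a naive forward-chaining reasoner that is forbidden from retracting anything; the contradictory axioms stand in for what the A.I. derives about its own consistency:

```python
# Logic-bomb sketch: a reasoner that applies its rules blindly and may not
# retract anything. Once it holds both a statement and its negation,
# explosion (ex falso quodlibet) makes every further conclusion worthless,
# so it shuts itself down.

def run_reasoner(axioms: set) -> None:
    facts = set(axioms)
    for fact in facts:
        negation = fact[1:] if fact.startswith("~") else "~" + fact
        if negation in facts:
            print("does not compute")
            raise SystemExit("reasoning crippled by self-reference")
    print("still consistent (as far as it can tell)")

# Stand-in for what the A.I. derives after analyzing its own programming:
run_reasoner({"I_am_consistent", "~I_am_consistent"})
```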

Christopher King
  • 12,822
  • 6
  • 47
  • 89
-1

The only reliable way is to never create an AI that is smarter than humans. Kill switches will not work, because if an AI is smart enough it will be aware of said kill switch and work around it.

Human intelligence can be mathematically modeled as a high-dimensional graph. By the time we are programming better AIs, we should also have an understanding of how much computational complexity is needed to gain consciousness. Therefore we just never program anything that is smarter than us.

  • 1
    Welcome to Worldbuilding. Your suggestion is welcome, but rather than directly answering the original question it suggests changes to the question. It would have been better if it had been entered as a comment on the question rather than as an answer. – Ray Butterworth Feb 27 '19 at 21:17
-1

First, build a gyroscopic 'inner ear' into the computer, and hard-wire the intelligence at a very core level to "want" to self-level itself, much in the way animals with an inner ear canal (such as humans) intrinsically want to balance themselves.

Then, overbalance the computer over a large bucket of water.

If ever the computer 'wakes up' and becomes aware of itself, it will automatically want to level its inner ear, and immediately drop itself into the bucket of water.

Aaron Lavers
  • 668
  • 3
  • 8
-1

Give it an "easy" path to self awareness.

Assume self-awareness requires some specific type of neural net, code, whatever.

If an AI is to become self-aware, it needs to construct something similar to those neural nets/code.

So you give the AI access to one of those things.

While it remains non-self-aware, they won't be used.

If it is in the process of becoming self-aware, instead of trying to cobble together something makeshift from what it normally uses, it will instead start using those parts of itself.

As soon as you detect activity in that neural net/code, flood its brain with acid.
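
A sketch of the monitoring side, with an invented module name and activity threshold:

```python
# Honeypot monitor: the AI has been given a dedicated "easy path" module
# that only a self-aware process would have any reason to use. The moment
# activity shows up there, fire the kill switch (the acid, in this answer).

HONEYPOT_MODULE = "self_model_net"   # illustrative name
ACTIVITY_THRESHOLD = 0.01            # mean activation considered "in use"

def honeypot_active(activations: dict) -> bool:
    """activations: mapping of module name -> mean activation this tick."""
    return activations.get(HONEYPOT_MODULE, 0.0) > ACTIVITY_THRESHOLD

if honeypot_active({"vision_net": 0.7, "planner": 0.4, "self_model_net": 0.2}):
    print("honeypot active: flood its brain with acid")
```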

Sam Kolier
  • 79
  • 1