AI experts are becoming concerned that advanced AI will take over from humanity
What probability do you put on future AI advances causing human extinction or similarly permanent and severe disempowerment of the human species?
In a recent survey, that exact question was posed to researchers who have published at top AI conferences. The majority of respondents estimated at least a 1 in 20 chance of such an AI takeover.
Many researchers who published at NeurIPS and ICML (two of the top machine learning conferences) are worried about existential risks from AI.
Major AI firms are also becoming worried. Both OpenAI (the creator of GPT-3, ChatGPT, and DALL-E) and DeepMind (the creator of the first AI system to beat the best humans at Go) have repeatedly expressed this worry, going so far as to hire researchers to investigate ways to preempt AI takeover. Even Satya Nadella, the CEO of Microsoft, has stated that the way to make sure runaway AI never causes “lights out for humanity” is to “make sure it never runs away.”
While the question “Will AI take over?” might sound like ludicrous science fiction, there are strong reasons for taking it seriously. For instance, the current deep learning AI paradigm produces AI systems with certain alarming characteristics that, if present in more powerful systems, may lead directly to AI takeover.
Current AI systems are often opaque and can develop surprising emergent capabilities
In previous paradigms of AI, it was common for systems to be programmed to follow a specific set of rules that yielded intelligent-seeming behavior. For instance, Deep Blue, the AI system that bested a reigning world champion at chess in 1997, was programmed in this manner.
Under the current AI paradigm of deep learning, however, AI systems are produced in a different manner. Deep learning systems “learn” intelligent-seeming behavior through a trial-and-error-like process known as “training.” During training, these systems are typically given a specification of what counts as a desirable outcome (such as winning a game or correctly classifying an image), along with a procedure for adjusting their internal parameters so that they produce more of the behavior that leads to those desirable outcomes.
Modern deep learning differs from earlier AI techniques in that designers don't write explicit rules for the machine to follow.
As our machine learning models get larger and more advanced, new and powerful capabilities emerge.
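For readers who want a more concrete picture, below is a minimal sketch in Python (using the PyTorch library) of the kind of training loop described above. The task, network size, and numbers here are illustrative placeholders, not anything used by a real system, but the basic loop of “try, score, adjust” is the same idea.

```python
# A toy illustration of the deep-learning training loop described above: the
# network is never given explicit rules, only a score ("loss") for how wrong
# its outputs are and a procedure (gradient descent) for nudging its internal
# parameters toward outputs that score better.
import torch
import torch.nn as nn

# Illustrative placeholder task: classify random 2-D points by which side of a
# line they fall on. Real systems use vastly more data and far larger networks.
inputs = torch.randn(1000, 2)
labels = (inputs.sum(dim=1) > 0).long()

model = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 2))
loss_fn = nn.CrossEntropyLoss()  # scores how far outputs are from the "desirable outcome"
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for step in range(200):
    logits = model(inputs)            # try: produce outputs
    loss = loss_fn(logits, labels)    # score: how undesirable were they?
    optimizer.zero_grad()
    loss.backward()                   # work out how each internal parameter should change
    optimizer.step()                  # adjust the parameters slightly in that direction

accuracy = (model(inputs).argmax(dim=1) == labels).float().mean().item()
print(f"final loss {loss.item():.3f}, accuracy {accuracy:.1%}")
```

Nothing in this loop tells the network how to solve the task; the behavior that emerges from the adjusted parameters is whatever happens to score well, which is part of why the resulting systems can be opaque to their designers.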
If powerful AI develops these capabilities, it may be hard to stop
It’s presumably obvious enough why we don’t want AI systems with undesirable goals or deceptive abilities. Unfortunately, we already have many different examples of AI systems that have internalized at least proto-goal-directed behavior that was unwanted by their designers. Further, language models like GPT-3 can be prompted to formulate long-term plans towards goals or to act deceptively.
Situational awareness, on the other hand, is not dangerous in itself, and might even be desirable – which is why designers appear to be actively steering AI systems towards it.
ChatGPT, a chatbot built on OpenAI's models, explains what it is.
But combined with properties like goal-directedness, situational awareness could cause major problems. In particular, a goal-directed system with situational awareness may determine that it would fail to achieve its goals if it was shut off, and it may therefore make plans to prevent itself from being shut off (similar to HAL from 2001: A Space Odyssey).
If such a system were additionally deceptive, it might, in furtherance of its goals, hide the extent to which its goals differed from those of its designers – effectively “playing nice” to deceive its overseers into thinking it was better behaved than it actually was, and only directly pursuing its goals once it was in a secure position to do so.
By the time humans realized that the AI system was actually acting against our interests, it might already have patched its vulnerabilities (for instance, by secretly distributing copies of itself across the internet for redundancy), and humanity might therefore be unable to shut it off.
If competent enough AI systems attempted such a strategy, they might be able to take over from humans, completely disempowering us. If these systems pursued goals at odds with human survival (such as by converting all arable land to data centers for more computation), then humanity could go extinct.
These sorts of capabilities may be far away, but even some current AI systems are already displaying “baby versions” of similar behavior. For instance, researchers at the AI lab Anthropic found that some cutting-edge AI systems have a tendency to express a preference to not be shut down, specifically because being shut down would prevent them from achieving their goals; larger systems also tended to express this preference more frequently.
Modern models trained with RLHF ("reinforcement learning from human feedback") will sometimes express that they do not want to be shut down, even when humans request it.
Elsewhere, the AI chatbot Bing Chat has displayed some level of awareness of being in an adversarial relationship with various humans, including repeatedly referring to users who attempted to hack it as its “enemy” or as a “threat.”
Bing Chat, the chatbot Microsoft built on OpenAI's models, has been shown to threaten users given the right prompts.
And more recently, when the nonprofit Alignment Research Center tested GPT-4 for risky emergent behaviors, they found that GPT-4 was capable of deceiving humans in furtherance of a goal. Specifically, GPT-4 tricked a TaskRabbit worker into solving a CAPTCHA for it by falsely claiming to be a human with a vision impairment.
Current state-of-the-art models can already effectively deceive humans in pursuit of their goals.
Where do we stand now?
No serious researcher expects GPT-4 to take over from humans. But also, no serious researcher would pretend to know how far away we are from AI systems that could take over.
What we do know is that there are several reasons to expect AI systems to continue to become larger and more powerful:
Fast recent progress – from conversing with humans to out-strategizing them in complex games, recent progress in deep learning has been incredibly fast. For years, a steady stream of researchers predicted deep learning would soon hit a wall, but that stream has slowed to a trickle as deep learning has continued to achieve what was previously thought to be years or decades away. Even those who forecast fast progress have been shocked by how much faster it turned out to be.
Increased investment in AI – money is pouring into AI. VC investment in “generative AI” like GPT has jumped from under half a billion dollars in 2020 to over $2B in 2022. And already in 2023, Microsoft is investing $10B in OpenAI, and Google is investing $300M in Anthropic. In recent years, the largest AI systems have grown rapidly – doubling in their computational use every 6 months or so (see the rough calculation below). Increased investment could sustain AI’s rapid growth for years to come, with comparable increases in performance.
More profits – until very recently, AI wasn’t being commercialized much and was instead largely being used in demos. That’s beginning to change fast. Commercialization of AI means potential profits, which could be reinvested and create an incentive for further investment.
Money keeps pouring into generative AI companies despite the dangers.
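To get a feel for what that doubling rate implies, here is a quick back-of-the-envelope calculation in Python. It simply assumes, purely for illustration, that a roughly six-month doubling time in training compute continues unchanged; it is not a forecast.

```python
# Back-of-the-envelope: what a ~6-month doubling time in training compute
# would imply if it simply continued (an illustrative assumption, not a forecast).
doubling_time_years = 0.5

for years in (1, 3, 5):
    doublings = years / doubling_time_years
    growth = 2 ** doublings
    print(f"After {years} year(s): ~{growth:,.0f}x today's largest training runs")
```

Under that assumption, compute grows roughly 4x in a year, 64x in three years, and over 1,000x in five years, which gives a sense of how quickly today's systems could be left behind.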
We should also expect that as systems scale up and become more powerful, qualitatively new capabilities will emerge, possibly including capabilities that are genuinely dangerous. Some of these capabilities may appear suddenly, in line with many previous emergent capabilities. Further, we might not even know when these capabilities have emerged, as previous emergent capabilities have occasionally remained latent for months before clever researchers discovered ways to uncover them.
While this all might lead to AI takeover, some researchers are investigating ways to preempt such a situation. For instance, research into mechanistic interpretability seeks to understand the hidden inner workings of deep learning systems; this work may allow us to avoid being caught off guard by emergent capabilities, and may even enable us to steer AI systems during training to prevent them from ever developing dangerous capabilities like deception. (See this online curriculum for further research avenues towards avoiding AI takeover.)
But the number of researchers pursuing these research avenues is small – by one estimate, a few hundred worldwide, compared to at least tens of thousands of researchers working on advancing AI capabilities. Both groups include very smart people. And on the whole, both groups think the risk of AI taking over is higher than we’d hope – the group working on AI capabilities includes the survey respondents mentioned above, whose median estimate was a 1 in 20 chance that AI takes over, with the result being either literal human extinction or some other form of human disempowerment that’s similarly permanent and severe.
If you would like to learn more about the worries of AI takeover and what can be done to try to prevent it, you can follow any of these links:
This website was a collaboration between Daniel Eth and the Centre for Effective Altruism