In 1943, at the height of World War II, the U.S. military hired an audacious psychologist named B.F. Skinner to develop pigeon-guided missiles. These were the early days of munitions guidance technology, and the Allies were apparently quite desperate to find more reliable ways to get missiles to hit their targets.
It went like this: Skinner trained pigeons to peck at an image of the military target projected onto a screen. Whenever their beaks hit the moving target dead center, he rewarded the birds with food pellets. Once the pigeons had learned how to peck at targets, they earned their wings: Skinner would strap three of his little pilots into a missile cockpit specially fitted with harnesses attached to gyroscopes that would steer the bomb.
Now, when American jets released their pigeon-filled bombs, the birds would peck at an image of the bomb’s target, their little straps twisting and bending, gyroscopes whirling, guiding the bomb and the birds to their final resting place.
Image source: Anton van Dalen, B.F. Skinner with 1943 demonstration model “Project Pigeon”, 1986. Medium: Oil on canvas. Size: 48 x 64 inches. Used with permission of the artist.
The military eventually pulled the plug on Project Pigeon, while Skinner continued to develop a discipline that came to be known as behavioral psychology. “Behavioral” because, unlike his Freudian predecessors, Skinner didn’t care about unobservable characteristics of conscious intelligence — things like thoughts, emotions, desires, and fears. He just wanted to discover how to train animals (and his children) using scientific techniques of stimulus, reward, and punishment.
If there’s a modern Project Pigeon, it’s DeepMind’s AlphaGo. Over the past three years, building on training principles Skinner pioneered, DeepMind has developed some of the most sophisticated machine-learning techniques in existence to train a computer with artificial intelligence (AI) to master the ancient board game of Go.
Weirdly enough, this millennia-old board game is the perfect demonstration of human complexity, machine limitations, and how powerful AI has become.
For decades, researchers considered playing Go to be the holy grail of game-playing AI. No computer had ever come close to beating a professional in an even, full-board game. Many thought it impossible.
Intriguingly, AlphaGo plays Go with something akin to human-like intuition. That’s new. Computers have always been good at doing the kinds of tasks that we can logically define, like multiplying large numbers, storing information, and playing recorded movies. But they struggle with implicit knowledge. Those are the things we know how to do but cannot explain — even to ourselves — how we do them. Recognizing faces, learning a language, identifying diseases, and exercising common sense are all activities we might like machines to perform, but which can’t be codified in a set of rules. Broadening AI’s capabilities to include implicit knowledge opens up a vast number of new tasks to computers.
But DeepMind’s biggest achievement lies in the fact that AlphaGo mastered the world’s thorniest game without anyone ever teaching it how to play Go. That’s because fundamentally DeepMind didn’t program a Go-playing machine; they built a learning machine that taught itself to play Go.
And a computer that excels at learning may be able to learn other things, too. Even as AlphaGo was busy practicing Go, it was learning to manage power consumption, cutting the energy used by Google’s data center cooling systems by 40%. This isn’t just a big deal for DeepMind’s parent company, Alphabet (NASDAQ: GOOGL) (NASDAQ: GOOG). With data centers consuming 3% of the world’s energy, it doesn’t take a lot of imagination to see that efficiency gains from machine learning alone could reshape worldwide patterns of manufacturing and consumption.
AlphaGo is but two years old. We have scant experience with state-of-art machine learning. Its abilities, flaws, and quirks are unknown to us. And due to the makeup of its architecture, we can’t interrogate AlphaGo about its thinking processes any more than we could crack open a human brain to see its thoughts.
But we do have 221 publicly available game records. And they reveal the alter-human thinking that’s poised to revolutionize information technology, transportation, business, and more.
I studied dozens of games spanning every stage of AlphaGo’s development in addition to DeepMind’s published scientific papers. They can help us to visually grasp AlphaGo’s personality and how other artificially intelligent machines might think and behave in the future.
If you want to understand what AI is capable of, AlphaGo is the place to start.
Go, in 10 seconds
The rules of Go are simple.
Players take turns placing a white or black stone on a board.
Image source: Getty Images.
If a group of stones is surrounded by an opponent’s stones, it is captured and removed.
Capture stones by surrounding them. Images and animations created by author using Goban with permission of Sente.
The player who surrounds the most territory, like white does in the game below, wins:
Scoring, after a game. Black has territory on the left, right, and top. White has territory on the bottom and upper-left corner. Image by author.
That’s basically it.
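For the curious, the capture rule fits in a few lines of code. The sketch below is my own illustration (not taken from any Go engine): it represents a tiny board as a Python dictionary and flood-fills a group of stones to count its liberties, the empty points touching the group. A group with zero liberties is captured:

```python
def group_and_liberties(board, start):
    """Flood-fill the group of stones at `start`; return (group, liberties)."""
    color = board[start]
    group, liberties, frontier = set(), set(), [start]
    while frontier:
        r, c = frontier.pop()
        if (r, c) in group:
            continue
        group.add((r, c))
        for nbr in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if nbr in board:
                if board[nbr] == color:      # same color: part of the group
                    frontier.append(nbr)
                elif board[nbr] == ".":      # empty point: a liberty
                    liberties.add(nbr)
    return group, liberties

# A 3 x 3 board: one white stone surrounded by four black stones.
board = {(r, c): "." for r in range(3) for c in range(3)}
board[(1, 1)] = "W"
for pt in [(0, 1), (1, 0), (1, 2), (2, 1)]:
    board[pt] = "B"

group, liberties = group_and_liberties(board, (1, 1))
captured = not liberties  # no liberties left: the white stone is captured
```

Real Go programs layer on rules for suicide and ko, but capture itself really is this simple.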
To understand why such a simple game has confounded AI research for decades, it’s helpful to contrast Go with chess.
For decades, it was said that computers would be intelligent if only they could play chess. Then, half a century after Alan Turing published the first chess algorithm, IBM’s (NYSE: IBM) Deep Blue supercomputer beat world champion Garry Kasparov.
Deep Blue’s success was impressive, but IBM’s algorithms were fundamentally similar to those of Turing’s 1950 program. What finally made Deep Blue possible was a 3-million-fold increase in computing power since Turing’s day.
One little-noticed fact: The same year that Deep Blue beat the world’s greatest chess player, state-of-the-art Go AI had only achieved the skill level of a decent beginner.
Why Go is the perfect testing ground
I once asked Kasparov if he’d ever played Go.
“Would you like to?”
“It’s too difficult.”
“It’s a completely different way of thinking.”
It’s that “different way of thinking” that has for so long eluded machines.
Deep Blue beat Kasparov with brute force: It memorized lots of games, it applied tactical and strategic rules of thumb, and it used superior processing power to read future move possibilities more deeply than Kasparov could (though just barely). But computers cannot master Go through sheer processing power. And they can’t for many of the same reasons they’ve never been able to perform other, more crucial, tasks.
First, possibilities in chess are limited by an 8 x 8 board and rules that define where each piece can move. By contrast, a full-sized Go board measures 19 x 19, and stones can be played just about anywhere.
The upshot is that a chess player is faced with an average of 35 choices for each move. Go averages 250 options. This figure is known as a problem’s “branching factor,” and it is a bane of AI. If you multiply 250 by itself enough times — to evaluate possible responses, counter-responses, and so forth — you quickly arrive at a number of positions much greater than the number of atoms in the universe, which would take all of the world’s computers well over a million years to map out.
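The arithmetic is easy to sketch. Using the average branching factors above and a hypothetical search depth, simple exponentiation shows how quickly Go outruns chess:

```python
CHESS_BRANCHING = 35   # average legal moves per chess position
GO_BRANCHING = 250     # average legal moves per Go position

def tree_size(branching_factor: int, depth: int) -> int:
    """Rough count of distinct move sequences after `depth` moves."""
    return branching_factor ** depth

# Looking just 8 moves ahead:
chess = tree_size(CHESS_BRANCHING, 8)   # roughly 2.3 trillion sequences
go = tree_size(GO_BRANCHING, 8)         # roughly 15 quintillion sequences
print(f"Go's 8-move tree is {go // chess:,} times larger than chess's")
```

And that's only 8 moves; a full game of Go runs well over 200.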
In fact, it wasn’t until last year that anyone managed to calculate exactly how many legal positions there are. (It’s about 2.081681994 x 10^170. Or, in plain English, roughly two hundred quinquinquagintillion.)
The number of possible Go board states equals approximately 2.081681994 x 10^170. Image by author. Calculations by John Tromp.
Since no one — human or supercomputer — could ever examine every possibility, players rely on feeling and intuition. To an experienced Go player, a good move feels right and looks beautiful. Go’s humanness is what makes the game impenetrable for machines.
Second, it’s tricky to quantify the value of a move. Even figuring out who’s winning can be a challenge.
Chess pieces have clear values: a pawn is worth 1 point; a knight, 3; a rook, 5; and so on. You can get a rough sense of who’s winning by comparing the values of captured pieces. Deep Blue employed dozens of such rules of thumb to locate good moves. But Go stones have no inherent, fixed value; they matter only in their relation to one another. Uncertain move values further complicate decision making.
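Here’s what that kind of rule of thumb looks like in code — a sketch of material counting using the standard chess values (the piece lists are invented for illustration):

```python
# Standard chess material values; kings aren't counted.
PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9}

def material_balance(white_pieces, black_pieces):
    """Positive means white is ahead on material; negative means black."""
    total = lambda pieces: sum(PIECE_VALUES.get(p, 0) for p in pieces)
    return total(white_pieces) - total(black_pieces)

# White has a queen where black has a rook: up 4 points of material.
print(material_balance(["Q", "R", "P"], ["R", "R", "P"]))  # → 4
```

No such lookup table exists for Go stones — which is exactly the problem.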
Third, most Go situations involve some kind of bargain. The trick is to find out what your opponent wants, and to force them to give you something you want in return. Trying to take everything ends badly. So robotic inflexibility is out.
Finally, the things you trade have abstract values that are not always quantifiable. Aside from points, a player might want influence (stones that could be useful in a later fight), sente (the freedom to choose where to play next), or aji (literally “aftertaste,” an untranslatable Japanese term for latent shenanigans: “Aji,” one teaching resource explains, “is like a stone in your shoe when you are late. The stone hurts — and as a result, you can’t run as fast. But because you are late, you cannot stop to take it out”).
Here’s a simple example. Black could seal off valuable corner points with another couple of moves:
A sequence where black takes corner territory. Image and animation by author.
Or, black could trade away the corner territory to white. In return, they get stones facing the side and center, which might become useful in the future (influence). The marked white stone is severely weakened, but it could become a complicating nuisance for black later on (aji). It’s black’s turn, and they can play wherever they like (sente).
A sequence where black trades corner potential to gain influence. Image and animation by author.
You can see the impossibility of placing precise values on influence, aji, and sente. Much like real-world conflicting values — customer service, stock performance, and risk mitigation for a business; do no harm, full recovery, and triage for a doctor; troop support, deadliness, and limited collateral damage for a drone — it’s apples and oranges all the way down. Machines have trouble grasping such trade-offs because the values at stake are abstract, nuanced, and conceptually distinct. There is no common currency for them in zeros and ones.
Despite these challenges, a steady evolution in AI technology has allowed AlphaGo to master the game. These changes embody how AI evolved to its current state and presage its future. To understand where AI is headed, we need to see how we got to where we are today.
Stage 1: Do as I say (1950s to 1960s)
The official birth of AI occurred at a 1956 conference at Dartmouth College, convened on the grandiose premise that “every aspect of … intelligence can in principle be so precisely described that a machine can be made to simulate it.”
AI’s founders took their cues from philosopher-mathematicians like Gottlob Frege and Bertrand Russell, who had recently developed a special notation for logic. According to the leaders of this approach, the key to intelligence lay in applying the rules of logic. They made rapid progress building programs that could derive important mathematical proofs and confidently predicted that in just 10 years, a computer would master chess.
But then progress hit a wall. There was an obstacle no one foresaw.
Stage 2: Rules and rules of thumb (Late 1960s to early 2000s)
As researchers began to ask computers to solve complex, real-world problems like diagnosing diseases or translating Russian, it turned out that many of these problems could be solved only in theory, not in practice. The time and memory requirements for solving a problem tend to grow exponentially with how deep you have to look for an answer. The halo evaporated. Researchers abandoned the phrase “artificial intelligence” to escape painful snickering from academic and research communities. There were funding cuts galore.
In a way, what saved AI from irrelevance was a division of labor. Instead of trying to program machines that could do anything purely with logic, researchers lowered their expectations and began to tailor individual programs to specific problems. Limiting the kinds of problems a program needs to solve helps to limit the number of possible solutions it must search.
Increasingly, programmers began to model AI after the way humans think. That often meant using heuristics, or mental shortcuts.
We use heuristics all the time: The pinker the chicken, the longer you should keep it in the oven. If the tomato is firm, it’s fresh enough to eat. Need more flavor? Add more seasoning.
These little bits of knowledge are crucial. We couldn’t live without them — there just isn’t enough time in the day to do everything perfectly.
The first attempts to build Go-playing computers worked the same way. One section of code estimated game score, another influence. There were routines to recognize sente, identify how to protect important stones from being captured, access a library of common sequences, and so on for all the specialized skills advanced players use.
Armed with these abilities, the computer would consider several moves. For each move, it would consider a number of possible responses, counter-responses, and so forth, until it produced a model of possible outcomes that resembled a tree. The goal was to search the game tree for the “least bad” outcome by following a path that leaves your opponent no good choices. This is how Deep Blue worked, too.
Representation of tree data structure. Image by author.
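A toy version of that search looks like this. The tree and leaf scores below are invented; the point is how the program backs values up the tree, assuming the opponent always picks the reply that’s worst for you:

```python
# A hand-made game tree: each node lists its children; leaves have
# scores from our point of view (higher is better for us).
tree = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1", "b2"]}
scores = {"a1": 3, "a2": -2, "b1": 5, "b2": -8}

def minimax(node, our_turn):
    if node in scores:                       # leaf: use its evaluation
        return scores[node]
    values = [minimax(child, not our_turn) for child in tree[node]]
    return max(values) if our_turn else min(values)

# Branch "b" holds the biggest prize (5) but also the worst trap (-8).
# The search prefers branch "a": its worst case (-2) is the least bad.
print(minimax("root", our_turn=True))  # → -2
```

This minimax backup is the skeleton shared by Deep Blue and the early Go programs; what differed was the quality of the leaf evaluations.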
Of course, just as the food a chef produces depends on the quality of their recipes, a heuristic-based AI is only as good as the heuristics humans can cook up. Moreover, the approach just isn’t up to the task when the number of possibilities is truly vast.
And so, after decades of slow progress, heuristic-based AI achieved only the strength of an intermediate-level Go amateur. These programs were rigid and predictable opponents. Memorization and rule following don’t lend themselves to intuition, flexibility, or creativity. Further advances would require a revolution.
Stage 3: Statistical (and weird!) play from rational aliens (mid-2000s to present)
2006 saw a breakthrough with the success of a technique called Monte Carlo Tree Search (MCTS).
The approach is an old one, and today it is used to bolster logistics and production management. MCTS has been applied to vehicle routing, airline scheduling, packaging, robot motion, and finance. It also powers computer opponents in strategy games like Total War: Rome II, as well as in Scrabble, poker, and chess.
MCTS’ name may sound formidable, but the idea is simple. It replaces human-like heuristics with a simple statistical technique known as Monte Carlo simulation. When your financial advisor tells you the odds that your portfolio will last you through retirement, they’re using Monte Carlo simulation.
Just as before, you begin with a tree search. But instead of relying on hard-coded Go heuristics to estimate the best outcome, the computer just simulates a bunch of random games to see who’s more likely to win.
Generating random numbers is something computers can do really quickly, and it’s surprisingly effective. MCTS cuts out middleman concepts like influence and aji, offering the machine a direct route to what you ultimately care about: winning.
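To see the idea concretely, here’s Monte Carlo move evaluation on a toy game of my choosing (Nim: take one to three stones from a pile; whoever takes the last stone wins). There are no heuristics anywhere — each candidate move is judged purely by the win rate of random playouts, the same principle MCTS applies to Go:

```python
import random

def random_playout(pile: int, my_turn: bool) -> bool:
    """Play random moves to the end of the game; True if we win."""
    while pile > 0:
        pile -= random.randint(1, min(3, pile))
        if pile == 0:
            return my_turn          # whoever took the last stone wins
        my_turn = not my_turn
    return not my_turn              # pile already empty: last mover won

def best_move(pile: int, playouts: int = 5000) -> int:
    """Pick the move whose random playouts win most often."""
    win_rates = {}
    for take in range(1, min(3, pile) + 1):
        wins = sum(random_playout(pile - take, my_turn=False)
                   for _ in range(playouts))
        win_rates[take] = wins / playouts
    return max(win_rates, key=win_rates.get)

# From a pile of 5, taking 1 leaves the opponent a losing pile of 4.
print(best_move(5))
```

Notice that the program discovers the right move without ever being told why it’s right.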
This is the philosophy that dominates AI today: Algorithms are no longer purely logical. Nor do they imitate the way humans actually think. They just act rationally in pursuit of a goal. (Articulated by computer scientists Stuart Russell and Peter Norvig, the think vs. act, humanly vs. rationally distinctions form a helpful framework for the different approaches to AI.)
But MCTS is fundamentally at odds with human approaches to problem solving. (How often do you sit down at the local Olive Garden and visualize 10,000 randomized menu simulations to choose between chicken and never-ending pasta?)
And so MCTS-based AI exhibits odd quirks. To truly appreciate how many autonomous machines of the future will make decisions, you must understand the deep strangeness of MCTS. Playing a game with an MCTS opponent can feel like sitting face-to-face with an intelligent alien who knows the rules of the game, but who has never seen an actual game played.
First, during the early stages of a Go game, players generally stick to the first four lines of the board. (It’s easier to secure territory in the corners and sides than in the center.) But an AI program using MCTS will often plop a stone somewhere in the middle.
Here’s an example, taken from a real game I played against a fairly strong AI software named Fuego:
An unusual move by MCTS-based AI Fuego. Image by author.
Granted, Fuego’s move is coherent. The stone expands white’s potential on the bottom, limits black’s potential on the right, and could become a lifeline to white’s two stones on top should they come under attack later on:
The unusual move does three things. Image by author.
But it’s bizarre. This “do whatever gets you to the goal” mindset in MCTS-based AI works, but it could lead to trouble down the road because we want our self-driving cars, automated paralegals, and robotic nannies to be not just competent, but predictable and relatable to humans.
Second, an AI program using MCTS can flounder as an outcome comes into view. Winning (and therefore risk-averse) MCTS software often plays seemingly irrational and slightly harmful moves. Losing MCTS programs are more exciting — they’re prone to spectacular self-destruction.
The cause is simple: Humans think we win games by boosting our lead (if we’re winning) or reducing our deficit (if we’re losing). But MCTS software tries to improve its probability of victory. It does not distinguish between a 5-point loss and a 50-point loss. And so, when a situation is hopeless, MCTS can no longer distinguish good options from stupid ones. On the road to defeat, every path looks equally grim.
As the AI software’s odds of success diminish, you begin to see what looks like waves of panic culminating in meltdown. It’s a strange feeling, like watching a stock market flash crash, but for rationality.
If the old tree-search model was too rigid and robotic, the MCTS approach embodies a quirky numbers-crunching savant ungrounded in experience. Sometimes hyper-rationality without common sense is indistinguishable from insanity.
Stage 4: Pattern recognition (2010s to present)
The final big break came when researchers found a way to root MCTS in pattern recognition. The key, surprising at the time, turned out to be an old machine-learning technique inspired by the human brain.
Artificial neural networks are based on ideas that have been around since the 1950s. But they were long considered a backwater of machine learning. To train a neural network requires a lot of data and a lot of computing power — things that weren’t available until very recently. A confluence of important tech trends — the internet, big data, distributed computing, and cloud storage — has now changed that.
Neural networks form the backbone of Facebook’s facial recognition and News Feed curation technologies, Google Translate, self-driving car vision, and countless other applications. They’re particularly good at processing images and sound.
Neural networks don’t actually model brains — that’s a common misconception — but the analogy is helpful for understanding their functioning.
Like the brain’s web of neurons linked by axons, a neural network models a web of connected data nodes known as artificial neurons. Deep neural networks contain many layers of such nodes. When you hear people use the phrase “deep learning,” this is what they mean.
Brain neurons communicate by sending electrical charges to other neurons via pathways of varying connection strength. Artificial neural nodes contain numbers called weights that represent how much influence they exert on each node in the next layer.
3D rendering of brain axons alongside a neural network for identifying types of flowers. Image sources: Getty Images (left) and author (right), using Weka.
As raw data flows through a neural network, each layer of nodes acts like a filter, transforming the information into increasingly high-level features.
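In code, a single layer is just weighted sums pushed through a squashing function. The weights and inputs below are invented, not from any trained network:

```python
import math

def sigmoid(x: float) -> float:
    """Squash any number into the range (0, 1)."""
    return 1 / (1 + math.exp(-x))

def layer(inputs, weights, biases):
    """One layer of nodes: weights[j][i] is input i's influence on node j."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + bias)
        for row, bias in zip(weights, biases)
    ]

# Two inputs flow into two nodes; stacking such layers makes a net "deep."
outputs = layer([0.5, -1.0], [[0.8, 0.2], [-0.4, 0.9]], [0.1, 0.0])
```

Feed one layer’s outputs into the next, and you have a deep network.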
To train a neural network, you give it an example, see if it outputs the right thing, correct the error if it doesn’t by changing the weights according to special mathematical functions, and repeat with new examples. Eventually, after practicing on millions of examples, it’ll get better at doing what it’s supposed to do. (Just like a pigeon.)
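That show-check-correct loop can be sketched with the simplest possible “network”: a single artificial neuron learning the logical AND of two inputs. The examples here are a stand-in for real training data, and the update rule is the classic perceptron rule rather than anything DeepMind used:

```python
# Training examples: two inputs and the desired output (logical AND).
examples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias, rate = [0.0, 0.0], 0.0, 0.1

for _ in range(50):                          # repeat over the examples
    for (x1, x2), target in examples:
        output = 1 if weights[0] * x1 + weights[1] * x2 + bias > 0 else 0
        error = target - output              # did it answer correctly?
        weights[0] += rate * error * x1      # if not, nudge each weight
        weights[1] += rate * error * x2      # in the direction that helps
        bias += rate * error

def predict(x1, x2):
    return 1 if weights[0] * x1 + weights[1] * x2 + bias > 0 else 0
```

Real deep networks adjust millions of weights using calculus (backpropagation), but the rhythm — answer, get corrected, adjust — is the same.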
Neural networks are slow learners. They need lots of examples, and they take a long time to train, because thousands of weights must be adjusted for each of the myriad examples the network receives. But there’s an upside: Neural networks are incredibly sensitive and can capture a lot of subtle information.
If domain-specific tree search is the rule-following robot, and MCTS is the rational alien, then neural networks are like a child.
How AlphaGo works
DeepMind trained AlphaGo the same way you might teach a toddler to recognize pictures of cats. You could show it an animal picture book, and point out all the cats. Next, you might visit a pet store together and let the kid try to pick out cats, letting them know which they get right and which wrong. Finally, you release your fledgling into the world, knowing that life will provide whatever feedback they need to correct any grievous cat-identification mistakes.
DeepMind began by feeding AlphaGo 30 million images of Go moves from strong players that it had mined from a popular online Go server. Once AlphaGo had learned to identify what a good move looks like, it practiced on images it had never seen before to get better. Finally, AlphaGo played millions of practice games against itself, getting feedback in the form of whether it won or lost.
AlphaGo also learned to estimate the odds a particular position would lead to victory by studying millions of game positions. It then reinforced that knowledge by playing millions of games against itself.
So instead of learning to recognize cats, AlphaGo can peg which moves look promising. You can visualize how AlphaGo’s neural network sees the board with a heat map. “Hotter” areas are those that the neural network thinks look the most promising.
This comes from a game I played against Leela, a strong neural network-based Go AI:
A heat map of move probabilities generated by Leela’s neural network. Hotter areas are those that the neural network predicts have a greater probability of a good player choosing. Image by author.
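Under the hood, such a map is raw per-move scores normalized into probabilities — the standard softmax function. The scores and board coordinates below are invented for illustration:

```python
import math

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical network scores for three board points.
move_scores = {"D4": 2.1, "Q16": 1.9, "K10": -0.5}
heat_map = dict(zip(move_scores, softmax(move_scores.values())))
# The hottest point, D4, gets the largest share of the probability.
```

The exact architecture differs, but the output is the same kind of object: a probability for every legal move.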
Having this map allows AlphaGo to concentrate on the most promising tree branches, and it makes the Monte Carlo predictions more accurate. The result is vastly more powerful strategic intelligence than prior approaches:
Chart by author. Tree search ranks are estimates based on outcomes of infrequent human-computer handicapped challenges. MCTS ranks are based on KGS records. AlphaGo ranks based on official matches. Data from Sensei’s Library and Computer Go.
AlphaGo held its first match in secret at DeepMind’s London headquarters, playing against then-reigning European champion Fan Hui. No computer had ever once beaten a professional. AlphaGo won the match 5-0.
A few months later, when DeepMind revealed its achievement of one of AI’s greatest milestones, it also announced that in just a little over one month, AlphaGo would face off against the most storied player of our generation, an elite Go master named Lee Sedol. It was a reprise of the “man vs. machine” Kasparov-Deep Blue matches.
Maybe less than meets the eye
But AlphaGo’s game records cast doubt on hopes that the AI software could win its next challenge. They revealed no creative superintelligent genius. AlphaGo, it seemed, had merely learned to mimic textbook Go extremely well.
Most conspicuous was its orthodox, cautious, and influence-oriented playing style — popular decades ago in Japan and in outmoded American textbooks, but at odds with state-of-the-art play.
AlphaGo’s cautious style resulted from a bias in its training data. Website traffic analysis confirms that the English-language Go server from which DeepMind created AlphaGo’s study lessons is disproportionately popular with players from America and Japan — the very places where amateurs still play old-fashioned Go. This serves as a reminder that subtle biases in training data can utterly change a neural network’s personality — an issue that will only grow in significance as AI comes to rely on big data.
One moment in particular — identified by Myungwan Kim, a top Korean professional — epitomizes early AlphaGo’s uninventiveness.
Fan, playing as black, invaded AlphaGo’s territory at the bottom of the board. According to an extremely common sequence that AlphaGo must have studied countless times, white next plays A. That’s how it always goes.
“If you studied a hundred thousand games, all the games white would play A.”
Which of course AlphaGo did.
Image and animation by author. Game record from GoKifu.
The sequence is supposed to be a fair exchange — black takes white’s territory, and white gets influence toward the center.
But this time it was a mistake. You can see how black’s two stones (marked by triangles) negate white’s expected influence and threaten white’s marked stones. In this particular game, white has little to show for giving up the lower side of the board.
AlphaGo could imitate humans, but it couldn’t originate new ideas.
Unless the AI could learn to think for itself, it wouldn’t stand a chance against its next opponent, the legendary Lee.
We’re going to need a montage
In the five months following the Fan match, DeepMind programmers worked around the clock to revamp AlphaGo. DeepMind’s CEO, himself a former child chess prodigy, assembled integrated teams of researchers, engineers, and valuation experts to synthesize their assorted skills. DeepMind also hired the philosophically minded Fan, AlphaGo’s first opponent, to identify and patch up AlphaGo’s weaknesses. And AlphaGo played millions of additional practice games against itself.
There wasn’t time to fix everything, and as their deadline approached, the team was nervous. The lead researcher on AlphaGo, David Silver, reflected:
We had our evaluation match last week. We won a game, and we lost a game. And we lost a game in a way that would have made us look extremely foolish. … We have a lot of work to do. … There’s just too much risk that we could lose.
Soon enough, it was time for AlphaGo to face Lee.
One of the most creative players of the modern era, Lee is the perfect opponent for a machine. His style is intuitive, aggressive, and feared. Lee’s games play out like a Beethoven symphony — scattered fragments collide chaotically before merging into a sudden, violent climax. It’s organized mayhem.
Lee entered a Go academy at the age of 8 and began training 12 hours a day. He graduated to professional level by the time he was just 12 years old. He’s been the most dominant player of the past decade.
Across the globe, 280 million people tuned in to see how the untested computer would fare against the 33-year-old, 18-time world champion.
Within minutes, it became clear that the new AlphaGo was a different player.
Unlike its predecessor, the new AlphaGo wouldn’t back down from a fight, answering Lee’s complex challenges with surprising and clever responses. And it would wait for the ideal moment to launch startling attacks.
Time and again, I saw AlphaGo play moves that were beautiful, unexpected, and terrifying. Such moments carried an aesthetic and strategic perfection whose implications made my stomach churn.
This was nothing like the heuristic robot of the late 20th century, the rational alien of the 2000s, or the textbook excellence of its predecessor. AlphaGo had become an artist.
AlphaGo is exceptionally flexible, too.
A remarkable example comes from game 5 against Lee, when AlphaGo (white) threw away an important group of stones in the bottom-right corner in order to build a large central territory — territory that it eventually also discarded in favor of developing a huge bottom-left corner.
AlphaGo flexibly throws away stones in the bottom-right corner while gaining central territory, which it too discards. Taken from AlphaGo vs. Lee Sedol, game 5. Image and animation by author. Game record from American Go Association.
AlphaGo may be flexible in its choice of means, but it’s absolutely literal about its goal: to win.
Humans tend to equate a bigger lead with a greater likelihood of winning. (This is an example of a heuristic.) But AlphaGo is different. It doesn’t care how far ahead it is, just how likely it is to win. If scoring points helps, fine. But if giving up a large lead can statistically improve the odds of winning from 75% to 76%, it’ll pick that route.
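The difference between the two objectives fits in a few lines. The numbers are invented, echoing the 75%-to-76% example above:

```python
# Two hypothetical candidate moves, each with an estimated win
# probability and an expected final margin in points.
candidates = {
    "protect_the_lead": {"win_prob": 0.75, "margin": +20},
    "give_points_back": {"win_prob": 0.76, "margin": +2},
}

# A human heuristic: maximize the expected margin of victory.
human_pick = max(candidates, key=lambda m: candidates[m]["margin"])
# AlphaGo's objective: maximize the probability of winning, full stop.
alphago_pick = max(candidates, key=lambda m: candidates[m]["win_prob"])

print(human_pick, alphago_pick)  # → protect_the_lead give_points_back
```

Same board, same estimates, opposite choices — which is why AlphaGo’s moves can look so alien even when they’re correct.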
Planning: AI becomes curious
AlphaGo can plan for the future. And when it doesn’t know what to do, it’ll test the waters to find out — just like a human would.
In the example below, AlphaGo (white) is unsure where to play next because it doesn’t know whether its opponent wants corner territory or the outside. But AlphaGo can force its opponent to reveal its plans so that AlphaGo can respond in the optimal way.
White forces black to reveal which future it has in mind. Taken from AlphaGo vs. AlphaGo, game 1 of 50. Image and animation by author. Game record from DeepMind.
This next example also comes from a self-play game by a later version of AlphaGo. The density of probing and forcing moves is staggering.
White uses probes and forcing moves to winnow possible futures. Taken from AlphaGo vs. AlphaGo, game 1 of 50. Image and animation by author. Game record from DeepMind.
Incredibly, no one explicitly taught AlphaGo about experimentation or planning for the future. Inquisitive behavior is something AlphaGo learned all on its own.
I’ve just seen a face
How does AlphaGo do all this?
AlphaGo’s farsightedness, flexibility, and originality stem from its capacity for rich, detailed experience and its total freedom from executive restraint.
The world, to a neural network, is a blooming, buzzing confusion. Where we see a rabbit, a neural network sees possibilities: “80% chance of rabbit, 15% duck, and 5% hand towel.”
Our experience of optical illusions mimics what it would be like to have this kind of vision. The famous duck-rabbit illusion contains aspects of both a duck and a rabbit, and so can appear to us as either one:
Image source: Artist unknown, Kaninchen and Ente. Published 1892. Courtesy of Heidelberg University.
To peer through the eyes of a neural network would be to see a zoo in every object. A 2015 collaboration between Google and MIT managed to tease out some of the higher-level features neural networks can see. The fever-dreamlike results reveal an imagination rampant with aspect recognition.
Deep neural networks notice myriad aspects. Image source: Google Research. Used with permission.
Like a pareidoliac noticing fish eyeballs in clouds, AlphaGo sees bizarre features and makes wild associations that would never occur to us.
Despite its astonishing abilities, AlphaGo isn’t perfect. And in game 4, Lee cracked its code. His approach evoked Muhammad Ali’s “rope-a-dope”: Protect yourself, absorb lots of punches, and wait for a critical opening to strike.
For much of the game, Lee allowed AlphaGo to bully him around, ceding small advantages to his digital opponent to ensure that his own territory was safe. Then, Lee gambled the entire game on a single, risky attack.
The strategy worked because it forced AlphaGo into a bewilderingly complex and unique situation where its pattern-recognition software couldn’t match human intuition.
Here is Lee’s move that broke the world’s most advanced AI. You can see why the shape is so unusual — the second you wedge a stone between four of your enemy’s, it’s isolated and trapped.
Move 78. AlphaGo vs. Lee Sedol, game 4. Image by author. Game record from American Go Association.
While Lee said he saw the move intuitively and quickly, AlphaGo’s pattern recognition estimated the probability Lee would play it at less than 1 in 10,000.
Now the machine became confused and unhinged. It crudely tried to rescue its formation on the right side, losing even more territory, before inexplicably tossing a stone to white’s bottom-left fortress. Altogether, AlphaGo’s meltdown lasted 12 tragicomic moves. Seeing them illustrated feels like watching a dozen own-goals in soccer:
Animation by author. Irrational AlphaGo moves marked in red. Game record from American Go Association.
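The 1-in-10,000 figure matters because programs like AlphaGo budget their search effort according to the policy network's priors. A toy simulation (the move names and probabilities here are invented, not AlphaGo's actual numbers) shows how little attention a sub-0.01% move receives:

```python
import random

random.seed(42)

# Hypothetical policy-network priors over candidate moves
moves = ["usual_a", "usual_b", "usual_c", "wedge_78"]
priors = [0.60, 0.30, 0.0999, 0.0001]

# Allocate 10,000 simulated search visits in proportion to the priors
visits = {m: 0 for m in moves}
for _ in range(10_000):
    visits[random.choices(moves, weights=priors)[0]] += 1

print(visits)  # "wedge_78" receives essentially no visits
```

A move the network considers nearly impossible is barely explored at all, so when the opponent actually plays it, the program has done almost no thinking about what comes next.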
Coverage of deep learning’s rapid advances has fostered in the public imagination a sense that the technology is some invincible force. But neural networks face real limitations. AlphaGo’s meltdown in game 4 reveals three such shortcomings.
First, neural networks are less efficient learners than humans. They depend on large amounts of experience, and so can fail in unusual circumstances.
Second, their missteps can be senseless and inexplicable. A Go beginner could identify the 12 moves as irrational, and the bottom-left stone as haywire.
Here’s another example: A team of Google researchers trained neural networks to write image captions with 95% accuracy. But look at how unusual some of its mistakes are. I submit to you that children are not bubblegum, and that “No Parking” signs don’t resemble refrigerators:
Image source: Vinyals et al., “Show and Tell: A Neural Image Caption Generator” (2016). Used with permission of the authors. Emphasis added.
The third problem follows from deep learning’s behaviorist approach to AI. Although DeepMind fixed the cause of AlphaGo’s meltdown in later versions, no one will ever understand how AlphaGo made its original mistake, due to neural networks’ black-box quality. We may learn to love neural networks’ predictive power, but absent explanatory abilities, we’ll be reluctant to entrust them with full autonomy.
A 2017 state-of-AI report prepared for the Department of Defense highlights some of these very issues:
The current cycle of progress in [big data and deep learning] has not systematically addressed the engineering “ilities”: reliability, maintainability, debug-ability, evolvability, fragility, attackability, and so forth.
The report continued:
Further, it is not clear that the existing AI paradigm is immediately amenable to any sort of software engineering validation and verification. This is a serious issue.
Finally, given AI’s past fits and starts, many researchers remain a bit skeptical that inflated expectations won’t give way to a new, unforeseen barrier to progress.
On to the next one
For now, though, the money is on deep learning continuing to make swift progress. Since defeating Lee Sedol 4-1, DeepMind has released three new versions of AlphaGo.
AlphaGo Master went 60-0 against top Go players in a series of unofficial online games with short time limits. This April, it defeated current world champion Ke Jie 3-0 in an official match.
AlphaGo Zero, unveiled in October, doesn’t even need to learn from humans. Armed with just the rules of the game and three days of practice, it beat the AlphaGo that competed against Lee 100-0 using 1/12 of the computing power, according to a paper in Nature. After 40 days of training, it beat AlphaGo Master 89-11. Curiously, the moves AlphaGo Zero developed without ever seeing a human game look even more human than those of the sometimes abstruse Master.
Then, in early December, DeepMind set a version named AlphaZero upon Stockfish, the top chess AI, which chess professionals use for their own training. Within four hours of learning the rules of chess, AlphaZero surpassed Stockfish, and after three days, it destroyed the formerly pre-eminent chess AI in an informal match without losing a single game.
Go may be just a game, but it expresses many of the same intellectual challenges posed by real life. DeepMind is already turning its machine-learning discoveries into AI software that recommends medical treatments. Many others are applying AlphaGo-like techniques to diagnostics, autonomous vehicles, and chatbots, too.
AlphaGo’s capabilities and personality foreshadow the future. It’s shown us AI’s capacity for flexibility, long-term planning, and even originality, as well as relentlessness, bias, and opacity. The economic and social effects of unleashing intelligences with these traits will transform our world. In fact, they already have.
Suzanne Frey, an executive at Alphabet, is a member of The Motley Fool’s board of directors. Ilan Moscovitz owns shares of Alphabet (A shares) and Alphabet (C shares). The Motley Fool owns shares of and recommends Alphabet (A shares), Alphabet (C shares), and Facebook. The Motley Fool has a disclosure policy.