Google’s DeepMind aces protein folding

Complex of bacteria-infecting viral proteins modeled in CASP 13. The complex contains four separate subunits that were modeled individually.

Protein Data Bank

Turns out mastering chess and Go was just for starters. On 2 December, the Google-owned artificial intelligence firm DeepMind took top honors in the 13th Critical Assessment of Structure Prediction (CASP), a biennial competition aimed at predicting the 3D structure of proteins.

The contest worked like this: Competing teams were given the linear sequence of amino acids for 90 proteins for which the 3D shape is known but not yet published. Teams then computed how those sequences would fold. Though London-based DeepMind had not previously joined this competition, the predictions of its AlphaFold software were, on average, more accurate than those of its 97 competitors.

How close was the race? By one metric, not very. For protein sequences for which no other information was known (43 of the 90), AlphaFold made the most accurate prediction 25 times. That far outpaced the second-place finisher, which won three of the 43 tests.

So AlphaFold lapped the competition? Well, not exactly. When you track how much AlphaFold won or lost by in each case, the results look much closer. That’s shown in the graph below, which plots AlphaFold’s performance on the vertical axis and that of the best other group on the horizontal axis. Points above the red line show predictions where AlphaFold won. Points below, it lost. And those on the red line were essentially a tie. The upshot? AlphaFold won a lot of rounds, improving accuracy by an average margin of 15% over other groups on the toughest 43 tests, says John Moult, CASP’s lead organizer and a computational biologist at the University of Maryland in Rockville.

Ready, set, fold!

Points above the red line show protein-folding predictions where AlphaFold won. It lost those below the line. Those on the line were essentially a tie.

[Scatter plot. Vertical axis: DeepMind’s AlphaFold, 0 to 100. Horizontal axis: other top competitors, 0 to 100.]

Andriy Kryshtafovych/University of California, Davis
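For readers who want to see why "most wins" and "margin of victory" can tell different stories, here is a minimal Python sketch of the comparison behind a plot like this one. The scores are made up for illustration and are not CASP data; the function name `tally` and the one-point tie tolerance are assumptions, not anything CASP defines.

```python
from statistics import mean


def tally(alphafold, best_other, tie_tol=1.0):
    """Count wins, ties and losses, and average the per-target margin.

    A target counts as a tie when the two scores differ by no more than
    tie_tol points (an assumed threshold, chosen only for this sketch).
    """
    if len(alphafold) != len(best_other):
        raise ValueError("both score lists must cover the same targets")
    # Positive margin: the point sits above the diagonal (AlphaFold ahead).
    margins = [a - b for a, b in zip(alphafold, best_other)]
    wins = sum(m > tie_tol for m in margins)
    losses = sum(m < -tie_tol for m in margins)
    ties = len(margins) - wins - losses
    return {
        "wins": wins,
        "ties": ties,
        "losses": losses,
        "mean_margin": mean(margins),
    }


# Hypothetical accuracy scores on a 0-to-100 scale, one entry per target.
alphafold = [72.0, 55.0, 40.0, 63.5, 48.0]
best_other = [60.0, 55.5, 45.0, 50.0, 30.0]

result = tally(alphafold, best_other)
print(result)
```

With these invented numbers, one group wins three of five targets, yet the average margin is only a few points, which is the kind of gap between the two metrics the graph is meant to expose.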

So, what was going on? David Baker, a CASP organizer, participant, and computational modeling expert at the University of Washington in Seattle, notes that DeepMind’s scientists built on two algorithm strategies pioneered by others. First, by comparing vast troves of genomic data on other proteins, AlphaFold was able to better decipher which pairs of amino acids were most likely to wind up close to one another in folded proteins. Second, related comparisons also helped them gauge the most probable distance between neighboring pairs of amino acids and the angles at which they bond to their neighbors. Both approaches improve as they evaluate more data, which makes them well suited to machine learning algorithms, such as AlphaFold, that solve problems by crunching large data sets. DeepMind scientists “are extremely good at machine learning and have a superb team” with deeper pockets than most academic groups, Baker says.
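The first strategy rests on a simple idea: if two positions in a protein tend to mutate together across related sequences, they may sit close together in the folded structure. The Python sketch below illustrates that idea with mutual information, a textbook covariation score, on a toy alignment. It is not AlphaFold's method, which the article does not detail; the four sequences and the function names are invented for illustration.

```python
from collections import Counter
from itertools import combinations
from math import log2


def mutual_information(col_a, col_b):
    """Mutual information, in bits, between two alignment columns."""
    n = len(col_a)
    freq_a = Counter(col_a)
    freq_b = Counter(col_b)
    freq_ab = Counter(zip(col_a, col_b))
    return sum(
        (count / n)
        * log2((count / n) / ((freq_a[x] / n) * (freq_b[y] / n)))
        for (x, y), count in freq_ab.items()
    )


def rank_pairs(alignment):
    """Rank all column pairs of an alignment by covariation, highest first."""
    columns = list(zip(*alignment))
    scores = {
        (i, j): mutual_information(columns[i], columns[j])
        for i, j in combinations(range(len(columns)), 2)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy alignment of four made-up related sequences (one string per sequence).
# Positions 0 and 1 always change together: A pairs with K, D pairs with R.
toy_alignment = ["AKLE", "AKLE", "DRLE", "DRVE"]

ranked = rank_pairs(toy_alignment)
top_pair, top_score = ranked[0]
print(top_pair, top_score)
```

In this toy case the top-ranked pair is positions 0 and 1, the only two that change in lockstep. Real alignments contain thousands of sequences, which is why such scores sharpen as more genomic data become available.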

Still, not bad for a newbie. “Give them credit,” adds Moult, another CASP organizer. “They came from nowhere.”