But that’s not easy. Imagine that stretchy length of elastic covered in magnets again. And now imagine, looking at the sequence of magnets, trying to work out in advance what shape the elastic would form when it bunched up. Working out that shape from the sequence is known as the “protein folding” problem, and it has proved to be extremely difficult.
So in 1994, the CASP (Critical Assessment of protein Structure Prediction) challenge was set up. Every two years, teams would compete to make the best prediction of various proteins’ shapes from their sequences alone. The shapes of the proteins being assessed would not yet be known, but would be in the process of being worked out experimentally – which meant that the teams couldn’t cheat, but that their work could be assessed against experimental results.
The CASP programs were assessed on a scale of 0 to 100. If a program scored 100, it would mean that it predicted where every single atom was, correct to within one angstrom – that is, about one atom’s width. If it scored 0, it meant that it was completely wrong. The CASP assessors said that the target was scoring 90 or above, on average, across all the proteins being assessed; 90 is arbitrary, but crystallography experiments can’t do that much better.
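To make the 0-to-100 idea concrete, here is a minimal sketch of a score in that spirit: the percentage of atoms whose predicted position falls within a cutoff distance of the experimentally measured one. It is an illustration only. CASP's actual scoring is more involved than this, the function name `percent_within_cutoff` and the toy coordinates are hypothetical, and the sketch assumes the two structures have already been lined up on top of each other.

```python
import math


def percent_within_cutoff(predicted, experimental, cutoff_angstrom=1.0):
    """Percentage of atoms predicted to within `cutoff_angstrom` of the truth.

    Simplified illustration, NOT the official CASP metric. Both inputs are
    lists of (x, y, z) positions in angstroms, in the same atom order, and
    are assumed to be already superimposed.
    """
    if len(predicted) != len(experimental):
        raise ValueError("structures must contain the same number of atoms")
    if not predicted:
        raise ValueError("structures must contain at least one atom")

    close_enough = sum(
        1
        for p, e in zip(predicted, experimental)
        if math.dist(p, e) <= cutoff_angstrom
    )
    return 100.0 * close_enough / len(predicted)


# Toy four-atom example (made-up coordinates, not real protein data).
predicted = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0), (4.5, 0.0, 0.0)]
experimental = [(0.2, 0.0, 0.0), (1.5, 0.5, 0.0), (3.0, 0.0, 2.0), (4.5, 0.0, 0.0)]

print(percent_within_cutoff(predicted, experimental))  # 75.0
```

In the toy example, three of the four atoms land within one angstrom of their measured positions, so the score is 75. A perfect prediction on this simplified scale would score 100.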
From 2006 to 2016, the best program at each competition managed somewhere around 30 or 40 on that scale. Then in 2018, AlphaFold scored almost 60. And, now, it has achieved 92, including over 87 on the very hardest proteins.
This might sound a bit blah. They scored X, now they scored a bit more than X. But it is a huge advance. “There are still a lot of questions to ask,” says Ewan Birney, deputy director-general of the European Molecular Biology Laboratory (EMBL). The scientific community will want to “kick it around a bit” – although, he says, the CASP assessment is incredibly rigorous, so it’s almost certainly accurate. And there are further layers to the problem – proteins that don’t form globular shapes; proteins that change shape. “But that shouldn’t diminish what they’ve done. This is a 50-year-old problem, and the AlphaFold team has made a real massive change, a phase change.”
The implications for biological science are obvious. If you can predict the shape of a protein in a few hours rather than a few months – and the AlphaFold program runs on relatively modest resources, by supercomputer standards, in “a couple of days” – you can uncover potential targets for drug discovery much more quickly. Samant points out that you would still need to check whether your predictions are accurate, but it’s much, much easier to use crystallography to see whether a protein is the shape you think it is than to work out the shape from scratch.
That’s the near term. In the longer term, you can understand how the body works in far greater detail. “It’s not that AlphaFold suddenly understands how a human works,” says Birney, “but the tide has gone up by a massive level.” Professor Dame Janet Thornton, a pioneer of protein research who has been working on the folding problem for nearly 50 years, told a briefing held by the Science Media Centre ahead of the announcements that possible future applications could be designing enzymes that consume plastic waste, or that suck carbon out of the atmosphere, or improve crop yields.
Similarly, it could help us understand diseases like Alzheimer’s and Parkinson’s, which seem to have something to do with protein misfolding. Protein misfolding also appears to play a role in the development of some cancers. And DeepMind hopes it will have a role in future pandemic responses: AlphaFold was able to predict the shape of a protein, ORF3a, in SARS-CoV-2, as well as other coronavirus proteins. Understanding the shape should help make the discovery of future drugs and vaccines quicker. It is a sudden window into areas of basic science which were simply not visible before.
What fascinates me, though, is the AI angle. Birney made an interesting point when we spoke: that when DeepMind started out, they set out to make a single program that could play lots of Atari games. “People said, ‘you’re just playing silly games.’” Then they made a program that could become superhuman at Go, a fantastically deep and complex game, but nonetheless a game. Then they used essentially the same program to become enormously superhuman at chess. Now, a similar architecture – as far as I can tell, at least – can solve real scientific questions.
Hassabis, in the SMC briefing, compared the AlphaGo and AlphaFold breakthroughs by saying that both relied on something like human insight. With chess, there are something like 35 possible moves per turn, so to look ahead two moves you need to examine 35 × 35 moves (1,225); to look ahead five moves, it’s about 52 million. With a powerful computer, you can do this kind of “brute-force” computing for a few moves, although chess programs still need to be intelligent as well as powerful.
But with Go, there are something like 200 possible moves per turn, and brute force is much less useful. Human Go players rely much more on intuition than human chess players do – this board position feels strong, in some wordless and ill-defined way; this board position feels weak. AlphaGo developed some sort of equivalent to this insight; it worked out what board positions and moves felt strong, with some kind of high-level pattern recognition, from playing hundreds of millions of games against itself.
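The arithmetic behind that comparison is easy to check. The sketch below assumes, as a simplification, that the number of available moves stays constant at every turn, and uses the rough figures quoted above (about 35 for chess, about 200 for Go). The function name `positions_to_examine` is purely illustrative.

```python
def positions_to_examine(branching_factor: int, depth: int) -> int:
    """Move sequences to check when looking `depth` moves ahead.

    Assumes a constant number of legal moves per turn, which is only a
    rough approximation for real games.
    """
    return branching_factor ** depth


CHESS_MOVES_PER_TURN = 35   # approximate figure quoted in the text
GO_MOVES_PER_TURN = 200     # approximate figure quoted in the text

for depth in (2, 5):
    chess = positions_to_examine(CHESS_MOVES_PER_TURN, depth)
    go = positions_to_examine(GO_MOVES_PER_TURN, depth)
    print(f"{depth} moves ahead: chess {chess:,} vs Go {go:,}")

# 2 moves ahead: chess 1,225 vs Go 40,000
# 5 moves ahead: chess 52,521,875 vs Go 320,000,000,000
```

Five moves ahead, Go already needs roughly 6,000 times as many positions as chess, which is why brute force runs out of road so quickly.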
It seems to have done something similar with protein folding. Again, computationally, it’s impossible to calculate every possibility; it’s too complex. But humans have turned out to be quite good at using their intuition to determine how proteins fold: some people became extremely good at the online computer game FoldIt, in which players tried to work out the shape of a protein from its sequence. There seem to be deep patterns that humans can pick up on, and AlphaFold can pick up on rather better. It learned this intuition by training on 170,000 known proteins and their sequences, in the same way that AlphaGo learnt from millions of games.
Birney points out that AlphaGo started to play in ways that no human would play, but in which Go champions could then see the logic and beauty, so they could learn from it. Similarly, he says, the AlphaFold deep learning system “has come up with insights that were not obvious to humans”. And unlike Go, which is a closed, human-designed system, protein folding “is a game where the universe sets the rules”.
This is scientific creativity: it is spotting patterns in the universe, working out how things are connected. It is, I think it is fair to say, a computer that is doing science. Whether it’s the first time that’s happened is obviously a question of definitions, and I’d probably say that it isn’t: big data and AI have been used to come up with hypotheses for some years now. But it’s another strike against anyone who says that computers can’t have “true intelligence” or “creativity” or whatever.
And this is why I’m obsessed with DeepMind. In the briefing, Hassabis casually dropped in that DeepMind’s “ultimate vision has been to build general AI, and to use it to help us better understand the world around us by accelerating the pace of scientific discovery”. General AI, for those who don’t know the terminology, is an AI that can do any intellectual task that a human can do, as opposed to “narrow” AIs which can, for instance, play chess, but couldn’t then do your taxes. It’s the AI of sci-fi: AI that can hold a conversation and sort your calendar and plan a satellite launch and balance the defence budget.
Most AI researchers instinctively shy away from big talk like that; Hassabis and DeepMind explicitly went for it, from Day 1. Lots of people worry about general AI destroying the world; most AI researchers sort of pretend that that couldn’t happen. But I once spoke to someone at DeepMind who simply said “Sure, that could happen. But we’ll make sure it doesn’t. We’ll make general AI, and it will be awesome.”
They haven’t achieved general AI with AlphaFold. But given that their system, with what seem (to my inexpert understanding) to be relatively minor changes, can become massively superhuman at chess, Go, StarCraft II, Atari games and now at the protein-folding problem, it is becoming increasingly inaccurate to refer to it as “narrow”, as well.
It’ll be interesting, now, to see how DeepMind use this technology – presumably they’ll want to monetise it, and drug discoveries are (as we’ve seen recently) lucrative things. But it’ll be even more interesting to see whether this is just the start of an era of computers doing science. I’m obsessed with DeepMind because they might actually bring about the AI future they promise. The technological singularity and Theme Park: it’s quite the legacy to leave the world.