How can laptops fold proteins?
In December 2020, Google DeepMind revealed a computer program that accurately predicts the 3D structure of proteins from their amino acid sequences (which are spelled out by their genes).
Understanding this requires answering three questions:
- Can I explain to a string of amino acids how it should fold into a protein? (For this question, atoms speak English)
- Can a computer understand my explanation? (Computers only understand numbers: they do stuff to numbers to make other numbers)
- Isn’t it more complex than that? (spoiler alert, the answer is “yes”)
1: Explain protein folding to a string of amino acids
Good morning, dear amino acids. You are lined up, having just been translated by the ribosome. You feel electromagnetic forces attracting you to other amino acids in the line (because your negatively charged electrons attract their positively charged protons). Move yourselves towards the amino acid to whom you feel most attracted, without breaking the line. You might get pushed out of the way by other amino acids who are more attracted to their destinations. Eventually, dear amino acids, most of you will not feel attracted to anywhere more than your current position. You may also notice that your starting line has twisted into a sophisticated knot.
Congratulations, amino acids, for folding into a protein.
2: Explain protein folding to a computer
Each amino acid in the starting line can be described by a handful of numbers: one for its position in the line, three (x, y and z) for its position in 3D space, and one for its ‘attractiveness’ to other amino acids.
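In Python, one way to hold those numbers is a small record per amino acid. This is a minimal sketch: the class name `AminoAcid`, its field names and the charges in the example are illustrative assumptions, and storing a single `charge` per amino acid is the simplification this section makes.

```python
from dataclasses import dataclass

@dataclass
class AminoAcid:
    # Hypothetical record: names and layout are illustrative, not from any real tool.
    index: int                            # position in the line (0, 1, 2, ...)
    position: tuple[float, float, float]  # x, y, z coordinates in 3D space
    charge: float                         # 'attractiveness': positive or negative charge

# A three-amino-acid string laid out straight along the x-axis, one unit apart.
# The charges are made up for illustration.
chain = [
    AminoAcid(0, (0.0, 0.0, 0.0), +1.0),
    AminoAcid(1, (1.0, 0.0, 0.0), 0.0),
    AminoAcid(2, (2.0, 0.0, 0.0), -1.0),
]
```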
In chemistry, as mentioned before, it’s opposite charges (negative electrons attracting positive protons) that cause attraction between molecules. Therefore, the ‘attractiveness’ number given to each amino acid will be the amount of charge on it, which can be positive or negative.
Amino acids feel more attracted to each other if they are closer together, and if their charges are opposite and larger.
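Therefore, I will tell the computer to calculate an ‘attraction score’ between each possible pair of amino acids. This will be a fraction: the product of the two amino acids’ charges, with its sign flipped, divided by the distance between their positions in 3D space. Flipping the sign matters: it makes opposite charges (which attract) give a positive score, and like charges (which repel) give a negative one.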
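Written out in Python, the score might look like the sketch below. It flips the sign of the charge product so that opposite charges, which attract, score positively. It assumes positions are (x, y, z) tuples, one charge per amino acid, and no two amino acids at exactly the same spot (which would mean dividing by zero). The function names are hypothetical, and `math.dist` needs Python 3.8 or later.

```python
import math
from itertools import combinations

def attraction_score(charge_a, pos_a, charge_b, pos_b):
    """Minus the product of the charges, divided by the distance between them.

    Opposite charges give a positive score, like charges a negative one,
    and the closer the pair, the bigger the score's size.
    """
    distance = math.dist(pos_a, pos_b)
    return -(charge_a * charge_b) / distance

def total_score(charges, positions):
    """Add up the attraction score over every possible pair of amino acids."""
    return sum(
        attraction_score(charges[i], positions[i], charges[j], positions[j])
        for i, j in combinations(range(len(charges)), 2)
    )

charges = [+1.0, 0.0, -1.0]
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
print(attraction_score(charges[0], positions[0], charges[2], positions[2]))  # 0.5
print(total_score(charges, positions))  # 0.5
```

The middle amino acid has no charge, so it adds nothing; only the +1 and -1 ends, 2 units apart, contribute, giving a score of 0.5.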
Now, the computer must fold the amino acid string into a protein. This means changing the numbers that describe amino acid positions in 3D space, while updating the attraction scores (because attraction scores depend on position).
How does the computer decide where each amino acid should move?
Each amino acid should move towards the other amino acid to which it is most attracted. Once an amino acid has moved towards its most attractive partner, the attraction score between them will get MUCH BIGGER (because their distance apart just decreased). Usually so much bigger, in fact, that if I added up all of that amino acid’s attraction scores, the total would get bigger after the move, even if the move took it further away from some other amino acids.
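A worked example with made-up numbers, scoring each pair as minus the product of the charges divided by the distance (so opposite charges, which attract, score positively). Amino acid A (charge +1) sits halfway between B and C (both charge -1), 2 units from each, and then moves 1 unit towards B:

```latex
\begin{aligned}
\text{before:}\quad S_{AB} &= \tfrac{-(+1)(-1)}{2} = 0.5, \quad S_{AC} = \tfrac{-(+1)(-1)}{2} = 0.5, \quad S_{AB} + S_{AC} = 1.0\\
\text{after:}\quad S_{AB} &= \tfrac{-(+1)(-1)}{1} = 1.0, \quad S_{AC} = \tfrac{-(+1)(-1)}{3} \approx 0.33, \quad S_{AB} + S_{AC} \approx 1.33
\end{aligned}
```

A’s score with C dropped, but the gain from getting closer to B more than makes up for it.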
This is brilliant news for my computer:
All I need to ask my computer is “please put amino acids into the arrangement which gives the biggest sum of all attraction scores for all amino acids added together (remember not to break the starting line though!)”.
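Here is one minimal way to write that request in Python. It is a toy sketch under stated assumptions, not how real folding software (or DeepMind) works: each amino acid is a single point with one charge; neighbours in the line are held exactly 1 unit apart; no two amino acids may come closer than 1 unit (otherwise opposite charges would collapse onto each other and the score would blow up); and the search is a simple greedy ‘pivot’ method. It picks a random amino acid, twists everything after it by a random rotation, and keeps the twist only if the total score does not go down. The constants `BOND_LENGTH` and `MIN_DISTANCE` and all function names are illustrative.

```python
import math
import random
from itertools import combinations

BOND_LENGTH = 1.0   # assumed fixed distance between neighbours in the line
MIN_DISTANCE = 1.0  # assumed closest any two amino acids may get

def total_score(charges, positions):
    """Sum of -(q_i * q_j) / distance over every pair (opposite charges score positive)."""
    return sum(
        -(charges[i] * charges[j]) / math.dist(positions[i], positions[j])
        for i, j in combinations(range(len(charges)), 2)
    )

def rotate(point, pivot, axis, angle):
    """Rotate `point` about a line through `pivot` along unit vector `axis` (Rodrigues' formula)."""
    v = [p - c for p, c in zip(point, pivot)]
    kx, ky, kz = axis
    cross = (ky * v[2] - kz * v[1], kz * v[0] - kx * v[2], kx * v[1] - ky * v[0])
    dot = kx * v[0] + ky * v[1] + kz * v[2]
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    return tuple(
        c + v[n] * cos_a + cross[n] * sin_a + axis[n] * dot * (1 - cos_a)
        for n, c in enumerate(pivot)
    )

def random_axis(rng):
    """A random unit vector (a normalised Gaussian sample)."""
    while True:
        g = [rng.gauss(0.0, 1.0) for _ in range(3)]
        length = math.sqrt(sum(x * x for x in g))
        if length > 1e-9:
            return tuple(x / length for x in g)

def too_crowded(positions):
    """True if any two amino acids are closer than MIN_DISTANCE (small float tolerance)."""
    return any(
        math.dist(a, b) < MIN_DISTANCE - 1e-9 for a, b in combinations(positions, 2)
    )

def fold(charges, steps=5000, seed=0):
    """Greedy pivot search: twist the tail of the line, keep the twist if the score doesn't drop.

    Needs at least 3 amino acids.
    """
    rng = random.Random(seed)
    # Start as a straight line along the x-axis.
    positions = [(i * BOND_LENGTH, 0.0, 0.0) for i in range(len(charges))]
    best = total_score(charges, positions)
    for _ in range(steps):
        pivot = rng.randrange(1, len(charges) - 1)
        axis = random_axis(rng)
        angle = rng.uniform(-math.pi, math.pi)
        # Rotating everything after the pivot keeps every neighbour distance fixed,
        # so the line never breaks.
        trial = positions[: pivot + 1] + [
            rotate(p, positions[pivot], axis, angle) for p in positions[pivot + 1 :]
        ]
        if too_crowded(trial):
            continue
        score = total_score(charges, trial)
        if score >= best:
            positions, best = trial, score
    return positions, best

charges = [+1.0, 0.0, -1.0, 0.0, +1.0, 0.0, -1.0, 0.0]
straight = [(float(i), 0.0, 0.0) for i in range(len(charges))]
folded, score = fold(charges)
print("straight line score:", round(total_score(charges, straight), 3))
print("folded score:       ", round(score, 3))
```

Because this search only ever accepts moves that don’t lower the score, it can get stuck in a ‘good enough’ arrangement rather than the best one; smarter searches sometimes accept a worse move on purpose to escape such traps.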
Congratulations, dear computer. Your new list of amino acids’ positions in 3D space tells me the structure of a protein.
3: Isn’t it more complex than that?
Yes. Real life is really complicated! There are a few more points that a laptop must consider when folding proteins. Here they are:
Amino acids aren’t allowed to make just any protein shape they like. The same string of amino acids could be folded into many different shapes, but in the body a given protein usually settles into just ONE of them, and the computer has to find that one.
Amino acids don’t just have one charge: each is made of many atoms, and different parts of an amino acid carry different charges. This means that the computer might actually need several ‘attractiveness’ numbers per amino acid.
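A sketch of what ‘several attractiveness numbers’ could mean, assuming each amino acid is split into a few charged sites, each with its own charge and (x, y, z) position. The function name and the ±0.5 charges are made up for illustration. The score for a pair of amino acids becomes the sum, over every pair of sites, of minus the product of the charges divided by the distance.

```python
import math

def amino_acid_pair_score(sites_a, sites_b):
    """Add up -(q1 * q2) / distance over every pair of charged sites.

    Each amino acid is a (hypothetical) list of (charge, (x, y, z)) sites
    instead of a single charge number.
    """
    return sum(
        -(q_a * q_b) / math.dist(p_a, p_b)
        for q_a, p_a in sites_a
        for q_b, p_b in sites_b
    )

# Made-up example: each amino acid has a slightly negative end and a slightly positive end.
first = [(-0.5, (0.0, 0.0, 0.0)), (+0.5, (1.0, 0.0, 0.0))]
second = [(-0.5, (2.0, 0.0, 0.0)), (+0.5, (3.0, 0.0, 0.0))]
second_flipped = [(+0.5, (2.0, 0.0, 0.0)), (-0.5, (3.0, 0.0, 0.0))]

print(amino_acid_pair_score(first, second))          # head to tail
print(amino_acid_pair_score(first, second_flipped))  # positive ends facing each other
```

Both amino acids here have a total charge of zero, so a one-number-per-amino-acid model would score them 0. Splitting them into ends shows that lining them up head to tail (positive end next to negative end) gives a small positive score, about 0.083, while turning the second one round gives about -0.083.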
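Are there other reasons why chemicals attract each other, apart from having opposite electrical charges? Yes: for example, water-avoiding (‘hydrophobic’) amino acids tend to huddle together in the middle of a protein, away from the surrounding water. So the calculation of attraction scores may need modifying!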
It’s possible – but definitely tricky – to decide how to write the positions of amino acids in 3D space as numbers. It’s also tricky to teach the computer how to fold the amino acid string without breaking it.
It’s easy enough to explain maths to a person using English, but computers don’t speak English. They speak strangely named languages such as Python, Java, R and C# (pronounced ‘C-sharp’), which you write by typing words into a program similar to Microsoft Word, with some extra buttons. There isn’t a Google Translate for English-to-computer-language yet, and I am definitely hoping for one in the future!
I hope that you enjoyed reading this article, but… it ends with a twist: Google DeepMind didn’t actually build their protein folding program this way! Instead, they made an artificial intelligence program called a neural network…