Protein Folding Pathways
Yogesh A. girdhar
I developed the following simulator to simulate protein folding pathways. Given a protein in its native configuration, the simulator generates thousands of possible folding pathways(from different initial configuration) which the protein takes while it is folding. This program has applications in molecular drug design and studying diseases which are caused due to protein mis-folding.

The simulator is based on Probabilistic Roadmaps (PRM) method of motion planning. Its working can be divided into the following 3 steps:
Sample Generation - Generating c-space
Graph Generation - Connecting c-space
Pathways Generation - Finding pathways.
Sample Generation
Samples are generated by the following procedure:
Input: native PDB file for the protein.
Approximate the side chains with a CB carbon atom.(this is done for speed)
Build a atomgroup[3] model of this molecule so that we can change the torsion angles and get new configurations.
Generate a sample by randomly changing the torsion angle using a normal random number distribution biased around native configuration.
If any of the CB atoms in the sample are closer than 2.4A then discard the sample.
Calculate the energy of the sample[1,2].
Repeat till required number of samples have been generated.
Graph Generation
Once we have the samples (c-space), we now try to connect them as following:
For each samples s in c-space (this is the source vertex in the edge)
find K-nearest neighbors.
For each neighbor n (this is the target vertex in the edge)
calculate deltaE = Energy(n) - Energy(s)
if(deltaE < 0) deltaE = 0 (deltaE is the edge weight)
output Edge(s,n,deltaE)
Distance Matrix Error is used as the distance function above. Another possible function could be euclidean distance in c-space.
DME is calculate as:
DME = Sum( | a_ij - b-ij | ) / (N*(N-1)/2)
a_ij is the distance between atom i and
atom j in configuration a.
b_ij is the
distance between atom i and atom j in configuration
b.
N is the number of atoms
Path Generation
One we have the Graph(list of edges and vertices), we simply run Dijkstra's single source shortest path algorithm to get all possible paths leading to the native. Dijkstra's algorithms minimizes the edge weights; as a result we get paths with lowest energy climb.
I ran the simulator on 2GB1 protein fragment with the following parameters:
Here are some randomly chosen output paths(you need VMD to see these):
1 2 3 4 5