Minerva Gen*NY*Sis Center for Excellence in Cancer Genomics
University at Albany, State University of New York
The Guda Lab
 


CEMC - Multiple Protein Structure Alignment Server


The Monte Carlo Algorithm in a Nutshell

The Monte Carlo algorithm is based on random numbers. It optimizes the alignment globally by exploring the search space randomly and iteratively, with occasional excursions into non-optimal territory.

Initially, pairwise alignments are calculated for all protein chains in all-to-all combination using the CE algorithm. A multiple seed alignment is built by piling up the master-slave pairwise alignments at 0-approximation, and this alignment is then iteratively refined using the Monte Carlo optimization method as follows.

A distance-based score is calculated for the initial alignment. A set of trial moves is designed to shift residues in the alignment forward or backward. The type and position of each trial move are determined by random numbers. Trial moves are performed either one residue at a time or one column at a time, and a new score is calculated after each trial move. If the score improves after the trial move, the move is always accepted and the change in the alignment becomes permanent. If the score deteriorates, the move may still be accepted, or it may be rejected, based on a factor P that depends upon the extent of the score deterioration and the trial move count. This process is iterated until the alignment converges, i.e., until there is no further improvement in the score for a predetermined number of trials.
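The accept/reject loop described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the CEMC implementation: the source does not give the formula for P, so the annealing-style form exp(-delta / T) with a tolerance T that shrinks as the trial count grows is an assumption, as are the names `acceptance_probability`, `optimize`, `toy_score` and `toy_move`, the parameters `t0`, `cooling` and `max_stall`, and the convention that a lower score is better.

```python
import math
import random


def acceptance_probability(delta, trial, t0=1.0, cooling=0.999):
    """Assumed form of the factor P (the source does not give the formula).

    delta: score deterioration (>= 0; larger means a worse trial score).
    trial: number of trial moves made so far.
    """
    # Assumption: tolerance for bad moves shrinks as the trial count grows.
    temperature = max(t0 * (cooling ** trial), 1e-12)  # guard against underflow
    return math.exp(-delta / temperature)


def optimize(alignment, score_fn, propose_fn, max_stall=1000, rng=None):
    """Apply trial moves until max_stall consecutive trials bring no new best score.

    Assumption: a lower score is better.
    """
    if rng is None:
        rng = random.Random()
    current, current_score = alignment, score_fn(alignment)
    best, best_score = current, current_score
    trial = 0
    stall = 0
    while stall < max_stall:
        trial += 1
        candidate = propose_fn(current, rng)
        delta = score_fn(candidate) - current_score
        # Improvements are always accepted; deteriorations with probability P.
        if delta < 0 or rng.random() < acceptance_probability(delta, trial):
            current, current_score = candidate, current_score + delta
        if current_score < best_score:
            best, best_score = current, current_score
            stall = 0
        else:
            stall += 1
    return best, best_score


# Toy stand-ins for a real alignment: a list of integer offsets whose
# "score" is the sum of squares, and a move that shifts one randomly
# chosen position one step forward or backward.
def toy_score(offsets):
    return sum(x * x for x in offsets)


def toy_move(offsets, rng):
    moved = list(offsets)
    i = rng.randrange(len(moved))
    moved[i] += rng.choice((-1, 1))
    return moved


best, best_score = optimize([3, -2, 4], toy_score, toy_move,
                            max_stall=200, rng=random.Random(0))
```

Tracking the best alignment seen so far, separately from the current one, means that an accepted deterioration late in the run cannot degrade the returned result.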

The alignment results are presented in the following formats:

  • JOY
  • FASTA
  • JOY/post-script
  • TEXT

Examples (in JOY format)

Master-slave (seed) alignment obtained with the CE algorithm

Improved alignment obtained with the Monte Carlo algorithm



Z-score Cutoff

This parameter is used to remove structures that are very different from the rest of the structures in the set. Z-score cutoffs above 4 are generally recommended; however, the program accepts values between 2 and 8. The higher the Z-score cutoff, the closer the structural neighbours retained in the set, and vice versa.
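As a rough illustration of how such a cutoff acts on a set, the sketch below keeps only the structures whose Z-score meets the cutoff. The function name `filter_by_zscore`, the use of `>=` for the comparison, and the Z-score values are all hypothetical; the values are invented for illustration and are not server output.

```python
def filter_by_zscore(z_scores, cutoff):
    """Return the names of structures whose Z-score is at least `cutoff`.

    z_scores: mapping of structure name to Z-score (hypothetical input format).
    cutoff:   must lie in the range the program accepts, 2 to 8.
    """
    if not 2.0 <= cutoff <= 8.0:
        raise ValueError("Z-score cutoff must be between 2 and 8")
    # Assumption: a structure is retained when its Z-score is >= the cutoff.
    return [name for name, z in z_scores.items() if z >= cutoff]


# Invented Z-scores, for illustration only.
example = {"1CJA:A": 3.1, "1CSN:_": 6.4, "1IR3:A": 7.0, "2SRC:_": 4.2}

loose = filter_by_zscore(example, 2.0)   # keeps all four structures
strict = filter_by_zscore(example, 6.0)  # keeps only the closest neighbours
```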


Distance Cutoff (di)

During MC optimization, the alignment distance often increases as the score improves. However, an improvement in the score is meaningful only if the alignment distance does not rise rapidly, so there is always a trade-off between these two factors. This parameter (di) keeps the distance increase in check, up to 3 times the initial alignment distance. At lower cutoffs, the local alignments are better, but global alignments may be stretched. At higher cutoffs, alignments are compressed, but the aligned positions need not be structurally similar. DO NOT use a cutoff above 2x for diverse sets.
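One plausible reading of this check, sketched in Python: a trial alignment is admissible only while its distance stays within di times the initial alignment distance. Treating di as a plain multiplier, the lower bound of zero, and the function name `within_distance_cutoff` are assumptions made for this illustration.

```python
def within_distance_cutoff(initial_distance, current_distance, di):
    """Return True if the current alignment distance respects the cutoff.

    Assumption: di is a multiplier of the initial alignment distance,
    with 3 as the largest value allowed.
    """
    if not 0.0 < di <= 3.0:
        raise ValueError("di must be greater than 0 and at most 3")
    return current_distance <= di * initial_distance


# With an initial distance of 2.0 and di = 2, distances up to 4.0 pass.
ok = within_distance_cutoff(2.0, 3.9, 2.0)
too_far = within_distance_cutoff(2.0, 4.1, 2.0)
```

In an optimization loop, a check like this would run alongside the score comparison, so that a move which improves the score but pushes the distance past the limit is not kept.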


PDB IDs

PDB IDs should include the chain ID. For example, here is a set of PDB IDs belonging to the protein kinase family:

1CDK:A 1CJA:A 1CSN:_ 1B6C:B 1IR3:A 1FGK:A 2SRC:_
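A list in this form can be checked before submission with a short script. The sketch below assumes the format seen in the example: a four-character PDB ID starting with a digit, a colon, and a one-character chain ID, where `_` presumably stands for an unnamed chain. The function name `parse_pdb_ids` is hypothetical, and the server's own validation rules may differ.

```python
import re

# Assumed pattern: 4-character PDB ID (leading digit), colon, 1-character chain ID.
_PDB_CHAIN = re.compile(r"([0-9][A-Za-z0-9]{3}):([A-Za-z0-9_])")


def parse_pdb_ids(text):
    """Split a whitespace-separated list into (pdb_id, chain_id) pairs."""
    entries = []
    for token in text.split():
        match = _PDB_CHAIN.fullmatch(token)
        if match is None:
            raise ValueError(f"expected ID:chain, got {token!r}")
        entries.append((match.group(1).upper(), match.group(2)))
    return entries


kinases = parse_pdb_ids("1CDK:A 1CJA:A 1CSN:_ 1B6C:B 1IR3:A 1FGK:A 2SRC:_")
```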