CEMC - Multiple Protein Structure Alignment Server
The Monte Carlo Algorithm in a Nutshell
The Monte Carlo algorithm is based on random numbers. Global optimization of the alignment is accomplished by random, iterative exploration of the search space, with occasional excursions into non-optimal territory.
First, pair-wise alignments are calculated for all protein chains in all-to-all combinations using the CE algorithm. A multiple seed alignment is then built, as a zeroth-order approximation, by piling up the master-slave pair-wise alignments. This seed alignment is iteratively refined with the Monte Carlo optimization method as follows.
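The piling-up step can be pictured with a small sketch. It assumes a simplified representation that is not taken from CEMC: each master-slave pair-wise alignment is a hypothetical dictionary mapping a master residue index to the slave residue index aligned to it, and the seed alignment gets one column per master residue. The function name `pile_up` and the chain names are illustrative only.

```python
from typing import Dict, List, Optional

Row = List[Optional[int]]


def pile_up(master: str, master_length: int,
            pairwise: Dict[str, Dict[int, int]]) -> Dict[str, Row]:
    """Stack master-slave pair-wise alignments into one master-anchored seed.

    pairwise[slave] maps a master residue index to the slave residue index
    aligned to it. Each output row has one entry per master residue: the
    aligned residue index, or None for a gap. Slave residues that are not
    aligned to any master residue are simply left out of this sketch.
    """
    rows: Dict[str, Row] = {master: list(range(master_length))}
    for slave, mapping in pairwise.items():
        rows[slave] = [mapping.get(i) for i in range(master_length)]
    return rows


# Hypothetical master "m" (5 residues) and two hypothetical slaves.
seed = pile_up("m", 5, {"s1": {0: 0, 1: 1, 3: 2},
                        "s2": {1: 0, 2: 1, 3: 2, 4: 3}})
for name, row in seed.items():
    print(name, ["-" if r is None else r for r in row])
```

Because every slave is placed relative to the same master, the rows line up column by column without any further alignment; this is what makes the result only a rough starting point that the refinement stage then improves.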
A distance-based score is calculated for the initial alignment. A set of trial moves is designed to shift residues in the alignment forward or backward. The type and position of each trial move are determined by random numbers. Trial moves are performed either one residue at a time or one column at a time, and a new score is calculated after each trial move. If the score improves, the move is always accepted and the change in the alignment becomes permanent. If the score deteriorates, the move may still be accepted or rejected based on a factor P, which depends upon the extent of the score deterioration and the trial move count. This process is iterated until the alignment converges, i.e., until there is no further improvement in the score for a predetermined number of trials.
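The refinement loop described above can be sketched generically. This is not the CEMC implementation: the exact form of P is not given here, so the sketch assumes a Metropolis-style rule, P = exp(delta / T), with a temperature T that decays geometrically with the trial count. The names `monte_carlo_refine`, `t0`, `cooling` and `patience` are hypothetical, and the toy usage replaces the alignment and its distance-based score with a single integer and a simple quadratic score.

```python
import math
import random
from typing import Callable, Tuple, TypeVar

State = TypeVar("State")


def monte_carlo_refine(
    state: State,
    score: Callable[[State], float],
    propose: Callable[[State, random.Random], State],
    t0: float = 1.0,
    cooling: float = 0.99,
    patience: int = 200,
    max_trials: int = 100_000,
    seed: int = 0,
) -> Tuple[State, float]:
    """Maximize score(state) with random trial moves; return the best state seen."""
    rng = random.Random(seed)
    current, current_score = state, score(state)
    best, best_score = current, current_score
    trials_without_improvement = 0
    for trial in range(max_trials):
        candidate = propose(current, rng)
        candidate_score = score(candidate)
        delta = candidate_score - current_score
        if delta > 0:
            accepted = True  # an improving move is always accepted
        else:
            # Hypothetical form of the factor P: smaller for larger deteriorations
            # and for later trials (geometric cooling). P = 1 when delta == 0.
            temperature = max(t0 * cooling ** trial, 1e-12)
            accepted = rng.random() < math.exp(delta / temperature)
        if accepted:
            current, current_score = candidate, candidate_score
        if current_score > best_score:
            best, best_score = current, current_score
            trials_without_improvement = 0
        else:
            trials_without_improvement += 1
            if trials_without_improvement >= patience:
                break  # converged: no improvement for `patience` trials
    return best, best_score


# Toy usage: the "alignment" is a single integer and a trial move shifts it by +/-1.
def shift_by_one(x: int, rng: random.Random) -> int:
    return x + rng.choice((-1, 1))


best, best_score = monte_carlo_refine(0, lambda x: -(x - 7) ** 2, shift_by_one)
print(best, best_score)
```

The loop returns the best state seen rather than the last one, because accepted deteriorating moves can leave the current state worse than an earlier one. In a real alignment, `propose` would shift one residue or one column, as the text describes.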
The alignment results are presented in the following formats:
- JOY
- FASTA
- JOY/PostScript
- TEXT
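For readers who want to post-process an alignment, a gapped alignment in generic FASTA layout can be produced as follows. This is a sketch of the common FASTA convention (a `>` header line followed by the gapped sequence, wrapped at a fixed width), not a description of the exact file CEMC emits; the function name `to_fasta`, the 60-character width and the sequences in the usage line are invented placeholders.

```python
from typing import Dict


def to_fasta(alignment: Dict[str, str], width: int = 60) -> str:
    """Format a gapped multiple alignment ('-' for gaps) as FASTA text."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) > 1:
        raise ValueError("all aligned sequences must have the same length")
    lines = []
    for name, seq in alignment.items():
        lines.append(f">{name}")
        # Wrap each aligned sequence at `width` characters per line.
        for start in range(0, len(seq), width):
            lines.append(seq[start:start + width])
    return "\n".join(lines) + "\n"


# Invented placeholder sequences, for illustration only.
print(to_fasta({"s1": "GT-FG", "s2": "GTAF-"}), end="")
```

Checking that all rows have equal length catches a common error: in a multiple alignment every sequence, gaps included, must span the same number of columns.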
Examples (in JOY format)
- Master-slave alignment (also the seed alignment) obtained with the CE algorithm
- Improved alignment obtained with the Monte Carlo algorithm

Z-score Cutoff
This parameter is used to remove structures that are very different from the rest of the structures in the set. A Z-score cutoff above 4 is generally recommended; however, the program accepts values between 2 and 8. The higher the Z-score cutoff, the closer the structural neighbours retained in the set, and vice versa.
Distance Cutoff (di)
During MC optimization, the alignment distance often increases as the score improves. However, an improvement in the score is meaningful only if the alignment distance does not shoot up rapidly, so there is always a trade-off between these two factors. This parameter (di) limits the increase in alignment distance to a multiple of the initial alignment distance, up to 3 times. At lower cutoffs, local alignments are better, but global alignments may be stretched. At higher cutoffs, alignments are compressed, but they need not be structurally similar. DO NOT use values above 2x for diverse sets.
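One way to read the di parameter is as an extra acceptance test applied to each trial alignment: the new alignment distance may not exceed di times the initial alignment distance. The sketch below assumes that reading. The function name `within_distance_cutoff` and the `diverse_set` flag are hypothetical, and the lower bound of 1.0 on di is an assumption; the upper bound of 3 and the 2x limit for diverse sets come from the text.

```python
def within_distance_cutoff(initial_distance: float, new_distance: float,
                           di: float, diverse_set: bool = False) -> bool:
    """Return True if a trial alignment's distance respects the cutoff di."""
    if not 1.0 <= di <= 3.0:  # lower bound of 1.0 is an assumption
        raise ValueError("di must lie between 1 and 3 times the initial distance")
    if diverse_set and di > 2.0:
        raise ValueError("do not use di above 2 for diverse sets")
    # The trial alignment is allowed only if its distance stays within di x initial.
    return new_distance <= di * initial_distance


print(within_distance_cutoff(2.0, 3.9, di=2.0))  # distance grew, but within 2x
print(within_distance_cutoff(2.0, 4.1, di=2.0))  # distance exceeds 2x: reject
```

In a refinement loop, a trial move whose score improves but which fails this check would be rejected, which is how a smaller di trades global coverage for tighter local alignments.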
PDB IDs
Each PDB ID must also include the chain ID. For example, here is a set of PDB IDs belonging to the protein kinase family:
1CDK:A 1CJA:A 1CSN:_ 1B6C:B 1IR3:A 1FGK:A 2SRC:_
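A list like the one above can be checked before submission with a small parser. This sketch assumes the format shown in the example: whitespace-separated `CODE:CHAIN` entries, where CODE is a four-character PDB code beginning with a digit and CHAIN is a single character. Treating `_` as the placeholder for a chain without an identifier is an inference from the example, not a documented rule, and the names `parse_pdb_ids` and `ENTRY` are hypothetical.

```python
import re
from typing import List, Tuple

# Four-character PDB code (a digit plus three alphanumerics), a colon, and a
# one-character chain ID; "_" is assumed to stand for a blank chain ID.
ENTRY = re.compile(r"^([0-9][A-Za-z0-9]{3}):([A-Za-z0-9_])$")


def parse_pdb_ids(text: str) -> List[Tuple[str, str]]:
    """Split a whitespace-separated list of CODE:CHAIN entries into tuples."""
    entries = []
    for token in text.split():
        match = ENTRY.match(token)
        if match is None:
            raise ValueError(f"expected CODE:CHAIN, got {token!r}")
        code, chain = match.groups()
        entries.append((code.upper(), chain))
    return entries


kinases = parse_pdb_ids("1CDK:A 1CJA:A 1CSN:_ 1B6C:B 1IR3:A 1FGK:A 2SRC:_")
print(kinases)
```

An entry without a chain ID, such as a bare `1CDK`, raises an error instead of being silently accepted, which mirrors the requirement that every PDB ID include its chain.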