Phylogenetic Relationships Using ClustalW and PHYLIP
Introduction
We hypothesize that Human Beta globulin will be well conserved throughout mammals, birds, and amphibians, with mammals and amphibians each more closely related to birds than to each other. The sequences most distant from Human Beta globulin will be the nematode, yeast, and bacterial genes.
Finding Similar Genes
Nucleotide BLAST compares a sequence of interest against an entire database and reports the most similar sequences. The comparison identifies "regions of local similarity" between sequences and evaluates them statistically, so that the most relevant hits are those with the highest similarity (NCBI, 2011).
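The idea of a "region of local similarity" can be illustrated with a minimal sketch. BLAST itself uses a faster heuristic search, so the code below is not BLAST; it is a plain Smith-Waterman dynamic-programming score, and the function name and the scoring values (match +2, mismatch -1, gap -2) are assumptions chosen only for illustration.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best Smith-Waterman local alignment score of a and b.

    Scoring values are illustrative assumptions, not BLAST defaults.
    """
    prev = [0] * (len(b) + 1)  # previous row of the score matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    # The shared "ACG" region drives the score even though the flanks differ.
    print(local_alignment_score("TTACGTTT", "GGACGGG"))
```

Because the score is floored at zero, dissimilar flanking regions do not penalize a well-matched internal region, which is what distinguishes a local comparison from a global one.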
Comparing Gene Sequences
ClustalW aligns sequences based on their similarity, efficiently identifying the regions they share. The resulting alignment can then be used to analyze phylogenetic relationships (EMBL-EBI, 2011).
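Once sequences are aligned, their similarity can be summarized numerically. The sketch below is not part of ClustalW; it is a hypothetical helper that computes percent identity for two already-aligned sequences, assuming gaps are written as "-" and that columns containing a gap are skipped.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of gap-free aligned columns in which both sequences agree."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # assumption: gap columns are ignored
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


if __name__ == "__main__":
    print(percent_identity("ACGT", "ACGA"))
```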
Analysis of Phylogenetic Relationships
PHYLIP is a suite of programs for phylogenetic analysis. Four of the programs are SEQBOOT, PROTPARS, PROML, and CONSENSE. SEQBOOT produces multiple data sets by bootstrap resampling. The SEQBOOT outfile is then used to build phylogenetic trees with PROTPARS and PROML. As a final step, CONSENSE applies the majority-rule method to find a consensus tree (University of Washington, 2011).
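The two ideas at either end of this pipeline, bootstrap resampling and majority-rule consensus, can be sketched in a few lines. This is not PHYLIP code: the function names, the dictionary representation of an alignment, and the representation of a tree as a set of clades (each clade a frozenset of taxon names) are all assumptions made for illustration.

```python
import random
from collections import Counter


def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement to build one replicate.

    alignment maps a taxon name to its aligned sequence (equal lengths assumed).
    """
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


def majority_rule_clades(trees, threshold=0.5):
    """Return clades found in more than `threshold` of the trees, with their support."""
    counts = Counter(clade for tree in trees for clade in set(tree))
    return {
        clade: n / len(trees)
        for clade, n in counts.items()
        if n / len(trees) > threshold
    }


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the example is repeatable
    aln = {"human": "ACGT", "frog": "AGGA"}
    print(bootstrap_replicate(aln, rng))
```

Each replicate has the same number of columns as the original alignment, but some columns appear more than once and others not at all. The fraction of replicate trees that contain a given clade is its bootstrap support.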
PROTPARS uses the parsimony method to generate phylogenetic trees. This method bases the trees on nucleotide changes that alter the amino acid sequence. It assumes that silent changes (changes that leave the amino acid sequence unchanged) are more common and less relevant.
Formal assumptions of this method are:
-substitutions, deletions, and insertions (changes) at different sites occur independently
-changes in different lineages occur independently
-changes that alter the amino acid sequence occur less frequently
-expected rates of change in different branches of a phylogenetic tree do not differ by more than 1:2
-expected rates of change do not differ enough among sites that two changes at one site are more likely than one change at another
-a base change that does not alter the amino acid sequence is more probable than one that does (Felsenstein, 2011)
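The core of a parsimony analysis, counting the minimum number of changes a tree requires, can be sketched as follows. This is a simplified illustration using the standard Fitch algorithm on a rooted binary tree; it does not reproduce PROTPARS, which additionally distinguishes silent from amino-acid-altering changes. The nested-tuple tree format, the function names, and the toy sequences are assumptions.

```python
def fitch_site(tree, states):
    """Return (possible ancestral states, minimum changes) for one site.

    tree is a taxon name (leaf) or a pair (left_subtree, right_subtree).
    states maps each taxon name to its character at this site.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    lset, lcost = fitch_site(left, states)
    rset, rcost = fitch_site(right, states)
    common = lset & rset
    if common:
        return common, lcost + rcost
    # No shared state: at least one change is needed on this pair of branches.
    return lset | rset, lcost + rcost + 1


def parsimony_score(tree, alignment):
    """Sum the minimum number of changes over all alignment columns."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch_site(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(length)
    )


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    aln = {"human": "AAG", "mouse": "AAG", "frog": "ATG", "yeast": "TTC"}
    print(parsimony_score((("human", "mouse"), ("frog", "yeast")), aln))
    print(parsimony_score((("human", "frog"), ("mouse", "yeast")), aln))
```

The tree with the lower score is preferred under parsimony, since it explains the observed sequences with fewer changes.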
PROML uses the maximum likelihood method to generate phylogenetic trees. The program infers a rate of mutation for each nucleotide position, and these rates are integrated to form a tree.
Assumptions for PROML are that:
-each nucleotide in the sequence evolves independently of the others
-different lineages evolve independently of one another
-the nucleotide at each position changes at an expected rate
-all nucleotide positions are included in the sequence
-probabilities of substitution are as reported in several published studies (Felsenstein, 2011)
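The principle behind maximum likelihood can be shown on the smallest possible case: estimating the distance between two aligned sequences. The sketch below assumes the Jukes-Cantor nucleotide model, which is a deliberately simple stand-in and not the substitution model PROML uses; the function names and the grid-search settings are likewise assumptions. Under Jukes-Cantor, after an expected $d$ substitutions per site,

$$P_{\text{same}}(d) = \tfrac{1}{4} + \tfrac{3}{4}e^{-4d/3}, \qquad P_{\text{diff}}(d) = \tfrac{1}{4} - \tfrac{1}{4}e^{-4d/3}.$$

```python
import math


def jc_log_likelihood(seq_a, seq_b, d):
    """Log-likelihood of two aligned sequences at distance d under Jukes-Cantor.

    Sites are treated as independent, so per-site log-probabilities are summed.
    """
    e = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * e
    p_diff = 0.25 - 0.25 * e
    total = 0.0
    for x, y in zip(seq_a, seq_b):
        # 0.25 is the assumed equal base frequency of the first nucleotide.
        total += math.log(0.25 * (p_same if x == y else p_diff))
    return total


def ml_distance(seq_a, seq_b, step=0.001, max_d=5.0):
    """Grid search for the distance that maximizes the log-likelihood."""
    grid = [step * k for k in range(1, int(max_d / step) + 1)]
    return max(grid, key=lambda d: jc_log_likelihood(seq_a, seq_b, d))


if __name__ == "__main__":
    print(ml_distance("ACGTACGTAC", "ACGTACGTAT"))
```

A full maximum likelihood tree search applies the same idea to every branch of every candidate tree, keeping the tree and branch lengths under which the observed alignment is most probable.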
Maximum likelihood is considered advantageous over parsimony analysis in most scenarios. Parsimony can produce unlikely trees when rates of evolution differ among the branches of the tree (Felsenstein, 1981). For this reason, consensus trees from both parsimony and maximum likelihood analyses will be compared.