Methods & Results
Organism Selection
Organisms were selected to represent a broad range of life that shares the protein motif found in beta hemoglobin. To survey how widely this motif occurs, it was searched on the PFAM website, whose species distribution diagram shows how diverse the organisms containing this motif are (PFAM, 2011). Species were then chosen from different regions of the diagram so that sequences could be compared and contrasted across the tree of life.
Species chosen were:
Sumatran orangutan (Pongo abelii)
Mouse (Mus musculus)
Chicken (Gallus gallus)
Carolina anole, a lizard (Anolis carolinensis)
Zebrafish (Danio rerio)
African clawed frog (Xenopus laevis)
Tsetse fly (Glossina morsitans morsitans)
Yeast (Candida albicans)
Nematode (Caenorhabditis elegans)
Bacterium (Vibrio mimicus)
BLAST
Nucleotide BLAST was run against the Nucleotide database, and the best (highest-similarity) hit was obtained for each selected organism. This was done by limiting each search to genes from a specific organism. The restriction was necessary because the standard, unrestricted search returns over 20,000 results, which home PCs have difficulty displaying (NCBI, 2011). The FASTA sequence of each top hit was saved for use in ClustalW.
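As a minimal sketch of the selection rule in this step, assuming the BLAST hits have already been collected into simple records, the hit with the highest bit score can be kept for each organism. The `Hit` record, its field names and `best_hit_per_organism` are hypothetical and do not correspond to an NCBI format. Ranking by bit score is an assumption about how "highest similarity" was judged, and percent identity is used only to break ties.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """A BLAST hit reduced to the fields used for ranking (hypothetical record)."""
    organism: str
    accession: str
    percent_identity: float
    bit_score: float


def best_hit_per_organism(hits):
    """Keep the highest-scoring hit for each organism.

    Hits are ranked by bit score, with percent identity as a tie-breaker.
    """
    best = {}
    for hit in hits:
        current = best.get(hit.organism)
        if current is None or (hit.bit_score, hit.percent_identity) > (
            current.bit_score, current.percent_identity
        ):
            best[hit.organism] = hit
    return best
```

For readers scripting the search instead of using the web page, Biopython's `Bio.Blast.NCBIWWW.qblast` accepts a comparable per-organism restriction through its `entrez_query` argument (for example `"Mus musculus[Organism]"`). This is offered only as a pointer, since the searches described here were run through the NCBI website.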
ClustalW
The header of each FASTA sequence was edited so that its first ten characters identified the organism it represents, since PHYLIP format keeps only the first ten characters of each sequence name. All of the FASTA sequences (including the human sequence) were then combined into a single file and uploaded to ClustalW, which performed a multiple sequence alignment. The alignment was saved in PHYLIP format in the Phylip exe folder for the phylogenetic analysis described below.
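The renaming and combining steps can be sketched in plain Python. This is a minimal illustration, assuming the downloaded FASTA records are available as text and that a short label has already been chosen for each organism. The function names and labels are hypothetical. It also shows why ten characters matter: in strict PHYLIP format each sequence name occupies exactly the first ten columns of its line.

```python
def parse_fasta(text):
    """Split FASTA text into (header, sequence) pairs; sequence lines are joined."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def rename_records(records, labels):
    """Replace each header with a short, unique label of at most 10 characters."""
    if len(records) != len(labels) or len(set(labels)) != len(labels):
        raise ValueError("need exactly one distinct label per record")
    for label in labels:
        if len(label) > 10:
            raise ValueError(f"label {label!r} is longer than 10 characters")
    return [(label, seq) for (_, seq), label in zip(records, labels)]


def to_fasta(records):
    """Combine records into a single FASTA text (one file for ClustalW)."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)


def to_phylip(aligned_records):
    """Strict PHYLIP layout: a 'taxa sites' line, then name padded to 10 columns + sequence."""
    lengths = {len(seq) for _, seq in aligned_records}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")
    lines = [f"{len(aligned_records)} {lengths.pop()}"]
    for name, seq in aligned_records:
        lines.append(f"{name[:10]:<10}{seq}")
    return "\n".join(lines) + "\n"
```

ClustalW writes the PHYLIP file itself. `to_phylip` only illustrates the layout. Because it puts each sequence on a single line, its output reads the same whether a PHYLIP program expects interleaved or sequential input.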
Phylip
The protocol used was given in the Bioinformatics class on EMUOnline and is as follows:
The Phylip suite of programs was obtained from the University of Washington Department of Genome Sciences (2011).
Seqboot, like all of the subsequent programs, is located in the exe folder of the Phylip suite. In Seqboot, the alignment file was opened by entering its name (including the .txt extension). A random number seed was entered: an odd number about six digits long. Seqboot then resampled the ClustalW alignment to produce bootstrap data sets, written to an outfile for analysis with the tree-building programs. This outfile was renamed to avoid confusion.
Protpars was then opened and the Seqboot outfile loaded. The settings were changed by typing menu options at the prompt. M was entered to indicate multiple data sets, with 100 set as the number of data sets, and a random number seed was given as in Seqboot. O was used to specify the outgroup, in this case the bacterial sequence. Y accepted the settings and ran the program. This produced another outfile, which was renamed to show that it came from Protpars.
Consense was opened to build a consensus tree from the multiple trees produced by Protpars. O was used to specify the outgroup, and R was used so that the trees were treated as rooted (descending from a single starting point). Y accepted the settings and ran the program. The resulting tree, redrawn in Microsoft Word for clarity, is shown below in Figure 1.
Figure 1: Protpars Parsimony Analysis
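The Seqboot, Protpars and Consense steps above follow a standard bootstrap workflow. The block below is a rough, self-contained sketch of that workflow. It is not PHYLIP's code, and all function names are hypothetical. It (1) resamples alignment columns with replacement, as Seqboot does, (2) scores candidate trees with Fitch small parsimony and keeps the lowest-scoring tree for each replicate, and (3) keeps the groups found in more than half of the replicate trees.

Several simplifications are assumptions of the sketch:
- Candidate trees are supplied by hand as nested tuples rooted on the outgroup instead of being searched for.
- Fitch parsimony counts any residue change as one step, whereas Protpars uses a more detailed, genetic-code-aware criterion.
- Ties go to the first candidate, whereas Protpars can report all equally parsimonious trees.
- Plain majority rule is shown. Consense's default, as we understand it, is an extended majority rule that can also add compatible groups below 50%.

The default seed is odd only to mirror the protocol; Python's `random` module has no such requirement.

```python
import random
from collections import Counter


def bootstrap_replicates(aligned, n_replicates=100, seed=123457):
    """Resample alignment columns with replacement (the idea behind Seqboot).

    aligned: dict of taxon name -> aligned sequence, all the same length.
    """
    lengths = {len(seq) for seq in aligned.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")
    n_sites = lengths.pop()
    rng = random.Random(seed)  # fixed seed makes the replicates reproducible
    replicates = []
    for _ in range(n_replicates):
        columns = [rng.randrange(n_sites) for _ in range(n_sites)]
        replicates.append({name: "".join(seq[i] for i in columns)
                           for name, seq in aligned.items()})
    return replicates


def fitch_site(tree, states):
    """Fitch small parsimony for one site; returns (state set, number of changes).

    tree: a taxon name (leaf) or a 2-tuple of subtrees.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_site(left, states)
    right_set, right_cost = fitch_site(right, states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, aligned):
    """Total number of changes over all columns (lower is more parsimonious)."""
    n_sites = len(next(iter(aligned.values())))
    return sum(fitch_site(tree, {name: seq[i] for name, seq in aligned.items()})[1]
               for i in range(n_sites))


def leaves(tree):
    """Set of taxon names below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree):
    """Groups defined by each internal node below the root (the root is the outgroup split)."""
    found = []

    def walk(node, is_root):
        if isinstance(node, str):
            return
        if not is_root:
            found.append(leaves(node))
        for child in node:
            walk(child, False)

    walk(tree, True)
    return found


def majority_rule_clades(trees_as_clades, threshold=0.5):
    """Clades present in more than `threshold` of the trees, with their frequency."""
    counts = Counter()
    for tree_clades in trees_as_clades:
        counts.update(set(tree_clades))  # count each clade once per tree
    n = len(trees_as_clades)
    return {clade: c / n for clade, c in counts.items() if c / n > threshold}


def bootstrap_parsimony_consensus(aligned, candidate_trees, n_replicates=100, seed=123457):
    """Bootstrap, pick the most parsimonious candidate per replicate, then majority rule."""
    chosen = []
    for replicate in bootstrap_replicates(aligned, n_replicates, seed):
        # Ties go to the first candidate listed.
        best = min(candidate_trees, key=lambda t: parsimony_score(t, replicate))
        chosen.append(clades(best))
    return majority_rule_clades(chosen)
```

For example, with the outgroup as the last element of each candidate tuple, `bootstrap_parsimony_consensus(alignment, [t1, t2, t3])` returns each well-supported group together with the fraction of replicates that contain it. This is comparable to the group counts out of 100 that Consense prints.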
Proml was used to analyze the same set of sequences by maximum likelihood, following the same steps as for Protpars to obtain an outfile and build a consensus tree. The resulting tree is shown below in Figure 2. Figure 2 differs noticeably from Figure 1, including in where the human and orangutan branches are placed on the tree.
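Proml evaluates the likelihood of whole trees under an amino-acid substitution model and searches over topologies, which is far beyond a short example. The core idea of maximum likelihood can still be shown on the smallest case, two aligned sequences: choose the evolutionary distance d that makes the observed number of differing sites most probable.

The sketch below assumes a simple equal-rates model with k states (k = 20 for amino acids; the Jukes-Cantor model when k = 4). This is a simplification of the empirical models Proml offers, and the function names are hypothetical. The code maximizes the log-likelihood numerically by golden-section search and compares the result with the known closed-form estimate d = -((k-1)/k) ln(1 - k p/(k-1)), where p is the fraction of differing sites.

```python
import math


def log_likelihood(d, n_sites, n_diff, k=20):
    """Log-likelihood of distance d for two aligned sequences under an equal-rates
    k-state model, conditioning on the residues of the first sequence."""
    decay = math.exp(-k * d / (k - 1))
    p_same = 1.0 / k + (k - 1.0) / k * decay  # site shows the same residue
    p_one_diff = (1.0 - decay) / k            # site shows one particular other residue
    ll = (n_sites - n_diff) * math.log(p_same)
    if n_diff:
        ll += n_diff * math.log(p_one_diff)
    return ll


def ml_distance_numeric(n_sites, n_diff, k=20, lo=1e-9, hi=10.0, iters=200):
    """Maximize the log-likelihood over d with golden-section search."""
    ratio = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, e = b - ratio * (b - a), a + ratio * (b - a)
    for _ in range(iters):
        if log_likelihood(c, n_sites, n_diff, k) > log_likelihood(e, n_sites, n_diff, k):
            b, e = e, c                      # maximum lies in [a, e]
            c = b - ratio * (b - a)
        else:
            a, c = c, e                      # maximum lies in [c, b]
            e = a + ratio * (b - a)
    return (a + b) / 2


def ml_distance_closed_form(n_sites, n_diff, k=20):
    """Closed-form maximum likelihood distance for the same model."""
    p = n_diff / n_sites
    if p >= (k - 1) / k:
        raise ValueError("too many differences: the distance is infinite under this model")
    return -(k - 1) / k * math.log(1 - k * p / (k - 1))
```

For example, with 146 aligned sites that differ at 20, `ml_distance_numeric(146, 20)` and `ml_distance_closed_form(146, 20)` agree to within numerical tolerance. Maximum likelihood tree inference repeats this kind of optimization for every branch of every candidate tree, which is why Proml runs more slowly than Protpars.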