Saturday, June 18, 2011

Molecular Phylogenetics Workshop

Next week I will be running a short Molecular Phylogenetics Workshop at Roan Mountain State Park in Tennessee (June 22, 10:30-2:00). The workshop coincides with the meeting of the American Bryological and Lichenological Society, but I will be presenting general principles of molecular evolution and phylogenetic inference that are applicable to any set of organisms.

Here is the abstract:
"Cryptogams are notorious for their paucity of morphological characters when compared with higher plants and animals. As a result, an understanding of molecular data and what they can reveal in terms of evolution is perhaps more crucial in these organisms than in many others. Workshop participants will explore principles of molecular phylogenetics and learn basic protocols for running phylogenetic analyses. The main objectives will be (1) to promote an understanding of how events in the course of molecular sequence evolution affect phylogenetic inference, (2) to explore the advantages and disadvantages of different phylogenetic methods, and (3) to facilitate sound research into the phylogenetic history of life. The workshop will include both lecture and discussion. Participants are invited to bring their own data sets for more detailed evaluation at the conclusion of the workshop."

For those scheduled to attend, I look forward to seeing you there! For those not attending, I hope to see you at a future workshop!

- Brendan

----------------------------------------------

This work was made possible in part by NSF (DEB-1011504) and the American Bryological and Lichenological Society.

Friday, June 17, 2011

Writing a Phycas Script

For a while I have been wary of phylogenetic results supported only by Bayesian analyses, because of the so-called 'star-tree paradox' that haunts MrBayes and even some other programs like it. As I have mentioned previously, one of the best features of the Bayesian phylogenetic program Phycas is that it gives one the opportunity to allow polytomies in the trees sampled as part of the posterior (which can often deflate the inflated posterior probability values seen with programs like MrBayes). The specific command for this is:
"mcmc.allow_polytomies = True"

To run Phycas, it is best to write a script to go with a standard NEXUS-formatted sequence alignment. There is some basic information on how to install and run Phycas in my previous post:
http://squamules.blogspot.com/2011/06/installing-and-running-phycas.html
However, that post does not go into any of the details of scripting for Phycas. Recently, I ran a multigene analysis with mtSSU, ITS1, 5.8S, and ITS2 in different partitions, with a different evolutionary model for each. Here is what my Phycas script looked like:

from phycas import *
setMasterSeed(98765)
mcmc.data_source = 'Input_file_name.nex'
mcmc.out.log = 'Output_file_name.log'
mcmc.out.log.mode = REPLACE
mcmc.allow_polytomies = True
mcmc.polytomy_prior = False
mcmc.topo_prior_C = 1.0
mcmc.out.trees.prefix = 'Output_file_name'
mcmc.out.params.prefix = 'Output_file_name'
mcmc.ncycles = 50000
mcmc.sample_every = 10
# Set up the K80+I model for 5pt8S
model.type="hky"
model.state_freqs = [0.25, 0.25, 0.25, 0.25]
model.fix_freqs = True
model.kappa = 2.0
model.kappa_prior = BetaPrime(1.0, 1.0)
model.pinvar_model = True
# Save the K80+I model for 5pt8S
m3 = model()
# Set up the GTR+I model for mtSSU
model.type="gtr"
model.state_freqs = [0.3338, 0.1493, 0.1983, 0.3187]
model.fix_freqs = False
model.relrates = [1.4783, 5.8050, 3.3222, 0.6768, 7.6674, 1.0000]
model.pinvar_model = True
# Save the GTR+I model for mtSSU
m1 = model()
# Set up the HKY+G model for ITS1
model.type="hky"
model.state_freqs = [0.1487, 0.3566, 0.2704, 0.2244]
model.fix_freqs = False
model.kappa = 2.0
model.kappa_prior = BetaPrime(1.0, 1.0)
model.num_rates = 4
model.gamma_shape = 0.5
model.gamma_shape_prior = Exponential(1.0)
model.pinvar_model = False
# Save the HKY+G model for ITS1
m2 = model()
# Set up the HKY+G model for ITS2
model.state_freqs = [0.1419, 0.3069, 0.3199, 0.2314]
# Save the HKY+G model for ITS2
m4 = model()
# Define partition subsets
mtssu = subset(1, 1080)
its1 = subset(1081, 1607)
fivept8S = subset(1608, 1768)
its2 = subset(1769, 2041)
# Assign partition models to subsets
partition.addSubset(mtssu, m1, "mtSSU")
partition.addSubset(its1, m2, "ITS1")
partition.addSubset(fivept8S, m3, "5pt8S")
partition.addSubset(its2, m4, "ITS2")
partition()
# Start the run
mcmc()
# Summarize the posterior
sumt.trees = 'Output_file_name.t'
sumt.burnin = 500
sumt.tree_credible_prob = 1.0
sumt()

Although I have some notes within the script, please see the Phycas manual for instructions on what each of the individual commands does. Hopefully more people will be using Phycas (and allowing polytomies!) in the future!
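Before starting a long run, it can be worth checking that the partition subsets cover the alignment without gaps or overlaps. The following is a minimal sketch in plain Python that does not use Phycas at all. The boundaries are copied from the script above, while the helper name check_partition and the total length of 2041 sites (taken to be the end of the last subset) are assumptions made for illustration:

```python
# Subset boundaries copied from the script above (1-based, inclusive).
subsets = [
    ("mtSSU", 1, 1080),
    ("ITS1", 1081, 1607),
    ("5pt8S", 1608, 1768),
    ("ITS2", 1769, 2041),
]


def check_partition(subsets, nsites):
    """Return a list of problems; an empty list means the subsets
    cover sites 1..nsites exactly once."""
    problems = []
    expected_start = 1
    # Walk through the subsets in order of their first site.
    for name, first, last in sorted(subsets, key=lambda s: s[1]):
        if first > last:
            problems.append("%s: start %d is after end %d" % (name, first, last))
        if first > expected_start:
            problems.append("gap before %s: sites %d-%d unassigned"
                            % (name, expected_start, first - 1))
        elif first < expected_start:
            problems.append("%s overlaps the previous subset at sites %d-%d"
                            % (name, first, min(last, expected_start - 1)))
        expected_start = max(expected_start, last + 1)
    if expected_start <= nsites:
        problems.append("sites %d-%d are not in any subset" % (expected_start, nsites))
    elif expected_start - 1 > nsites:
        problems.append("subsets extend past the last site (%d)" % nsites)
    return problems


# nsites = 2041 is an assumption: the alignment ends where ITS2 ends.
print(check_partition(subsets, 2041))
```

An empty list means the four subsets fit together cleanly. Any gap or overlap is reported with the site numbers involved, which is easier to fix before the run than after it.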

- Brendan

Note: One question that I had about running Phycas was how to define exclusion sets; however, Phycas apparently can read the EXSET line of the ASSUMPTIONS block of the NEXUS file in the same way that Mesquite, MacClade, and PAUP* can.
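For anyone who has not written an exclusion set by hand, here is a small sketch of what such a block might look like, generated from Python so that the site ranges are easy to change. The helper name exset_block, the set name "excluded", and the site ranges are all hypothetical. The EXSET syntax follows the general NEXUS convention, and I have not confirmed exactly how Phycas treats every variation of it:

```python
def exset_block(ranges, name="excluded"):
    """Build the text of a NEXUS ASSUMPTIONS block that excludes the
    given 1-based, inclusive site ranges."""
    parts = []
    for first, last in ranges:
        # A single site is written as "5"; a range is written as "1-20".
        parts.append(str(first) if first == last else "%d-%d" % (first, last))
    # The asterisk marks this exclusion set as the default one.
    return "BEGIN ASSUMPTIONS;\n    EXSET * %s = %s;\nEND;\n" % (name, " ".join(parts))


# Hypothetical ranges, chosen only to show the format.
print(exset_block([(1, 20), (1050, 1080)]))
```

The printed text can be pasted at the end of the '.nex' file, after the DATA or CHARACTERS block.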

Tuesday, June 14, 2011

Installing and Running Phycas

Phycas has recently earned a high spot on my short list of favorite computer programs for phylogenetics. Phycas is the amazing program that can run a Bayesian phylogenetic inference without being susceptible to the 'star-tree paradox' because it allows for the existence of polytomies in the sampled trees.

From an academic perspective, Phycas is actually a pretty easy program to install and run. Still, some additional notes on tricks and tips for running it were beneficial to one of my colleagues who was really having trouble getting it to go. Here are my instructions for installing Phycas on a Windows machine:

1) Install Python 2.7. [I use the Enthought Python Distribution, available here: http://www.enthought.com/products/epd.php. Everything is bundled together so components like SciPy, NumPy, etc., never need to be installed individually and the different versions of the components are all guaranteed to play well together.]

2) Follow the instructions here:
http://hydrodictyon.eeb.uconn.edu/projects/phycas/index.php/Telling_Windows_where_to_find_Python
to append Python27 (different from the versions they have listed there) to your PATH. (I guess if your PATH is truly empty then you will just leave out the semicolon; otherwise, keep whatever is already in your PATH, and just add ;C:\Python27 to the end of it.) [It might also be important to make sure that the PYTHON-STARTUP environment variable says C:\Python27, if you have that variable. Mine was still set to 2.6, meaning that the wrong version of Python would likely open by default. Also make sure that this is all being done at both the system level and the user level; I was only doing it at the user level for a while and it got me mixed up.]

3) Do the 4-step Phycas installation as outlined on the "Windows XP/Windows Vista/Windows 7" section of this website (the manual itself is apparently wrong, so be careful here).

4) For your own particular analysis, put the NEXUS-formatted alignment file ('.nex') and the Phycas script file ('.py') on the Desktop where you have the shortcut to the '.bat' file. [For more on writing a Phycas script, stay tuned to this blog!]

5) Drag and drop the phycas script file (.py) onto 'Shortcut to phycas.bat'.
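If the drag-and-drop step does nothing, a quick way to see whether steps 1 and 2 took effect is to ask Python which version is actually running and whether C:\Python27 made it into the PATH. This is only a rough sketch using the standard library. The helper name path_has_dir is made up for illustration, and the check assumes Windows-style PATH entries separated by semicolons:

```python
import os
import sys


def path_has_dir(path_value, directory, sep=";"):
    """True if `directory` is one of the entries in a PATH-style string.
    The comparison ignores case and trailing slashes."""
    wanted = directory.rstrip("\\/").lower()
    entries = [e.strip().rstrip("\\/").lower() for e in path_value.split(sep)]
    return wanted in entries


# Which interpreter opened this script, and where does it live?
print("Running Python %d.%d from %s"
      % (sys.version_info[0], sys.version_info[1], sys.executable))

# Is the Python 2.7 folder from step 2 on the PATH seen by this process?
print("C:\\Python27 on PATH: %s"
      % path_has_dir(os.environ.get("PATH", ""), "C:\\Python27"))
```

Save this as a '.py' file and run it the same way you would run the Phycas script. If the version printed is not 2.7, or the PATH check says False, go back to step 2 before troubleshooting anything else.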

I'll have another blog post that goes more into the details of Phycas scripting, but I hope this post helps jump-start some of those eager to deflate their inflated posterior probability values!

- Brendan

Monday, June 13, 2011

A New Home

I have relocated and am finally settled in New York! I will be doing my post-doctoral work at the New York Botanical Garden (NYBG) using advanced bioinformatics tools to study lichen ecology and evolution with Richard C. Harris and James C. Lendemer. Here is my new address:

International Plant Science Center
The New York Botanical Garden
2900 Southern Blvd.
Bronx, NY 10458-5126

NYBG has an amazing research program that is not typical for a botanical garden. The departments within the International Plant Science Center include the following: Institute of Systematic Botany, Cullman Program for Molecular Systematics, Plant Research Laboratory, NY Plant Genomics Consortium, Steere Herbarium, Graduate Studies, Mertz Library, and NYBG Press (plus more). My research will likely connect in some way with all of these departments, which is why NYBG is a perfect environment for my post-doctoral research.

Those who have been following my research closely might say "obviously you study diverse organisms from across the tree of life, but since when do you study plants?" Since the concept of "plants" once included fungi, the "plant" research program at NYBG (which began taking shape over 100 years ago) provides amazing resources for the study of fungal biology as well. Of course, I will specifically be focusing my energy on the lichen-forming fungi.

Please stay tuned for more on my research as it unfolds in New York City!

- Brendan