Week 5: Modifying the density plots and grouping the TEs by family when conducting the alignment

Last week, I aligned the transposons using a series of programs and obtained an alignment file and a distance matrix for each DNA subfamily. This week, I continued on this track but changed how the TEs were grouped. Instead of grouping them by subfamily, which is a very small unit, I regrouped them by family. There are only six major families of DNA transposons, so this grouping lets us see a phylogenetic hierarchy within each family.

I first sampled sequences from each subfamily, since there were too many transposons to plot in a single phylogenetic tree. I wrote a Python script to do that. Then I ran MUSCLE, the alignment tool I had used before. However, it ran into many errors and failed to generate alignment files for most of the TE families, so I had to turn to other tools. After some research, I found a multiple alignment program named emma. I ran it on the combined sequence file of each family and it worked well. Its output contained not only the alignment file but also a tree file, which uses nested parentheses to group together the transposons with smaller divergence. To visualize the tree, I used another program named drawgram from the PHYLIP package, which turned the tree file from emma into an actual drawing of the phylogenetic tree.
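The sampling step can be sketched in a few lines of Python. This is only a minimal illustration, not the script used for the post: it assumes a hypothetical FASTA header format of `id|subfamily`, and the function names `read_fasta` and `sample_by_subfamily`, the `per_subfamily` cap, and the `seed` parameter are all made up for the example.

```python
import random
from collections import defaultdict


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one first.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def sample_by_subfamily(records, subfamily_of, per_subfamily, seed=0):
    """Keep at most `per_subfamily` randomly chosen records per subfamily.

    `subfamily_of` maps a header to its subfamily name; how to do that
    depends on the annotation format, so it is passed in (assumption).
    """
    groups = defaultdict(list)
    for header, seq in records:
        groups[subfamily_of(header)].append((header, seq))

    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sampled = []
    for name in sorted(groups):
        members = groups[name]
        k = min(per_subfamily, len(members))
        sampled.extend(rng.sample(members, k))
    return sampled
```

With headers in the assumed `id|subfamily` form, a call such as `sample_by_subfamily(read_fasta(open_file), lambda h: h.split("|")[1], 20)` would keep up to 20 sequences from every subfamily; the sampled records of all subfamilies in a family can then be written to one combined file for alignment.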

Here is an example of one of the phylogenetic trees.

Figure 1: The phylogenetic tree of DNACACTA.

The other thing I did this week was to modify the code that removes overlaps between TE sequences when calculating the density distribution across the chromosome. I had noticed that the previous density plots showed values higher than one on the y axis. This is not reasonable for a density plot, and it suggested that overlapping sequences were still being counted more than once. I went back to the overlap-removal code, found the errors, and fixed them. After that, I regenerated all the density plots and they looked much better.
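To show why overlaps matter, here is a small sketch of one way to compute such a density. It assumes that density means the fraction of each fixed-size window covered by at least one TE, and that TEs are given as half-open `(start, end)` coordinates. The function names `merge_intervals` and `window_density` are hypothetical, and this is not the code used for the plots.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def window_density(intervals, chrom_length, window):
    """Fraction of each window covered by at least one interval.

    Merging first guarantees that every base is counted at most once,
    so no window can have a density above 1.
    """
    merged = merge_intervals(intervals)
    densities = []
    for w_start in range(0, chrom_length, window):
        w_end = min(w_start + window, chrom_length)
        covered = 0
        for start, end in merged:
            overlap = min(end, w_end) - max(start, w_start)
            if overlap > 0:
                covered += overlap
        densities.append(covered / (w_end - w_start))
    return densities
```

For example, with TEs at `(0, 60)`, `(40, 100)` and `(150, 160)` on a 200 bp chromosome and 100 bp windows, simply summing lengths would give 120 covered bases in the first window, a density of 1.2. After merging, the first two TEs become `(0, 100)` and the densities are 1.0 and 0.1.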

The next step is to interpret the phylogenetic trees and draw sound inferences from them.
