Week 3: Removing overlaps from the sequences and making spatial density plots of TEs across the chromosomes

Last week, I generated many histograms of the matching ratio of TEs in different families. However, the sequences were mostly fragmented, because many TEs contained insertions. I therefore ran the one-code-to-find-them-all script to generate a CSV file containing all the consolidated TE sequences. From that file, I calculated the new ratio at which the query sequences matched the reference sequences. Based on this matching ratio, I categorized DNA and LTR transposons as intact or fragmented: DNA transposons with a matching ratio higher than 90% were considered intact and the rest fragmented, while the threshold for LTR transposons was 50%.

Although the data came from a consolidated TE file, the sequences still overlapped. The overlaps prevented me from getting a precise density of TEs within a given region of a chromosome, so I wrote a Python script to remove them. After that, I regenerated the files for intact and fragmented DNA and LTR transposons and further divided them by family.

Next, I was interested in the density of intact and fragmented TEs across the genome; in other words, I wanted to see whether the distribution of intact and fragmented DNA and LTR transposons followed a pattern. I calculated the density of all DNA and LTR transposons inside each window on each of the 14 chromosomes separately, and did the same for every TE family. Then I wrote Python programs to plot the densities for all the categories mentioned above. For each chromosome, I made one density plot of all DNA transposons and another of all LTR transposons, as well as a density plot for each of the 6 DNA transposon families and each of the 3 LTR transposon families.
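The classification and overlap-removal steps could look roughly like the sketch below. It is not the actual script: the `TE` record, its field names, and the functions `classify` and `remove_overlaps` are hypothetical, coordinates are assumed to be 0-based and half-open, and trimming each element so that it starts where the previous one ended is only one possible way of removing overlaps. The 90% and 50% thresholds are the ones given in the text.

```python
from typing import List, NamedTuple

# Thresholds from the text: 90% for DNA transposons, 50% for LTR transposons.
INTACT_THRESHOLD = {"DNA": 0.90, "LTR": 0.50}


class TE(NamedTuple):
    """Hypothetical record for one consolidated TE copy."""
    scaffold: str
    start: int      # 0-based, inclusive (assumed convention)
    end: int        # exclusive
    te_class: str   # "DNA" or "LTR"
    family: str
    ratio: float    # matching ratio against the reference, between 0 and 1


def classify(te: TE) -> str:
    """Label a TE as 'intact' or 'fragmented' from its matching ratio."""
    return "intact" if te.ratio > INTACT_THRESHOLD[te.te_class] else "fragmented"


def remove_overlaps(tes: List[TE]) -> List[TE]:
    """Trim overlapping TEs so that no base on a scaffold is counted twice.

    Assumed strategy: sort by position, move the start of each element to
    the end of the previously kept one, and drop elements that end up empty.
    """
    result = []
    last_end = {}  # scaffold -> end coordinate of the last kept element
    for te in sorted(tes, key=lambda t: (t.scaffold, t.start, t.end)):
        start = max(te.start, last_end.get(te.scaffold, 0))
        if start >= te.end:
            continue  # entirely covered by earlier elements
        result.append(te._replace(start=start))
        last_end[te.scaffold] = te.end
    return result
```

Trimming rather than merging keeps each element's class, family, and matching ratio attached to it, so the intact and fragmented files can still be regenerated afterwards.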

Here are some examples of the density plots:

Density_of_DNA_scaffold_1

Density_of_LTR_scaffold_1

Density_of_DNA_CACTA_scaffold_1

Density_of_LTR_Copia_scaffold_1
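The values behind plots like these can be computed with a simple windowing function. The sketch below is only an illustration under stated assumptions: the function name `window_density` is hypothetical, density is taken to mean the fraction of bases in each window covered by TEs (a count of elements per window would be another reasonable definition), the window size is a free parameter, and the intervals are assumed to be non-overlapping, 0-based, and half-open.

```python
def window_density(intervals, chrom_length, window_size):
    """Return the fraction of each fixed-size window covered by TEs.

    intervals: iterable of (start, end) pairs on one chromosome,
    assumed non-overlapping, 0-based, half-open.
    """
    n_windows = -(-chrom_length // window_size)  # ceiling division
    covered = [0] * n_windows
    for start, end in intervals:
        end = min(end, chrom_length)
        first = start // window_size
        last = (end - 1) // window_size
        # An interval can span several windows; split its length among them.
        for w in range(first, last + 1):
            w_start = w * window_size
            w_end = min(w_start + window_size, chrom_length)
            covered[w] += max(0, min(end, w_end) - max(start, w_start))

    densities = []
    for w, bp in enumerate(covered):
        # The last window may be shorter than window_size.
        w_len = min((w + 1) * window_size, chrom_length) - w * window_size
        densities.append(bp / w_len)
    return densities
```

Calling the function once per chromosome and per category (all DNA, all LTR, or a single family) gives one list of values per plot, which can then be passed to any plotting library.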

The next step is to look at the divergence among transposons that belong to the same subfamily.

 

 
