When at first you don’t succeed…

One of the most frustrating things about science is how much time you spend wondering whether you’re headed down the right path. Having multiple ways of doing things can be a benefit, but it can also lead to second-guessing. I have spent more time than I care to admit thinking about whether the method I’m currently using is the best way to achieve results. Unfortunately, the past few weeks have involved a lot of trial and error. I spent a week getting my data into the correct format for a package (PAML; Phylogenetic Analysis by Maximum Likelihood), only to realize that the package wasn’t ideal for the low-read-coverage data I’m working with. That forced me to jump ship and figure out a completely new way of doing things.

After consulting with my advisor, we decided to use Perl and Python scripts to process our genomic data (an example is pictured). This had the benefit of letting me skip some of the data processing steps, since we could tailor the scripts to the original data. Of course, the disadvantage was that we needed to write our own code. Our code was much simpler than the PAML package; however, it was better suited to our data, and we could control the output.

Another benefit of using our own code was that we could write it in a way that made it easy to apply to all of our files. I have approximately 46,000 gene files, which meant I needed to automate the process. This turned out to be the easiest part of the whole project: I wrote a very simple shell script to run the Perl script on all 46,000 gene files. Now all that’s left is actually analyzing the data.
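
For anyone curious what that kind of batch run looks like, here is a minimal sketch of the same idea. My actual version was a shell script; this one is written in Python purely for illustration, and the details are assumptions: the function name `run_on_all` is made up, and the `*.fasta` pattern is a guess at a naming scheme rather than a description of my files.

```python
import subprocess
from pathlib import Path


def run_on_all(command, gene_dir, pattern="*.fasta"):
    """Run `command` once per matching file and return the files that failed.

    `command` is a list such as ["perl", "some_script.pl"]; each file path
    is appended to it as the final argument. The default pattern is an
    assumption, so adjust it to however the gene files are actually named.
    """
    failed = []
    # Sorting gives a stable order, which makes a partial run easier to resume.
    for gene_file in sorted(Path(gene_dir).glob(pattern)):
        result = subprocess.run([*command, str(gene_file)])
        # A non-zero exit code means the script reported a problem with this file.
        if result.returncode != 0:
            failed.append(gene_file)
    return failed
```

With a hypothetical script called `process_gene.pl` and a folder called `gene_files`, the call would be `run_on_all(["perl", "process_gene.pl"], "gene_files")`. Collecting the failures instead of stopping at the first one seems like the safer choice with tens of thousands of files, since one malformed file shouldn't halt the whole batch.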

[Screenshot: an example of the scripts used to process the genomic data]