Analysis of Milkweed Genotyping

With just a few individuals left to send off as redos, most of the focus on the milkweed project has moved toward analysis. Analysis is inherently tricky because it involves a lot of trial and error. There are essentially three options: you can use someone else’s software as is, you can modify someone else’s software to fit your needs, or you can write your own. The last option is extremely time intensive, so our hope was to find existing software that fit our needs.

The first step was to find a few options. To do this, I read many journal articles, from projects similar to ours to projects only barely related. That gave me a list of software programs and add-on packages for them. Next I researched each one, noting its pros and cons, cost, and capabilities. We wanted to calculate a few very specific statistics, which helped eliminate some programs from the list. We also did not rule out using multiple programs. Finally, we downloaded a few different options.

After downloading the software, we began formatting our data into input files. Some programs come with “read me” manuals and example input files, which made the transition to actually using them very easy. Others had little guidance and required a lot of trial and error. This part of the process was mildly frustrating, as we would sometimes spend a few days just getting input files to work. Even then, as we explored more of a program, it sometimes became apparent that it was not a good fit for us. For instance, one program did not handle missing data well, and another had rounding problems. At times it felt as though we had wasted our time. In reality, this process is how we discovered which software was the best fit.

Many hours and many tries later, we finally had some analyzed results. One of the more exciting parts was finally building charts, graphs, and tables from our own data! It was rewarding to see what our work demonstrated. Another rewarding aspect was calculating specific variables. Some of these were:

Fij – the average relatedness (kinship) between pairs of individuals at a given distance apart, measured relative to the overall relatedness of the entire population

Fis – the inbreeding coefficient, a measure of inbreeding within a subpopulation based on how far observed heterozygosity falls below expected heterozygosity

Pgen – the probability that a given genotype arises through sexual reproduction, based on the allele frequencies in the population

Psex – the probability that two individuals share the same genotype as a result of separate sexual reproduction events

He – expected heterozygosity

Ho – observed heterozygosity
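
To make observed heterozygosity, expected heterozygosity, and Fis a little more concrete, here is a minimal Python sketch for a single locus. It is only an illustration, not the software we used: the function name `locus_stats` and the allele sizes in the example are made up, it assumes diploid genotypes, and it uses the simple uncorrected formula for expected heterozygosity (one minus the sum of squared allele frequencies). Real packages may apply sample-size corrections, so their numbers can differ slightly.

```python
from collections import Counter


def locus_stats(genotypes):
    """Return (Ho, He, Fis) for one diploid locus.

    `genotypes` is a list of (allele, allele) tuples. Individuals with
    missing data (None, or a tuple containing None) are skipped.
    Hypothetical helper for illustration only.
    """
    # Keep only individuals that were fully scored at this locus.
    scored = [g for g in genotypes if g is not None and None not in g]
    n = len(scored)
    if n == 0:
        raise ValueError("no scored genotypes at this locus")

    # Observed heterozygosity: fraction of individuals with two different alleles.
    ho = sum(1 for a, b in scored if a != b) / n

    # Allele frequencies across all 2n allele copies.
    counts = Counter(allele for g in scored for allele in g)
    total = 2 * n

    # Expected heterozygosity (uncorrected): 1 - sum of squared allele frequencies.
    he = 1.0 - sum((c / total) ** 2 for c in counts.values())

    # Fis: proportional deficit of heterozygotes; undefined if He is zero.
    fis = 1.0 - ho / he if he > 0 else float("nan")
    return ho, he, fis


# Made-up example: five scored individuals and one with missing data.
example = [(150, 150), (150, 154), (154, 154), (150, 154), None, (150, 150)]
ho, he, fis = locus_stats(example)
print(f"Ho={ho:.3f} He={he:.3f} Fis={fis:.3f}")
```

In this made-up example, two of the five scored individuals are heterozygous, so Ho is 0.4, while the allele frequencies (0.6 and 0.4) give an He of 0.48. The positive Fis reflects fewer heterozygotes than expected.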

All of these measures help address our overall goal of quantifying the genetic diversity of milkweed and estimating a coefficient of relatedness.

Our next steps are to continue cleaning up our data and to fill the few remaining holes in it so that we can run a final analysis.