## Blog 3

As the summer session progresses, I am going to complete the first stage of the research: data collection. This stage is especially important, not only because it is the start of the project but also because the accuracy of the data directly affects the accuracy of the result, no matter how hard we try to investigate and improve the methods.

For starters, we plan to create a template for the research, so we are going to use only a few variables for now. Based on the papers I have read, one of the most important variables is land cover, which includes bushes, rainfall, snow, etc. It is directly related to the distribution of population and the habitability of the region. The land cover data is downloaded from xxx. Before conducting research myself, I thought of the “data collection” process as fixed and simple, but there are many factors to consider when actually searching for data of the best quality.

After land cover, based on the papers I have read, two other important variables are nighttime light and settlement. One of the main ideas of this research is to investigate how to generate a synthetic population from the various data layers available, using the household as the unit instead of the individual, as most other studies do. The household is crucial when thinking about population distribution and even disease spread, which is why settlement is an important variable.

After collecting all these variables, we have to make sure that the collection time, region, and source organization of the datasets correspond to each other. Another important step is to organise the data before aggregating all the variables together. We have to get the general population information of Liberia and, based on that data, aggregate individual or household data into clan data by clan ID. In other words, we are going to get variable data for each clan. From there, we can combine all the variable data by clan ID using the “merge” function. After working with the variables, the week just passed by.
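To make the aggregate-then-merge step concrete, here is a minimal sketch in plain Python. It is only an illustration under assumptions: the clan IDs, household sizes, and variable values are made up, and the function names `aggregate_households` and `merge_by_clan` are hypothetical helpers, not part of the actual project code. In R, the same idea would likely use the base `aggregate()` and `merge()` functions.

```python
from collections import defaultdict

# Hypothetical household-level records: (clan_id, household_size).
households = [
    ("C01", 4),
    ("C01", 6),
    ("C02", 3),
]

# Hypothetical clan-level variables, assumed to be already summarised per clan.
land_cover = {"C01": "forest", "C02": "cropland"}
night_light = {"C01": 0.8, "C02": 2.5}


def aggregate_households(records):
    """Count households and sum population for each clan ID."""
    totals = defaultdict(lambda: {"households": 0, "population": 0})
    for clan_id, size in records:
        totals[clan_id]["households"] += 1
        totals[clan_id]["population"] += size
    return dict(totals)


def merge_by_clan(base, **layers):
    """Join variable layers onto the base table by clan ID.

    Only clans present in every layer are kept (an inner join).
    """
    merged = {}
    for clan_id, row in base.items():
        if all(clan_id in layer for layer in layers.values()):
            extra = {name: layer[clan_id] for name, layer in layers.items()}
            merged[clan_id] = {**row, **extra}
    return merged


clan_table = merge_by_clan(
    aggregate_households(households),
    land_cover=land_cover,
    night_light=night_light,
)

for clan_id, row in sorted(clan_table.items()):
    print(clan_id, row)
```

Keeping only the clans that appear in every layer is a design choice for this sketch; it makes gaps in any one dataset show up as missing clans rather than as silently empty values.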

## Blog 2

After reading some papers during the first week, I got a broad idea of the research and had some thoughts about it. I met with the professor a few times so that we could learn each other’s ideas and make a more detailed plan for our future work. We agreed that the Random Forest package in RStudio is very powerful and might be a good starting point for the research, and we narrowed the candidate methods down to Random Forest and a Hierarchical Bayesian Model to begin with.

## Week VI update: Aphantasia survey

This week brought two major milestones for our lab. The first was a trial run of the biology Pictionary study. The second was our pivot toward creating an Aphantasia survey.