Generate a spatially continuous synthetic population description for improved prediction of human movement and spread of disease

I first encountered the R language on my first day of college last semester. In my COLL 100 Data Science Lab, I used RStudio to sort through data, make predictions about the stocks of Fortune 500 companies (stocks_map), create word clouds from Twitter data (wordcloud_modern_family), project crop data onto a map of Nigeria (dominant_crop_map_lga), and more. I was amazed at how powerful R is for handling big data and drawing conclusions from it. This semester, while serving as a teaching assistant in the lab, I am taking the Human Development seminar and diving more deeply into data science. I have learned about freedom and complexity, the scale of the world, and the cutting-edge methodologies involved in data science. The most engaging part is putting what is discussed in the research papers into practice with R: we made three-dimensional plots of WorldPop data for Liberia and other countries. While writing an annotated bibliography, a literature review, and a central research question over the semester, I have become more and more interested in the research and feel motivated to address, in my own research, the gap I found in previous papers. Meanwhile, I am also taking classes in the Computer Science and Mathematics departments. With inspiration from these classes and encouragement from Professor Tyler Frazier, I came up with the idea for my research assignment in class, “Generate a spatially continuous synthetic population description for improved prediction of human movement and spread of disease,” and I plan to carry it out during the summer.

In the age of big data, generating an agent-based, close-to-reality population description and drawing useful inferences from it is becoming more and more crucial. The purpose of this research is to generate a synthetic, close-to-reality population in order to simulate both the movement of people throughout Ghana and the effect of that movement on the spread of infectious disease. Ghana is selected as an example case for generating the synthetic population, and the method would then be generalized to West Africa. We are going to predict the household size of all dwelling units across West Africa and then predict the remaining demographic characteristics of all household members. We will draw on up-to-date datasets such as remotely sensed satellite data, mobile phone data records, and nationwide and regional health, demographic, and economic surveys (DHS). In addition to the two major methods involved, a spatially continuous multinomial logistic regression model (instead of the discretised one used before) and conditional probability, we are also going to use Random Forest models, gravity models, hierarchical Bayesian models, and impedance models. Together, these foundations may help us generate close-to-reality population descriptions and apply them to human development issues. Such descriptions could be used in epidemiology, politics, biology, and many other fields. The approach could also be used to describe populations of other kinds, such as stars in a star cluster, bird migration patterns, or road accidents. The accuracy and broad applicability of this agent-based population description make it even more promising.
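
To make the two major methods concrete, here is a minimal sketch in Python of how a multinomial logistic model and conditional probability could work together: the model gives a probability for each household-size class at a dwelling, a class is drawn from it, and each member's age group is then drawn conditional on that class. Everything specific here is an assumption for illustration and not part of the proposal: the covariate names (building_area, night_lights), the coefficient values, the size classes, the age groups, and the cap of nine members for the largest class. In the actual research, the coefficients and conditional tables would be estimated from the survey and satellite data.

```python
import math
import random

# Hypothetical coefficients (intercept, building_area, night_lights) for each
# household-size class. "1-2" is the reference class with all zeros.
COEFFS = {
    "1-2": (0.0, 0.0, 0.0),
    "3-4": (0.2, 0.8, 0.1),
    "5-6": (-0.1, 1.2, -0.2),
    "7+": (-0.6, 1.5, -0.5),
}

# Hypothetical member-count range for each class ("7+" is capped at 9 here).
MEMBER_RANGE = {"1-2": (1, 2), "3-4": (3, 4), "5-6": (5, 6), "7+": (7, 9)}

# Hypothetical conditional table: P(age group | household-size class).
AGE_GIVEN_SIZE = {
    "1-2": {"0-14": 0.05, "15-49": 0.60, "50+": 0.35},
    "3-4": {"0-14": 0.30, "15-49": 0.55, "50+": 0.15},
    "5-6": {"0-14": 0.40, "15-49": 0.48, "50+": 0.12},
    "7+": {"0-14": 0.45, "15-49": 0.45, "50+": 0.10},
}


def size_probabilities(building_area, night_lights):
    """Multinomial logit: softmax over one linear score per size class."""
    scores = {
        label: b0 + b1 * building_area + b2 * night_lights
        for label, (b0, b1, b2) in COEFFS.items()
    }
    top = max(scores.values())  # subtract the max for numerical stability
    weights = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}


def draw(distribution, rng):
    """Draw one label from a {label: probability} dictionary."""
    labels = list(distribution)
    weights = list(distribution.values())
    return rng.choices(labels, weights=weights, k=1)[0]


def synthesize_household(building_area, night_lights, rng):
    """Draw a size class, then draw each member's age group given that class."""
    size_class = draw(size_probabilities(building_area, night_lights), rng)
    low, high = MEMBER_RANGE[size_class]
    n_members = rng.randint(low, high)
    members = [draw(AGE_GIVEN_SIZE[size_class], rng) for _ in range(n_members)]
    return {"size_class": size_class, "members": members}


rng = random.Random(42)  # fixed seed so the sketch is repeatable
probs = size_probabilities(building_area=0.5, night_lights=0.3)
household = synthesize_household(0.5, 0.3, rng)
print(probs)
print(household)
```

Because the covariates are evaluated at each dwelling's own location rather than averaged over an administrative unit, the same function can be applied dwelling by dwelling, which is the sense in which such a model could be spatially continuous.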