blog 5

After generating the random forest model, which was our first milestone, we are going to use it as a tool to predict the population distribution. First, we have to replace all the missing (NA) values in the data with zero using “replace_na”, which speeds up the later steps, and then drop some layers that do not help the prediction model. These steps are necessary to reorganise the data and make the prediction process much faster and more efficient. We also found that “raster” is a very powerful package for aggregating and tidying up the data. Many useful, commonly used functions either come from it or build on it.

Since the data set is pretty big and the model takes some time to process it, I learned many speed-up skills from the professor and also talked to some experts in the college's IT office. For this particular process, we used “beginCluster” and “endCluster”, which run the work in parallel and speed it up. In other cases, we might use the college's public computers to create some nodes and files instead of using our own computers. The HPC functions make the process much faster, but it takes some effort to look at the plots and the results of individual lines of code, since the code has to run all together at one time.
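To make the clean-up and speed-up steps more concrete, here is a minimal sketch on made-up data. The layer names (ntl, slope, unused), the grid size and the values are all placeholders I invented for illustration, not our real Liberia layers, and wrapping “calc” in “clusterR” is just one possible way to use the cluster. Depending on the installed version of raster, “beginCluster” may also need the snow package.

```r
library(raster)
library(tidyr)

set.seed(42)

# Hypothetical covariate layers: names and values are placeholders,
# not the real data used in the project.
make_layer <- function() raster(nrows = 20, ncols = 20, vals = runif(400))
ntl    <- make_layer()
slope  <- make_layer()
unused <- make_layer()

# Punch a few holes into one layer to mimic missing cells.
ntl[c(3, 50, 120)] <- NA

# replace_na() works on plain vectors, so apply it to the cell values
# and write the cleaned values back into the layer.
values(ntl) <- replace_na(values(ntl), 0)

# Stack the layers, then drop the one that does not help the model.
covs <- stack(ntl, slope, unused)
names(covs) <- c("ntl", "slope", "unused")
covs <- dropLayer(covs, which(names(covs) == "unused"))

# Start a small cluster, run one raster operation on it, then shut it down.
# n = 2 is an arbitrary choice for this sketch.
beginCluster(n = 2)
ntl_sqrt <- clusterR(covs[["ntl"]], calc, args = list(fun = sqrt))
endCluster()
```

Anything that runs between “beginCluster” and “endCluster” through “clusterR” is split across the worker nodes, which is where the time saving comes from on a big raster. On a toy grid like this one, the overhead of starting the cluster is probably larger than the gain.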

After learning more about these speed-up skills, which will be useful for our future projects, we are going to use the prediction function to create some values and then project them onto the plot. We used “raster::predict” and “raster::extract” to get there. Since I aggregated all the data based on land cover, while my professor aggregated all of it based on the basic data of Liberia, his resolution is more accurate than mine, but the general trends of our data and maps are similar. This is really a great step in our research, and we mainly used the raster and random forest packages to make the predictions.
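The prediction step can be sketched roughly like this. Again, everything here is toy data: the covariate names, the fake population response and the three sample points are assumptions made up for the example, and the real workflow may use “raster::extract” differently (for instance with polygons instead of points).

```r
library(raster)
library(randomForest)

set.seed(42)

# Hypothetical two-layer covariate stack (placeholder names and values).
make_layer <- function() raster(nrows = 20, ncols = 20, vals = runif(400))
covs <- stack(make_layer(), make_layer())
names(covs) <- c("ntl", "slope")

# Toy training table: the response is an invented function of the covariates.
train <- as.data.frame(values(covs))
train$pop <- 10 * train$ntl + 2 * train$slope + rnorm(nrow(train), sd = 0.1)

# Fit a small random forest; the predictor names must match the layer names.
rf <- randomForest(pop ~ ntl + slope, data = train, ntree = 100)

# Apply the model to every cell of the stack to get a predicted surface.
pred <- raster::predict(covs, rf)

# Pull the predicted values at a few arbitrary locations.
pts  <- cbind(x = c(-90, 0, 90), y = c(-45, 0, 45))
vals <- raster::extract(pred, pts)
```

The result of “raster::predict” is itself a raster layer, so it can be plotted directly with plot(pred) or passed on to “raster::extract” as shown.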