Blog 4

Time flies. We are a little behind schedule, but we now have a clear plan. The first step is to create a template and analyse the data we have already downloaded. Meanwhile, I keep reading papers on random forests to understand how the model works under the hood. So far I have read more about the bootstrapping method, the tree-building process, how many trees to grow, and so on.

Before setting up the random forest model, we check whether all the sub-variables of each variable (land cover, nighttime light, and settlement) matter for the population distribution we care about. We use RMSE to make an importance plot of the variables and delete any sub-variable whose importance is zero. For land cover, the “snow” variable has been removed from the data set, since it has no effect on the result and would only add to the model's run time. After sorting through all the data we downloaded, we put it into the random forest model to create a “Large randomForest formula” object. From that model I also create a variable importance plot to see which variables are the most crucial for our project. The result is that “urban” is the most important variable and “water.permanent” is the least important.
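To make the screening step above more concrete, here is a minimal sketch of one common way to get an RMSE-based importance score: shuffle one variable at a time and measure how much the RMSE gets worse. This is not our actual R code. It is plain Python, the data are made up, and `toy_model` is a hypothetical stand-in for a fitted random forest that happens to ignore “snow”. I am also assuming that permutation importance is a fair picture of what the importance plot measures.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def permutation_importance(predict, rows, y, names, seed=0):
    """Increase in RMSE after shuffling one column at a time."""
    rng = random.Random(seed)
    baseline = rmse(y, [predict(r) for r in rows])
    scores = {}
    for name in names:
        column = [r[name] for r in rows]
        rng.shuffle(column)  # break the link between this variable and the response
        permuted = [{**r, name: v} for r, v in zip(rows, column)]
        scores[name] = rmse(y, [predict(r) for r in permuted]) - baseline
    return scores


# Toy data (made up): three sub-variables with random values in [0, 1).
rng = random.Random(42)
names = ["urban", "water_permanent", "snow"]
rows = [{n: rng.random() for n in names} for _ in range(200)]


def toy_model(row):
    # Hypothetical stand-in for a fitted model; it never reads "snow".
    return 5.0 * row["urban"] + 0.5 * row["water_permanent"]


y = [toy_model(r) for r in rows]
scores = permutation_importance(toy_model, rows, y, names)

# Keep only the variables whose importance is above zero.
kept = [n for n in names if scores[n] > 0]

for n in sorted(names, key=scores.get, reverse=True):
    print(f"{n:16s} {scores[n]:.3f}")
print("kept:", kept)
```

Shuffling “snow” cannot change the predictions of a model that never uses it, so its score is exactly zero and it drops out of `kept`, which is the same reasoning we used to remove it from the land cover data.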

We are also investigating several other functions related to random forests, since it is such a powerful method. tuneRF can be used to tune the model: it searches for a good value of mtry, the number of variables tried at each split. One of the most exciting points is that random forest modelling is still moving forward, and other people are exploring the ranger function and even geospatial random forest functions, which are more advanced. We tried the “ranger” function and “ranger::importance” as well. We started with the most basic version and are now trying to explore and understand the more up-to-date methods used by experts.