Over the Hill

July 24, 2014

At last, we’ve finished!

We’re about two weeks past our intended deadline, but we have finally finished geocoding and exported the raw data set from Toolkit. Ultimately we had documentation for 34 projects, for a total of 855 geocoded project locations. Clay is currently going through quality assurance to make sure that the geographic locations we assigned to the project activities are up to date and that the data set contains no duplicate entries. As soon as he finishes, we’ll be able to start making maps in ArcGIS and conducting statistical analyses for the final report. Geocoding was certainly the most arduous part of this process, and it was complicated by the other tasks that we had to field for ObservaCoop.

Just some quick facts about this data set.

  • The average number of locations in each Commission project was 25 with a median of 4. This figure includes one outlier project with 394 distinct project sites.
  • Dropping the outlier, the average was 13 locations per project with a median of 4.
  • In fact, the majority of projects had fewer than 30 locations; about 53% of the projects had only between 1 and 5 locations to geocode.
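
As a rough check, the two averages can be recomputed from the totals quoted in this post. This is only a sketch: the variable names are mine, and it uses the headline figures (34 projects, 855 locations, one project with 394 sites) rather than the per-project counts, so the medians cannot be reproduced this way.

```python
# Totals as reported in this post; per-project counts are not reproduced here.
total_projects = 34
total_locations = 855
outlier_locations = 394  # the single largest project

# Mean over all projects, then the mean after dropping the one outlier project.
mean_all = total_locations / total_projects
mean_without_outlier = (total_locations - outlier_locations) / (total_projects - 1)

print(f"Mean with outlier:    {mean_all:.1f}")
print(f"Mean without outlier: {mean_without_outlier:.1f}")
```

From these totals the averages come out at roughly 25 and just under 14. Small differences from the figures in the list may come from rounding or from slightly different counts. Either way, a single 394-site project nearly doubles the mean while leaving the median at 4.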

These statistics imply that there was some luck involved with geocoding. Usually you could geocode a project quickly, but if you were unfortunate enough to get a massive project, such as the one with 394 locations, you could be stuck geocoding the same project for a week while your colleagues whizzed by with easy three-location projects. Indeed, geocoding was a constant source of frustration for all involved, and we’re all glad to be done with this tedious part of the research process. If anything, I’ve learned a lot about project management, especially with regard to the geocoding process. Among other things, I would have taken more time in the planning stages to extract all the location information from the documentation provided by the European Union. Under normal circumstances the geocoders would find this information by themselves, but our time pressure made it necessary for a more experienced coder to find the information and simply have the interns code the projects. Indeed, a period of “systematization” was needed and would have improved the efficiency of the coding process.

Only a few weeks remain in the Fellowship, and it seems more and more uncertain that we will be able to finish the entire project for ObservaCoop before we all have to return to the United States. It’ll be interesting to see how much we can get through. The next few parts are much more interesting, especially the maps, but they require much more thought and even a flair for the artistic. At the very least, ObservaCoop will have the data available.


  1. Rebecca Schectman says:

    I feel your geocoding pain, especially when assigned one of the projects with an exorbitant number of locations. Systematizing coding is a really interesting idea. In the future, geocoding may even be automatic as scripts are developed to extract location information from project documents. I’m sure this would have sped up your workflow this summer. Sounds like an interesting project though!