Data Cleanup

When I took government research methods in the Fall 2013 semester, Professor Chris Howard warned us that while research is rewarding, it can also be very tedious. Part of the "tedious" element of my project has been data cleaning. Each state had three separate Excel sheets detailing the metadata for the project: the number of articles published in August 2009, the number of articles with healthcare-related key terms, and the names of the newspapers that published these articles. However, R, the program I am using for my project, is very finicky: it does not read a file correctly if the file contains stray spaces or extra commas.
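
As a rough illustration of the kind of cleanup R needs, here is a minimal sketch written in Python rather than R. It trims stray whitespace and trailing commas from every cell in a row. The function names `clean_cell` and `clean_row` and the sample row are hypothetical, invented for this example; they are not the author's actual workflow.

```python
import re

def clean_cell(value: str) -> str:
    """Tidy one spreadsheet cell (hypothetical helper, not from the project)."""
    value = value.strip()                 # remove leading/trailing spaces
    value = re.sub(r"\s+", " ", value)    # collapse runs of spaces/tabs into one space
    value = value.rstrip(",").strip()     # drop stray trailing commas
    return value

def clean_row(row):
    """Apply clean_cell to every cell in a row of text values."""
    return [clean_cell(cell) for cell in row]

# Made-up messy row, standing in for one line of a state's sheet
print(clean_row(["  Ohio ", "Example  Gazette,", " 12 "]))
# prints ['Ohio', 'Example Gazette', '12']
```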

To prepare the Excel files, I had to go through each one individually, adjusting columns, changing titles, and deleting extraneous information so that every sheet had identical formatting. This took about a week and a half. With clean metadata, I can now aggregate this information in R. The result will be a clean record of every article in our data set, along with accurate counts of total articles and of articles containing key words, which we need to calculate frequency percentages.
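
The frequency percentage is the share of articles that contain key terms: articles with key words divided by total articles, times 100. Below is a minimal Python sketch (again standing in for the R step), assuming each state's cleaned sheet has been reduced to those two counts. The `StateCounts` structure, the `keyword_percentages` function, and the numbers are hypothetical placeholders, not project data.

```python
from dataclasses import dataclass

@dataclass
class StateCounts:
    state: str
    total_articles: int     # articles published in August 2009
    keyword_articles: int   # articles containing healthcare key terms

def keyword_percentages(records):
    """Return {state: % of articles with key terms}, plus a pooled 'ALL' figure."""
    result = {}
    for r in records:
        result[r.state] = (100 * r.keyword_articles / r.total_articles
                           if r.total_articles else 0.0)
    # Pool the raw counts across states instead of averaging the percentages
    total = sum(r.total_articles for r in records)
    keyword = sum(r.keyword_articles for r in records)
    result["ALL"] = 100 * keyword / total if total else 0.0
    return result

# Placeholder counts for illustration only
records = [StateCounts("State A", 200, 50), StateCounts("State B", 100, 10)]
print(keyword_percentages(records))
# prints {'State A': 25.0, 'State B': 10.0, 'ALL': 20.0}
```

Computing the overall figure from summed counts, rather than averaging the per-state percentages, keeps states that published many articles from being underweighted.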

It’s been a slow slog, but every small step is a little victory and crucial to the overall success of my project.


  1. wjevans01 says:

    Hey Joanna, I can definitely relate to what Prof. Howard said about research. I’m hoping that the rewarding part of research will start to show up sooner rather than later. I think this summer has given me a new appreciation for all of the work that goes into a finished research product.