The Long and Winding Road (To A Complete Dataset)

When Paul McCartney wrote “The Long and Winding Road,” he was probably unaware of the treacherous journey I would undertake this summer, but I deeply relate. Unfortunately, the door at which I keep arriving is neither artistic excellence nor love, but academic challenge. My methodology and desired outcomes have changed substantially over the past few weeks. With three weeks of full-time work left on campus, I have high hopes for the amount of data I will collect and statistically analyze.

Looking back at the methodology I began the summer with, it is clear that it was not grounded in an understanding of computer programming. I was eager to understand the larger patterns of user behavior on Twitter but had no idea how to use the Twitter API to gather data. I spent my first two weeks of research becoming familiar with the Twitter API and struggling to access Twitter data through Python. Because I had no previous experience with Python, writing code and setting up user streams has been difficult. I am working from the top down: I have a goal in mind and need to write code to achieve it, but with no background in Python, that requires exhausting stretches of scouring the internet for helpful hints. And because my knowledge of Python's basic functions is still limited, a single misplaced quotation mark can keep my entire program from running. Nevertheless, my biggest success coming out of the first week was writing code that let me tweet from Python through my personal Twitter account.

Having become comfortable with the Twitter API and (relatively) familiar with Python, I rewrote portions of my lit review during the third and fourth weeks. Until then I had not had a thorough idea of what I would actually be able to do with Twitter. I had large goals and vague ideas but did not understand how Twitter data functions. I no longer intend to use LIWC (Linguistic Inquiry and Word Count). Besides the fact that learning LIWC would not be worth the time cost at this point, the direction of this research has moved away from semantic analysis. I now seek a dataset that gives a macro-level illustration of the patterns of interaction between the political elite (Senators) and general Twitter users. I hope to represent a variety of variables in this dataset, such as geographic location, the nature of the responses to a Senator, and the frequency of profane language. In addition to writing, during the third week I focused on learning the programming basics of summoning tweets from specific users… I’ve become very familiar with Cory Booker’s account activity. This was a very frustrating point in the research process because I had to reevaluate what I wanted and what was actually realistic for me to do.

The past two weeks have been centered on writing the code to bring this dataset to fruition. I will be creating Twitter “streams” that gather tweets of a specific nature during a specific time period: replies to a particular senator, replies to a senator phrased as questions, and replies to a senator from within that senator’s home state. This is still my current challenge, and I have had mixed success. For instance, I have snippets of code that pull some of the tweets I want, but I am still struggling to format the data I gather into something legible; at the moment, most of it is difficult to read. I expect this to be the focus of my efforts for the next three weeks.
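For anyone curious what “formatting the data” involves, here is a minimal Python sketch of the idea. It is not my actual project code. It assumes the tweets have already been collected as Python dicts shaped like Twitter API v1.1 tweet objects (using the real fields `in_reply_to_screen_name`, `text`, `created_at`, `id_str`, and `user.screen_name` / `user.location`). The function names, the output columns, and the home-state check (a crude substring match on the free-text location field) are hypothetical choices for illustration, and the sample tweets are made up.

```python
import csv
import io

# Columns for the flattened, human-readable output (a hypothetical choice).
FIELDS = ["id", "created_at", "author", "author_location",
          "replying_to", "is_question", "from_home_state", "text"]


def flatten_replies(tweets, senator, home_state):
    """Keep only replies to `senator` and flatten each into a one-level dict.

    Assumes each tweet is a dict shaped like a Twitter API v1.1 tweet object.
    The home-state test is a crude, case-insensitive substring check on the
    free-text `user.location` field, which many users leave blank or joke in.
    """
    rows = []
    for tweet in tweets:
        target = tweet.get("in_reply_to_screen_name") or ""
        if target.lower() != senator.lower():
            continue  # not a reply to this senator
        user = tweet.get("user") or {}
        location = user.get("location") or ""
        # Collapse newlines and repeated spaces so each tweet fits on one line.
        text = " ".join((tweet.get("text") or "").split())
        rows.append({
            "id": tweet.get("id_str", ""),
            "created_at": tweet.get("created_at", ""),
            "author": user.get("screen_name", ""),
            "author_location": location,
            "replying_to": target,
            "is_question": "?" in text,  # rough proxy for "replying as a question"
            "from_home_state": home_state.lower() in location.lower(),
            "text": text,
        })
    return rows


def to_csv(rows):
    """Write flattened rows to a CSV string that opens cleanly in a spreadsheet."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Made-up sample tweets, for illustration only.
    sample = [
        {"id_str": "1", "created_at": "Mon Jul 10 14:00:00 +0000 2017",
         "text": "What is your plan\nfor health care?",
         "in_reply_to_screen_name": "CoryBooker",
         "user": {"screen_name": "voter_a", "location": "Newark, New Jersey"}},
        {"id_str": "2", "created_at": "Mon Jul 10 14:05:00 +0000 2017",
         "text": "Unrelated tweet", "in_reply_to_screen_name": None,
         "user": {"screen_name": "voter_b", "location": "Ohio"}},
    ]
    print(to_csv(flatten_replies(sample, "CoryBooker", "New Jersey")))
```

Writing one row per reply with fixed columns is what turns a wall of raw JSON into something a spreadsheet or statistics package can read directly, and it makes variables like “asked a question” or “from the senator’s state” easy to count later.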

Moving forward, I am hoping to stumble across a fount of patience. It can be very discouraging to spend hours and hours plugging away at a program that I am still struggling to understand. Should this fount of patience not present itself, I fear for the life of my computer.

I will stop my full-time on-campus work on August 5th, which falls in the 10th week of my research. I intend to leave campus with a complete dataset of senators, their tweets, their rate of responses, the geographic location of those responses, and the rate of profanity used. I will then analyze it statistically at home during the three weeks before school begins. I remain immensely grateful to the Charles Center for their generosity and for this experience. It has been an academically difficult summer. However, what I have learned, not just about the politics of Twitter but about research in general, is truly unquantifiable.


  1. lailadrury says:

    Wow Grace, you are amazing! Teaching yourself Python is no small feat. But alas, all in the name of sound, reliable research. Your project sounds incredibly interesting; social media is truly the communication platform of the future. I can only foresee Twitter’s influence on political opinion becoming more and more profound as the years go on. It would be interesting to analyze any change in profanity usage that may or may not be occurring in connection with the recent events in Charlottesville. I am looking forward to hearing all about your conclusions from this summer, and sincerely hope that your laptop survived any coding rage fits!