Speaking R (for the Math Averse)

After collecting all of my survey data and downloading it as an Excel file from the Qualtrics survey platform, I imported it into R. What exactly is R, you might be wondering? R is a statistical programming language. If you are familiar with statistical computing, it may help to think of it as something between SPSS and Stata, which are true statistical software, and Python, which is a programming language. Before beginning my project, I had no training or experience in programming, and just a basic understanding of statistics. I am also not particularly good at or fond of math. It was no surprise to me, then, that data analysis was by far the most difficult stage of my project, in part because I had to learn how to work in R as I went.

In your social science research, you may come across a data analysis puzzle that you want to solve in R. Perhaps you have hundreds of millions of data points, for example, or particularly complex social network data. Or you want to create compelling data visualizations from scratch. R might be the program for you. But what if you have never taken a math course beyond high school statistics or Calculus I? How are you to analyze your data with a programming language built on a foundation of linear algebra?

After seven weeks of exploring R, I have a short and simple suggestion: Remember that R is, after all, a programming language. Just like any other language you may have studied, be it English, French, or Chinese, it has a grammar and a syntax. One of the most basic structures of grammar is the part of speech. R’s parts of speech just happen to be linear algebra data structures. Where English has nouns, verbs, and adjectives, R has vectors, matrices, and data frames. Roughly 90% of the errors that R spat back at me were resolved once I understood the grammatical structure of my data: which parts of speech was I working with? Once I understood whether I was working with a matrix or a vector, for example, it was much easier to figure out what syntax to use to get R to understand me, and to do what I asked it to do.
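To make the analogy concrete, here is a minimal sketch of what checking a "part of speech" can look like in base R. The object names (ages, scores, survey) and their values are invented for illustration and are not taken from the survey data described above.

```r
# A vector: a one-dimensional sequence of values that all share one type.
ages <- c(34, 27, 45)

# A matrix: a two-dimensional grid of values that all share one type.
# R fills matrices column by column unless told otherwise.
scores <- matrix(1:6, nrow = 2, ncol = 3)

# A data frame: a table whose columns may each hold a different type.
survey <- data.frame(
  respondent = c("a", "b", "c"),
  age = ages,
  stringsAsFactors = FALSE
)

# Ask R which structure each object is before operating on it.
is.vector(ages)        # TRUE
is.matrix(scores)      # TRUE
is.data.frame(survey)  # TRUE

# The same idea, "give me part of this object", takes different syntax
# depending on the structure.
ages[2]        # second element of a vector
scores[1, 2]   # row 1, column 2 of a matrix
survey$age     # the column named "age" in a data frame

# Using data frame syntax on a matrix is a typical grammar mistake.
# tryCatch() captures the error message instead of halting the script.
msg <- tryCatch(scores$age, error = function(e) conditionMessage(e))
print(msg)
```

When an error message is hard to read, running class() or str() on the object involved is often a quick way to find out which structure R thinks it is looking at.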

I am nowhere near fluent in R; my commands are still inelegant and clunky. But thinking of R as a language, rather than a statistical program, definitely helps me express myself more clearly.