Exploring Correlation Data on Popular Websites


Professor Nathan Kaplan uses popular websites and online data to generate discussion about correlation in his math general education course, "Fat Chance." According to Kaplan, the purpose of the activity is to show students that correlations are everywhere, even in movie rentals and library cards, and that in the era of big data, companies can learn a great deal about people, including things those people never tell them.

Before class, students should already be familiar with concepts from lecture, specifically the idea of correlation between two events. This is a discussion-based activity. Using a projector, Professor Kaplan brings up several websites, explains what they show, and asks students to come up with explanations for certain patterns in the data and then to supply questions of their own.

Kaplan first shows a New York Times Magazine article about how Target combines your purchasing information with demographic information it knows about you and feeds both into a statistical model that predicts whether you are pregnant and what your due date is. The class discusses how what you buy gives clues about who you are.

He then shows some detailed United States census maps together with maps showing which areas rented certain films on Netflix more and less frequently. He points out that it is easy to determine which areas have higher percentages of certain minority groups based on the frequency of certain films being rented. He also points out how other demographic information is tied to rental data. He asks students to explain a few things (the relative rental frequencies of Paul Blart: Mall Cop versus Milk, for example) in terms of the US census data.

Kaplan also shows the students some of the statistical discussion on the blog of the dating website OkCupid. The idea is that even if you do not tell anyone your race or your age and do not show a photo, the things you write in your profile allow people to guess this information with high probability.

After the presentation, students ask their own questions about the websites shown and suggest other places in which similar ideas are relevant (for example, a study suggesting that Facebook can guess your sexual orientation with high probability from the patterns in your friends' sexual orientations, even if you never disclose it).

Kaplan advises being careful about some of the information that you show. When talking about race, income, or other demographics, things can get a little sensitive. The correlations that come out sometimes reinforce certain negative stereotypes, so it is very important to try to be objective and simply present the information. Students really like this activity (and it is also a little unsettling, which is a good thing).