Obesity Epidemic?

Today on Slate there is a nice little GIF (which originally appeared on The Atlantic) that shows how obesity rates have changed over time by state. Slate seems to suggest that the geographic progression of obesity rates might indicate some sort of social contagion. But, as many others (and here) before me have pointed out, we have to be very careful when trying to draw inferences about social contagion. If we take a look at a map of household income by state, we see that there is a lot of overlap between the poorest states and those with the highest obesity rates.

Household Income

There are lots of potential causal connections here. For example, income might affect the types of stores and restaurants available, which in turn affects obesity rates. For a more careful look at some data on the social contagion of obesity, have a look at our paper that examines obesity rates, screen time, and social networks in adolescents.

 

As a side note, it’s interesting to compare the map of the “obesity epidemic” to a map of something we know spreads through person to person contagion, like the swine flu (image from the New York Times).

H1N1

Unlike the obesity epidemic, swine flu jumps all over the place, which obviously has to do with air travel.

New paper on social contagion of obesity

Along with a team of researchers led by epidemiologist David Shoham from Loyola University, I recently published a paper in PLoS One examining the social contagion of obesity. As many of you know, this is a hotly debated topic of research that was kicked off by work of James Fowler and Nicholas Christakis published in the New England Journal of Medicine.  (See this post for my two cents on the debate.) The central criticism of this research surrounds the issue of separating friendship selection from influence, which in some sense was laid to rest by Cosma Shalizi and Andrew Thomas.

One alternative approach is to use a “generative model,” which is exactly what my coauthors and I do. Specifically, we use the SIENA program developed by Tom Snijders and colleagues. Essentially, this model assumes that people make choices about their friendships and behavior just like economists and marketers assume people make choices about where to live or what car to buy.

In our paper, we apply the model to data from two high schools from the AddHeath study. We use the model to understand social influences on body size, physical activity, and “screen time” (time spent watching TV, playing video games, or on the computer). In short, here’s what we find:

  • In both schools students are more likely to select friends that have a similar BMI (body mass index); that is, there is homophily on BMI.
  • In both schools there is evidence that students are influenced by their friends’ BMI.
  • There is no evidence for homophily on screen time in either school, and there is evidence that students are subject to influence from their friends on screen time in only one of the two schools.
  • In one of the two schools there was evidence for homophily on playing sports, but in both schools there was evidence that students influenced their friends when it comes to playing sports.

Visualizing Contagious Twitter Memes with NodeXL and Gephi

In the last post we explored how to use NodeXL to collect a Twitter user’s network data. Now, I’ll describe how to collect data on a trending topic.

To get started, follow steps 0 and 1 here to set up a Twitter account and download the NodeXL software. Then, to download the network data, click on Import and select From Twitter Search Network… In the first dialog box, enter the search term that you want to look for. Any account that recently posted a tweet containing this phrase will end up being a node in your network. In the book “Analyzing Social Media Networks with NodeXL,” there is some good advice on choosing an appropriate trending topic to look at:

“First, the search phrase has to concern a recent event. Though Twitter has been around for several years, the volume of information being produced every second is so huge that the search interface has limits on how many tweets it will return for a given query, or how old tweets can be. Searching for “2008 Election” may in theory produce a valuable set of tweets about the election cycle, but in practice those tweets are too far back in time for the search interface to collect them efficiently. The second criterion is that the search phrase has to relate to a piece of news, promotion, event, and so on that is “contagious” (i.e., Twitter users who see the message will, at least in principle, want to pass it on to their followers). A search phrase like “Thanksgiving” is a trending topic on Twitter (shortly before and on Thanksgiving) but lacks a contagious property: there is no need to pass on the message because a large fraction of the population already knows about it, so tweets about Thanksgiving are independent events rather than the sign of a “Thanksgiving meme” spreading throughout the Twitter population.”

One good way to do this is to look through the recent tweets of a popular user for something that you think would be sufficiently interesting that other people would retweet the message. For example, in the network below, I gathered data on tweets containing the phrase “Who Googled You?” This Twitter meme originated with Pete Cashmore, of @mashable, and links to a Mashable article that describes a way to find out who has been searching for you on Google. The article generated a flurry of interest and many other people tweeted links to the article, generally repeating the original article title, “Who Googled You?” Since this meme spread from person to person, it was a good candidate for visualizing as a Twitter search network.

You can select what relationships you want to use to define the edges of your network by selecting any combination of the following choices:

Follows relationship — two accounts are connected if one account follows the other.
“Replies-to” relationship in tweet — two accounts are connected if one account replies to the other in its tweet.
“Mentions” relationship in tweet — two accounts are connected if one account mentions the other account in its tweet.

As discussed in the previous post, because of Twitter rate limits, it is advisable to limit your request to a fixed number of people. Unless you are especially patient, I recommend starting with just 300 people.

Once I have downloaded the data using NodeXL, I like to export it as a GraphML file and then visualize it in Gephi. In this example, I did a few things to make the visualization more meaningful, which I describe below.

Before getting started with manipulating the network in Gephi, it is a good idea to go into the Data Laboratory and delete some of the columns that NodeXL created. You should delete anything having to do with the color or size of the nodes or edges, or centrality measures such as PageRank and eigenvector centrality. These columns are generally empty, but unless you delete them, Gephi won’t overwrite them when you ask it to calculate these measures, so you won’t be able to calculate and make use of them in your analysis. For some general tips on using Gephi, check out the FAQ here.

First, I filtered out all of the accounts except those that belong to the largest connected component of the network. This makes the network much more readable, and allows us to focus only on those nodes involved in a large cascade. After trying a few options, I chose the Force Atlas layout algorithm to arrange the nodes. For Twitter networks, I have found Force Atlas to generally give the best layout. Usually, I have to increase the repulsion strength from the default setting of 200 to 2000 or more.

Then I resized the nodes according to their degree so we can get a sense for who the most important nodes in the network are. I also tried sizing the nodes by PageRank and eigenvector centrality for comparison. For the most part these different centrality measures didn’t make much difference, although one account, @darrenmcd, appears significantly more important according to PageRank or eigenvector centrality than degree centrality. The Twitter accounts @briansois and @armano stand out as the most influential in the network.

I colored the nodes according to which community they belong to as identified using Gephi’s implementation of the Girvan-Newman modularity-based clustering algorithm, and I colored the edges according to the type of relationship between the Twitter accounts. Blue edges are “follows” relationships, green edges are “mentions,” and purple edges are “replies to.” We can see that almost all of the links to @armano mention the relationship explicitly, and about half of those to @briansois do.

WhoGoogledYou
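For readers who prefer to see the filtering and sizing steps as code, here is a minimal sketch in plain Python. It is not what Gephi does internally; it only illustrates keeping the largest connected component and ranking accounts by degree. The account names and edges are hypothetical toy data, and treating every relationship as an undirected link is an assumption made to keep the example short.

```python
from collections import defaultdict, deque

# Hypothetical toy edge list (source, target, relationship); not data from the post.
EDGES = [
    ("alice", "hub", "mentions"),
    ("bob", "hub", "follows"),
    ("carol", "hub", "replies-to"),
    ("carol", "bob", "follows"),
    ("dave", "erin", "follows"),  # a small, separate component
]


def build_adjacency(edges):
    """Build an undirected adjacency map (assumption: edge direction is ignored)."""
    adjacency = defaultdict(set)
    for source, target, _relationship in edges:
        adjacency[source].add(target)
        adjacency[target].add(source)
    return adjacency


def largest_component(adjacency):
    """Return the set of nodes in the largest connected component (breadth-first search)."""
    seen = set()
    best = set()
    for start in adjacency:
        if start in seen:
            continue
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in component:
                    component.add(neighbor)
                    queue.append(neighbor)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


def degree_within(adjacency, nodes):
    """Count each node's neighbors, restricted to the given node set."""
    return {node: len(adjacency[node] & nodes) for node in nodes}


adjacency = build_adjacency(EDGES)
giant = largest_component(adjacency)
degrees = degree_within(adjacency, giant)
# Highest degree first; ties broken alphabetically so the order is stable.
ranked = sorted(degrees.items(), key=lambda item: (-item[1], item[0]))
```

Here `ranked` plays the role of the node sizes in the picture: the account at the top of the list would be drawn largest.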

Of Monsters and Men — How an Icelandic Band Exploded using the Web

Bo Olafsson, a Kellogg student who took Social Dynamics and Networks with me this past fall quarter, put together a nice slideshow explaining how a little-known Icelandic band, Of Monsters and Men, became a huge US success without ever visiting the country. Check it out:

Interestingly, Bo’s slides have become a mini viral phenomenon themselves garnering press attention in both Iceland and the US:

What it takes to “Go Viral”

It seems like we hear a new story every week: a video, or a rumor, or a song, or a commercial has “gone viral,” spreading across the web like wildfire, racing to the top of the most tweeted list, and grabbing headlines in real old-fashioned news media. These memes can be disgusting (like the Domino’s pizza video), controversial (like the recent Kony 2012 video), or entertaining (“Friday”?). They can be disasters for companies (see Domino’s above), or marketing campaigns that reach hundreds of thousands, or even millions, of viewers for relatively little investment (1300 foot drop, the Old Spice Guy). Given the potential impact of these “memes,” there is a lot of interest in what exactly determines whether or not a video, or a message, or a rumor goes viral. Here’s a simple model that explains why some things do and some things don’t.

Let’s consider the example of a YouTube video. Suppose that on average, every person that views the video tells c of their friends about it per day (c stands for contacts), and suppose that some fraction i of the people that hear about the video actually watch it and start telling other people about it themselves (i stands for infectivity, and captures something like how interesting the video is). Finally, suppose that on average, each person that is actively spreading word of the video does so for d days before they get bored and stop telling people about the video (d stands for duration).

To keep things simple, suppose that there are a total of N people in the population, and every one of these people is either actively spreading the video, or not actively spreading the video, but susceptible to becoming a video spreader. Let I denote the number of people currently spreading (i.e. infected) and S the number of people that are susceptible, but not currently spreading the video. So, I+S=N.

To see if the video goes viral or not, we just have to compare the rate at which people are becoming infected to the rate at which people are discontinuing sharing the video. It helps to think of a bath tub — the level of water in the bath tub represents the number of people spreading the video. The rate that water flows in through the faucet is the rate at which new people are becoming infected with the video spreading virus; the rate at which water drains out is the rate at which people are stopping spreading the video. If the rate at which water flows in is higher than the rate at which it drains out, the tub will keep filling up. On the other hand, if the drain is more open than the faucet, the bath tub will never fill up.

So, we have to figure out the rate at which new people are starting to spread the video and the rate at which people currently sharing the video are stopping. The second one is easier. If I people are currently sharing the video and each one of them shares it for d days on average, then each day we expect I/d people to stop spreading the video. For the first rate, we have I people actively sharing the video. On average, each one of them shares the video with c contacts per day, resulting in a total of cI contacts per day for the whole population. But, not all of these contacts result in a new person sharing the video. First, some of these people will already be sharing the video. The probability that a given person is not currently sharing the video is S/N, the fraction of “susceptible” people in the population. So, we expect cIS/N instances per day in which a person shares the video with someone that is not currently spreading the video. Given such a contact, we said that a fraction i of these will result in a new person sharing the video. Putting it all together, the rate at which new people are becoming infected with the video sharing virus is ciIS/N.

Now we have to compare our two rates. The video will go viral if ciIS/N>I/d. Dividing both sides by I and multiplying both sides by d, this becomes, cidS/N>1. Finally, we can make life a little simpler by assuming that initially almost no one knows about the video, so the number of susceptible people S and the total population N are about the same. Then S/N is approximately 1, so the equation simplifies to just cid>1.

This simple equation tells us whether or not the video will go viral. It says if the average number of contacts, times the infectivity, times the duration is greater than one, the video will spread, otherwise it will die out. Right at cid=1 there is a tipping point; crossing this threshold causes a discontinuous jump in the future.
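To see the tipping point in action, here is a minimal Python sketch of the model. It steps the bath tub forward in small time increments using the inflow ciIS/N and the outflow I/d, with S = N - I. The population size, the initial number of spreaders, the time step, and the parameter values in the usage lines are all illustrative assumptions, not numbers taken from any real video.

```python
def goes_viral(c, i, d):
    """Threshold test from the model: the video spreads when c * i * d > 1."""
    return c * i * d > 1


def simulate_spread(c, i, d, population=10_000, initial_spreaders=10,
                    days=60, dt=0.1):
    """Return the number of active spreaders at the end of each day.

    Uses simple Euler steps on dI/dt = c*i*I*S/N - I/d, where S = N - I.
    People who stop spreading are assumed to become susceptible again,
    matching the two-group setup in the text.
    """
    spreaders = float(initial_spreaders)
    history = [spreaders]
    steps_per_day = round(1 / dt)
    for _ in range(days):
        for _ in range(steps_per_day):
            susceptible = population - spreaders
            inflow = c * i * spreaders * susceptible / population  # faucet
            outflow = spreaders / d                                # drain
            spreaders += (inflow - outflow) * dt
        history.append(spreaders)
    return history


# Hypothetical parameter choices on either side of the threshold.
above = simulate_spread(c=5, i=0.1, d=4)  # c*i*d = 2
below = simulate_spread(c=5, i=0.1, d=1)  # c*i*d = 0.5
```

With these assumed values, the only difference between the two runs is how long people keep talking about the video, yet `above` climbs while `below` drains away.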

This model makes a lot of assumptions that don’t really hold (big ones are that people have roughly the same number of contacts on average, and that people basically interact at random), but it gives us a basic understanding of the process. Even in more complicated models, where we make fewer simplifying assumptions, there is typically a similar tipping point, and increasing either contacts, infectivity, or duration increases the chance of crossing that threshold.

So, there you have it — everything you need to go viral: a network with enough contacts (c); a product, or message, that sounds interesting enough to be infectious (i), and with enough staying power so that people keep telling their friends about it for a long time (d).

Social Dynamics Videos

While I’ve been teaching Social Dynamics and Networks at Kellogg, I’ve amassed a collection of links to interesting videos on social dynamics. Here they are:

Duncan Watts TEDx talk on “The Myth of Common Sense”

Nicholas Christakis TED talk on “The hidden influence of social networks”; TED talk on “How social networks predict epidemics.”

James Fowler talking about social influence on the Colbert Report.

Sinan Aral TEDx talk on “Social contagion”; at PopTech 2010 on “Social contagion”; at Nextwork on “Social contagion”; at the International Conference on Weblogs and Social Media on “Content and causality in social networks.”

Scott E. Page on “Leveraging Diversity”, and at TEDxUofM on “Putting Milk Crates on the Internet.”

Eli Pariser TED talk on “Beware online ‘filter bubbles’”

Freakonomics podcast on “The Folly of Prediction”

Damon Centola on “Network Contagion.”

Jure Leskovec on “The Web as a Laboratory for Studying Humanity”

There are several good videos of talks from the Web Science Meets Network Science conference at Northwestern: Duncan Watts, Albert-Laszlo Barabasi, Jure Leskovec, and Sinan Aral.

The “Did You Know?” series of videos has some incredible information about, well, information. More info here.

Why Google Ripples will be a lot less cool than it sounds.

Google+ now has a new feature, Ripples, that allows you to see a network visualization of the diffusion of a post (see the Gizmodo article here). The pictures are cool, but the original post has to be public, and then it has to be shared by one Google+ user to other Google+ users. But, the chances of interesting ripples happening very often are pretty slim; here’s why.

Bakshy, Hofman, Mason, and Watts looked at exactly this kind of cascade on Twitter, which is a great platform for this kind of research for several reasons. First, everything is effectively public, so there are none of the privacy issues of Facebook, and we don’t have to limit ourselves to looking at just the messages that people choose to make public like we do on Google+. Second, “retweeting” messages is an established part of Twitter culture, so we expect to find cascades. Finally, since tweets are limited to 140 characters, links are often shortened using services like bit.ly. This means that if I create a link to a New York Times article and you create a link to the same page independently, those links will be different, so the researchers can tell the difference between a cascade that my post creates and one that yours creates.

Some of the cascades that Bakshy et al. found are shown in this figure.

They looked at 74 million chains like these initiated by more than 1.6 million Twitter users during two months in 2009. A lot of interesting things came out of the study, but the most important one for Google Ripples is that 98 percent of the URLs were never reposted. That’s not good for Ripples. The latest number puts the entire Google+ user population at only 43.6 million users, and since only a small fraction of these users’ posts will be public posts, even if people share other people’s posts on Google+ as frequently as they retweet links on Twitter (which is unlikely), we still can’t expect to see many Ripples that look like anything but a lonely circle.
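The back-of-the-envelope reasoning here can be written as a short Python sketch. The only number taken from the study is the 98 percent of URLs that were never reposted. Treating each public post as an independent draw with that same rate, and assuming Google+ behaves like Twitter, are simplifications made purely for illustration.

```python
# Share of URLs that were never reposted, as reported above for Twitter.
NEVER_REPOSTED = 0.98


def expected_nontrivial_ripples(public_posts, never_reposted=NEVER_REPOSTED):
    """Expected number of posts shared at least once, out of public_posts."""
    return public_posts * (1 - never_reposted)


def chance_of_any_ripple(posts_by_user, never_reposted=NEVER_REPOSTED):
    """Chance that at least one of a user's public posts is shared by anyone.

    Assumes (hypothetically) that posts are independent of one another.
    """
    return 1 - never_reposted ** posts_by_user
```

For example, `expected_nontrivial_ripples(1000)` gives the expected number of posts, out of a hypothetical 1,000 public ones, whose Ripple would be anything more than a lonely circle.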

Twitter Terrorists: False information + positive feedbacks = real panic

Another example of how false information, amplified through positive feedbacks, can lead to real panic: in Veracruz, Mexico, two people posted messages on Twitter reporting kidnappings at a local school. The messages spread rapidly through social media, leading frightened parents to rush to try to save their children. The panic caused dozens of car accidents and jammed the city’s emergency phone lines.

Amnesty International was quoted saying, “The lack of safety creates an atmosphere of mistrust in which rumours that circulate on social networks are part of people’s efforts to protect themselves, since there is very little trustworthy information.” As with many “tipping point” phenomena, before the spark that set off the visible cascade, there was most likely a “contextual tipping point” that made the resulting contagion possible. Governments or managers have to realize that the only way to reliably prevent these cascades is by changing the context, not by stamping out all of the sparks.

The S&P credit downgrade, turmoil in the markets, and the 1973 toilet paper shortage

On Friday, August 5, Standard & Poor’s downgraded the credit rating of the U.S. long-term debt to AA+.  On Monday, the first day the markets opened since the downgrade, the Dow Jones Industrial average dropped 5.6 percent and the S&P 500 fell 6.7 percent — the biggest single day drops since the crisis in 2008.  A lot of people might be confused about this turmoil in the markets, since US debt is still considered one of the safest investments there is.  Jay Forrester, founder of the field of System Dynamics, calls puzzles like this the “counterintuitive behavior of social systems.”

Undoubtedly, the world economy is incredibly complex, and no individual or organization has a complete picture of how it works or where it’s headed. Through pricing, the market is supposed to aggregate all of the pieces of partial information that we each hold and then converge to the “truth”; that is, prices should reflect true underlying value. In some situations this can actually work. Prediction markets have been shown to be valuable tools for businesses to harvest the “wisdom of the crowds” and assess the probabilities that future events occur. But, this mechanism works best when individuals place their trades independently based on their own private information. In the real world, market dynamics are fundamentally social dynamics, and as such they are subject to cascades of panic and the accumulation of overconfidence (what Alan Greenspan famously referred to as “irrational exuberance”; see also Robert Shiller).

 

The current panic illustrates how even when there is no fundamental basis for a panic, social dynamics can amplify the signal of a panic to the point where an actual crisis ensues. The gas shortages of 1979 are a classic example of this phenomenon. The Iranian revolution sharply cut oil imports to the US from Iran. Nervous consumers rushed to top off their tanks and even to hoard gasoline at home. This drained the supply of gasoline at filling stations, leading to an actual gasoline shortage. Word-of-mouth and media coverage reinforced consumer fears of shortages, leading to even more topping off and hoarding, as well as government policies such as odd/even day purchase rules that actually further incentivized consumers to top off frequently and store gasoline at home. Surprisingly, despite the very real shortage of gasoline at filling stations, US oil imports for the year actually increased in 1979 compared to 1978. The crisis was caused by social dynamics, not an actual drop in supply. (See Sterman, Business Dynamics p. 212).

A similar but more comical crisis occurred in 1973 when Johnny Carson made a joke saying, “You know what’s disappearing from the supermarket shelves?  Toilet paper.  There’s an acute shortage of toilet paper in the United States.”  Consumers rushed out to stock up on toilet paper, leading to a real toilet paper shortage in the US that lasted several days.  Even though Carson tried to correct the joke a few days later, by that time toilet paper was in fact in short supply because people were hoarding it at home.

Another Social Media Disaster: The Milk “Everything I Do is Wrong” campaign

The New York Times chronicles another example of an online ad campaign gone bad. This time the California Milk Processor Board ran a campaign at everythingidoiswrong.org (since replaced with gotdiscussion.org) touting the ability of milk to help reduce PMS symptoms. The campaign clearly made light of women from a stereotyped male perspective. As we well know by now, in the age of social media, a misstep like this can quickly turn into a disaster.