Gun Control and Homophily in Social Networks

Last week the website of The Atlantic had a nice network visualization of the top tweets linking to articles on gun politics. You should go check out their site, where the network visualization is interactive, but here is a static picture so you get the idea.

Homophily in the network of gun politics tweets

Each node in this network is one of the top 100 most tweeted weblinks on gun politics during the week from Sunday 2/17 to Sunday 2/24. The creator of the network visualization collected all of the tweets that mentioned terms like "gun rights," "gun control," "gun laws," etc. and then looked for the most popular links in those tweets. (One thing I wonder about is how they dealt with shortened URLs. Since tweets are limited to 140 characters or fewer, most people who post a link on Twitter shorten the URL using a service like bit.ly. This means that two people who are ultimately linking to the same article might post different URLs. Many news services have a built-in "Tweet this" button, which may give the same shortened URL to everyone who clicks it, so those articles would get many consistent links, whereas articles or posts without a "Tweet this" button might have many links pointing to them, each with a different URL because every person shortened the link individually. All of this is just a technical aside though, because I am 100% sure the main point of the network visualization, which I haven't even gotten to yet, would still show up.)
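To make the shortened-URL worry concrete, here is a minimal sketch in Python. The lookup table from short URL to final article URL is entirely made up (in practice it would have to come from following each redirect), and nothing here is meant to describe how the Atlantic's creator actually handled the problem.

```python
from collections import Counter

# Hypothetical lookup from shortened URL to the article it finally points to.
# In real data this would come from following each redirect.
resolved = {
    "http://bit.ly/aaa": "http://example.com/article-1",
    "http://bit.ly/bbb": "http://example.com/article-1",
    "http://bit.ly/ccc": "http://example.com/article-1",
    "http://bit.ly/zzz": "http://example.com/article-2",
}

# Hypothetical links as they appeared in tweets.
tweeted = [
    "http://bit.ly/aaa",
    "http://bit.ly/bbb",
    "http://bit.ly/ccc",
    "http://bit.ly/zzz",
    "http://bit.ly/zzz",
]

# Counting the raw strings splits article-1 across three different short URLs.
raw_counts = Counter(tweeted)

# Counting after resolving merges them back into one article.
resolved_counts = Counter(resolved.get(url, url) for url in tweeted)

print(raw_counts.most_common(1))
print(resolved_counts.most_common(1))
```

In this toy example the ranking of the "most tweeted" link flips depending on whether the URLs are resolved before counting.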

The edges in the network visualization connect two pages if the same Twitter account posted links to both pages. The point is that we see two very distinct clusters with lots of edges within the clusters and not too many between them. Of course, taking a closer look at the network visualization, we see that one of the groups consists of pro gun control articles and the other contains anti gun control pages. The network science term for this phenomenon is homophily, i.e., nodes are more likely to connect to other nodes that are similar to them. Homophily shows up in lots and lots of networks. Political network visualizations almost always exhibit extreme homophily. For example, take a look at this network of political blogs created by Lada Adamic and Natalie Glance (they have generously made the data available here).
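The edge rule described above (two pages are connected if the same account linked to both) can be sketched in a few lines of Python. The account names and page labels are invented for illustration, and counting the number of shared accounts as an edge weight is my own assumption, not something the visualization necessarily does.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (account, page) pairs extracted from tweets.
tweets = [
    ("alice", "page_a"), ("alice", "page_b"),
    ("bob", "page_b"), ("bob", "page_c"),
    ("carol", "page_x"), ("carol", "page_y"),
    ("dave", "page_a"), ("dave", "page_b"),
]

def co_tweet_edges(tweets):
    """Return {(page1, page2): number of accounts that linked to both pages}."""
    pages_by_account = defaultdict(set)
    for account, page in tweets:
        pages_by_account[account].add(page)

    weights = defaultdict(int)
    for pages in pages_by_account.values():
        # Every pair of pages tweeted by the same account gets one more vote.
        for pair in combinations(sorted(pages), 2):
            weights[pair] += 1
    return dict(weights)

page_edges = co_tweet_edges(tweets)
for (p1, p2), weight in sorted(page_edges.items()):
    print(p1, p2, weight)
```

Here pages a, b, and c end up in one connected group and pages x and y in another, with no edge between the groups, which is a tiny version of the two-cluster picture.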

Homophily in political blogs

In this network the nodes are blogs about politics and two blogs are connected if there is a hyperlink from one blog to another. Blue blogs are liberal blogs and red blogs are conservative.

Or, take a look at this network of senators created by a group in the Human-Computer Interaction Lab at the University of Maryland.

Homophily in the Senate

Here, the nodes are senators and two senators are connected if they voted the same way on a threshold number of roll call votes.
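As a rough sketch of that rule in Python: count how often each pair of senators voted the same way and keep the pairs that reach a threshold. The senators, votes, and threshold below are made up, and the Maryland group's actual procedure may differ in its details.

```python
from itertools import combinations

# Hypothetical roll call records: one "Y"/"N" entry per roll call vote.
votes = {
    "Senator A": ["Y", "Y", "N", "Y"],
    "Senator B": ["Y", "Y", "N", "N"],
    "Senator C": ["N", "N", "Y", "N"],
}

def agreement_edges(votes, threshold):
    """Connect two senators if they agreed on at least `threshold` roll calls."""
    edges = []
    for s1, s2 in combinations(sorted(votes), 2):
        agreements = sum(v1 == v2 for v1, v2 in zip(votes[s1], votes[s2]))
        if agreements >= threshold:
            edges.append((s1, s2, agreements))
    return edges

senate_edges = agreement_edges(votes, threshold=3)
print(senate_edges)
```

Raising or lowering the threshold changes how dense the picture looks, so the apparent separation between the parties depends partly on that choice.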

Homophily shows up in other types of social networks as well, not only political networks. For example, take a look at this network of high school friendships from James Moody's paper "Race, school integration, and friendship segregation in America," American Journal of Sociology 107, 679-716 (2001).

Homophily in high school friendships

Here the nodes are students in a high school and two nodes are connected if one student named the other student as a friend (the data was collected as part of the Add Health study). The color of the nodes corresponds to the race of the students. As we can see, "yellow" students are much more likely to be friends with other yellow students and "green" students are more likely to connect to other green students. (Interestingly, the "pink" students, who are in the vast minority, seem to be distributed throughout the network. I once heard Matt Jackson say that this is the norm in many high schools: if there are two large groups and one small one, the members of the small group end up identifying with one or the other of the two large groups.)

Homophily is actually a more subtle concept than it appears at first. The thorny issue, as is often the case, is causality. Why do similar nodes tend to be connected to one another? The problem is so deeply ingrained in the concept of homophily that it sometimes leads to ambiguity in the use of the term itself. Some people use the word homophily to refer to the observation that nodes in a network are more likely to connect to similar nodes in the network than we would expect due to chance. In this case, there is no mention of the underlying reason why similar nodes are connected to one another, just that they are.  When other people use the term homophily, they mean the tendency for nodes in a network to select similar nodes in the network to form connections with. To keep the distinction clear, some people even refer to the former definition as observed homophily. 
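One simple way to make "more than we would expect due to chance" concrete is to compare the fraction of same-type ties in the network to the fraction we get when the labels are shuffled at random among the nodes. The Python sketch below does this for a tiny invented network; the label-shuffling baseline is just one reasonable choice of null model, not the only one.

```python
import random

def same_type_fraction(edges, label):
    """Fraction of edges whose two endpoints carry the same label."""
    same = sum(label[u] == label[v] for u, v in edges)
    return same / len(edges)

def chance_baseline(edges, label, trials=2000, seed=0):
    """Average same-type fraction when labels are shuffled among the nodes."""
    rng = random.Random(seed)
    nodes = list(label)
    values = [label[n] for n in nodes]
    total = 0.0
    for _ in range(trials):
        rng.shuffle(values)
        shuffled = dict(zip(nodes, values))
        total += same_type_fraction(edges, shuffled)
    return total / trials

# Hypothetical network: three "L" nodes, three "C" nodes, one cross-group tie.
node_label = {1: "L", 2: "L", 3: "L", 4: "C", 5: "C", 6: "C"}
ties = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]

observed = same_type_fraction(ties, node_label)
baseline = chance_baseline(ties, node_label)
print(f"observed same-type fraction: {observed:.2f}")
print(f"shuffled-label baseline:     {baseline:.2f}")
```

A gap between the two numbers is observed homophily in the first sense above. It says nothing about why the gap is there.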

To understand the difference it helps to think about other reasons why we might see similar nodes preferentially connected to one another. The causal stories fall into three basic categories: influence, network dynamics, and exogenous covariates. For many people, the influence story is the most interesting. In this explanation, we imagine that the network of connections already exists, and then nodes that are connected to one another affect each other's characteristics so that network neighbors end up being similar to one another. For example, in a series of papers looking at a network of friends, relatives, and geographic neighbors from the Framingham Heart Study, Christakis and Fowler argue that network neighbors influence one another's weight, tendency to smoke, likelihood of divorce, and depression. While not everyone is convinced by Christakis and Fowler's evidence for a contagion effect, we can all agree that in their data obese people are more likely to be connected to other obese people, smokers tend to be friends with smokers, people that divorce are more likely to be connected to others that divorce, and depressed folks are more likely to be connected to other depressed people than we would expect due to chance.

In the network dynamics story, nodes form or break ties in a way that shows a preference for a particular attribute. Our intuition is that liberal blogs like to link to other liberal blogs more than they like to link to conservative blogs. This is what some people take as the definition of homophily. Since the word literally means "love of the same" this makes some sense.

But just because we see observed homophily doesn't mean people are preferentially linking to other people that are like them. This is reassuring when we see homophily on dimensions like race, as in the high school friendship network above. Clearly, the students are not influencing the race of their friends, but the fact that we observe racial homophily doesn't imply the students are racist either. There could be what we call an exogenous covariate that is leading to the observation of homophily. For example, it could be that these students live in a racially segregated city and students are more likely to be friends with other students that live close to them. In this case, students prefer to be friends with other students that live near them, and living near one another just happens to increase the likelihood that the students share the same race.

One particularly tricky covariate is having a friend in common. Another common observation in social networks is what is called triadic closure. In lay terms, triadic closure means that two people with a friend in common are likely to be friends with each other: the triangle closes instead of remaining open like a V. It could be that, in the high school friendship network, there is a slight tendency for some students to choose others of their same race as friends, either because of another variable like location or because of an actual racial bias, but the appearance of racial homophily could be significantly amplified by triadic closure. If one student chooses two friends that are of the same race, triadic closure is likely to result in a third same-race tie. It turns out that, at least in some cases where scholars have been able to untangle these various stories, triadic closure and homophily on other covariates explain a lot of observed racial homophily (see e.g. Wimmer and Lewis or Kossinets and Watts).
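Triadic closure is easy to measure once you have a friendship list. The Python sketch below counts every "V" (two people who share a friend) and reports the fraction of those Vs that are closed into triangles. The names and friendships are invented, and this whole-network closure rate is only one of several ways people quantify the idea.

```python
from itertools import combinations

def closure_rate(edges):
    """Fraction of pairs with a common friend who are also friends themselves."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    closed = 0
    total = 0
    for center, friends in neighbors.items():
        # Each pair of friends of `center` forms a V with `center` in the middle.
        for u, v in combinations(sorted(friends), 2):
            total += 1
            if v in neighbors[u]:
                closed += 1
    return closed / total if total else 0.0

# Hypothetical friendships: one triangle (ann, bo, cy) plus one extra tie.
friendships = [("ann", "bo"), ("ann", "cy"), ("bo", "cy"), ("cy", "di")]
rate = closure_rate(friendships)
print(f"fraction of Vs that are closed: {rate:.2f}")
```

In this toy network three of the five Vs are closed. The two open ones both run through cy to di, who has no other friends.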

So, what about the gun control network? In this case, we can rule out influence, since the articles had to already exist and have a stance on gun control before someone could tweet a link to them. That is, the "state" of the node as pro or anti gun control precedes the formation of a tie connecting them in the Atlantic's network. But as far as the other explanations go, it's probably a mix. An obvious exogenous covariate is source. If I read news on the website of MSNBC and you go to the Fox website, I'm more likely to tweet links to pro gun control articles and you're more likely to tweet anti gun control links, even if we are both just tweeting links to every gun control article we read. Undoubtedly though, many people are using Twitter as a way to spread information that supports their own political opinions, so someone that is pro gun control will tweet pro gun control links and vice versa. This, however, doesn't mean that gun control advocates aren't reading 2nd Amendment arguments and gun rights supporters aren't reading what the gun control folks have to say. It just means that they aren't broadcasting it to the rest of the world when they do.

New paper on social contagion of obesity

Along with a team of researchers led by epidemiologist David Shoham from Loyola University, I recently published a paper in PLoS One examining the social contagion of obesity. As many of you know, this is a hotly debated topic of research that was kicked off by work of James Fowler and Nicholas Christakis published in the New England Journal of Medicine.  (See this post for my two cents on the debate.) The central criticism of this research surrounds the issue of separating friendship selection from influence, which in some sense was laid to rest by Cosma Shalizi and Andrew Thomas.

One alternative approach is to use a "generative model," which is exactly what my coauthors and I do. Specifically, we use the SIENA program developed by Tom Snijders and colleagues. Essentially, this model assumes that people make choices about their friendships and behavior just like economists and marketers assume people make choices about where to live or what car to buy.

In our paper, we apply the model to data from two high schools from the Add Health study. We use the model to understand social influences on body size, physical activity, and "screen time" (time spent watching TV, playing video games, or on the computer). In short, here's what we find:

  • In both schools students are more likely to select friends that have a similar BMI (body mass index); that is, there is homophily on BMI.
  • In both schools there is evidence that students are influenced by their friends' BMI.
  • There is no evidence for homophily on screen time in either school, and there is evidence that students are subject to influence from their friends on screen time in only one of the two schools.
  • In one of the two schools there was evidence for homophily on playing sports, but in both schools there was evidence that students influenced their friends when it comes to playing sports.

The Christakis and Fowler Social Networks Influence Brouhaha

Recently there has been a spirited conversation kicked off by the publication of an article, "The Spread of Evidence-Poor Medicine via Flawed Social-Network Analysis," by Russell Lyons regarding the well-publicized work of Nicholas Christakis and James Fowler on social contagion of obesity, smoking, happiness, and divorce. The discussion has been primarily confined to the specialized circle of social network scholars, but now that conversation has spilled out into the public arena in the form of an article by Dave Johns in Slate (Dave Johns has written about the Christakis-Fowler work in Slate before). Christakis and Fowler's work has received a huge amount of attention, appearing on the cover of the New York Times magazine, on the Colbert Report TV program, and a ton of other places (see James's website for more links). Many others have made detailed comments on Lyons's article and on the original Christakis-Fowler papers. I wish to address some of the related issues raised in Slate about scientists in the media.

The article seems to criticize Christakis and Fowler for their media appearances, as though this publicity is inappropriate for scientists who should be diligently but silently working in the background, leaving it up to policy makers and the media to make public commentary and recommendations.  I think this criticism is not only wrong, but dangerous.  Many if not most researchers do work silently in the background, shunning the spotlight and scrutiny of the media, not out of shyness or fear of embarrassment, but because of a pervasive misunderstanding of scientific uncertainty.  Hard science is simply much softer than many people realize.

ALL scientific conclusions, from physics to sociology, come with uncertainty (this does not apply to mathematics, which is actually not a science). A "scientific truth" is actually something that we're only pretty sure is true. But we'll never be definitely 100% sure; that's just how science works. When one scientist says to another, "we have observed that X causes Y," it is understood that what is meant is, "the probability that the observed relationship between X and Y is due to chance is very small." But statements like that don't make for good news stories. Not only are they uninteresting, but for most people they're unintelligible (which is not to say that the public is stupid; the concepts of uncertainty and statistical significance are extremely subtle and often misunderstood even by well-trained scientists). So, many scientists avoid the media because we're asked to make definitive statements where no definitive statements are possible, or we make statements that include uncertainty that are ignored or misunderstood.

But we need scientists in the media.  Only a fraction of Americans believe the planet is warming and 40% of Americans believe in creationism.  Scientists in the media can help correct these misperceptions and guide public policy.  And, maybe even more importantly, scientists in the media can make science sexy.  We already live in a world where science and politics are often at odds, and in which scientists that avoid the media are often overruled by politicians that seek it out.  Scientists are already wary of making public statements that implicitly contain uncertainty for fear of them being interpreted as definitive. Christakis and Fowler have done us a great service by taking the risk of making statements and recommendations in the public arena based on the best of their knowledge, by raising public awareness of the science of networks, and by making science fun, interesting, and relevant.

Homophily and Information Spread

This article in Wired covers new research on networks and information by Sinan Aral (Northwestern B.A. in Political Science, MIT Sloan PhD, now at NYU Stern) and Marshall Van Alstyne. The article describes research on the email communications of members of an executive recruiting firm, and says, “those who relied on a tight cluster of homophilic contacts received more novel information per unit of time.” The article is confusing though because it mixes several distinct network concepts: homophily, strong ties, clustering, and “bandwidth.” Homophily is the tendency for people to be connected to other people that are similar to them; birds of a feather flock together. In his seminal paper, “The Strength of Weak Ties,” Mark Granovetter defined the strength of a tie as “a (probably linear) combination of the amount of time, the emotional intensity, the intimacy (mutual confiding), and the reciprocal services which characterize the tie”. Clustering measures the tendency of our friends to be friends with each other. And bandwidth is a less standard term in the social networks literature that captures the total amount of information that flows through a given tie per unit time (and thus is about the same thing as the strength of a tie).
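Of these concepts, clustering is the easiest to pin down with a calculation. A minimal Python sketch, using an invented set of friendships: for one person, look at every pair of their friends and ask what fraction of those pairs are friends with each other. This is the usual local clustering coefficient, and I am assuming that is roughly what is meant here.

```python
from itertools import combinations

def local_clustering(ego, friends_of):
    """Fraction of pairs of ego's friends who are friends with each other."""
    pairs = list(combinations(sorted(friends_of[ego]), 2))
    if not pairs:
        return 0.0  # fewer than two friends, so there are no pairs to check
    linked = sum(1 for a, b in pairs if b in friends_of[a])
    return linked / len(pairs)

# Hypothetical, symmetric friendship lists.
friends_of = {
    "me": {"ann", "bo", "cy", "di"},
    "ann": {"me", "bo"},
    "bo": {"me", "ann", "cy"},
    "cy": {"me", "bo"},
    "di": {"me"},
}

print(f"clustering around me: {local_clustering('me', friends_of):.2f}")
print(f"clustering around bo: {local_clustering('bo', friends_of):.2f}")
```

Note that this number says nothing about how similar the friends are (homophily) or how much information flows over each tie (strength or bandwidth), which is part of why mixing the terms is confusing.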

After reading the Wired piece, I’m left wondering if it is

  1. strong or “high bandwidth” ties through which we communicate a lot of total information,
  2. homophilic ties with people that are similar to us,
  3. ties with people that are members of a tightly knit cluster of friends, or
  4. all of the above

that provide us with the most novelty in our information diet.

A look at the original research article makes it more clear why the Wired article was so confusing.  The actual argument has a lot of moving pieces to it.  The first argument is that structurally diverse networks tend to have lower bandwidth ties.  Here structurally diverse appears to mean not highly clustered.  So, you talk more to the people in your personal clique than to people outside of your tightly knit group.  The second piece relates structural diversity to information diversity.  They find that the more structurally diverse the network, the more diverse the information that flows through it.  So far, this seems to line up with the standard Granovetter weak ties story.  The third relationship is that increasing bandwidth also increases information diversity, and more importantly, increasing bandwidth increases the total volume of new (non-redundant) information that an individual receives.  The idea here is that if you get tons of information from someone, some of it is going to be new.

Finally, since both structural diversity and bandwidth increase information diversity, but structural diversity decreases with increased bandwidth, they set up a head-to-head battle to see whether the information diversity benefits of increasing bandwidth outweigh the costs of reducing structural diversity. They have three main findings on this front that characterize when bandwidth is beneficial:

  • “All else equal, we expect that the greater the information overlap among alters, the less valuable structural diversity will be in providing access to novel information.”
  • “All else equal, the broader the topic space, the more valuable channel bandwidth will be in providing access to novel information.”
  • “All else equal, ... the higher the refresh rate, the more valuable channel bandwidth will be in providing access to novel information.”