Clustering and the Ignorance of Crowds

Over on the Cheap Talk blog (@CheapTalkBlog), Jeff Ely (@jeffely) has an interesting post about the "Ignorance of Crowds." The basic idea is that when there are lots of connections among people, each individual has less incentive to seek out costly information — e.g., subscribing to the newspaper — on their own, because they can instead get that information ("free ride") from others. More connections means more free riding and fewer informed individuals.

I take a much more complicated route to the same conclusion in "Network Games with Local Correlation and Clustering." Besides being sufficiently mathematically intractable to, hopefully, be published, the paper does show a few other things too. In particular, I look at how network clustering affects "public goods provision," which is the fancy term for what Jeff Ely calls subscribing to the newspaper. Lots of real social networks are highly clustered. This means that if I'm friends with Jack and Jill, there is a good chance that Jack and Jill are friends with each other. What I find in the paper is that clustering increases public goods provision. In other words, when people are members of tight-knit communities, more people should subscribe to the newspaper (and volunteer, and pick up trash, and ...).

It's pretty clear that the Internet, social media, etc. are increasing the number of contacts that we have, but an interesting question that I haven't seen any research on is: how are these technologies affecting clustering (if at all)?

"Predicting the Present" at the CIA

The CIA is using tools similar to those we teach in the Kellogg Social Dynamics and Networks course to "predict the present" according to an AP article (see also this NPR On the Media interview).

While accurately predicting the future is often impossible, it can be pretty challenging just to know what's happening right now.  Predicting the present is the idea of using new tools to get a faster, better picture of what's happening in the present.  For example, the US Bureau of Labor Statistics essentially gathers the pricing information that goes into the Consumer Price Index (CPI) by hand (no joke, read how they do it here). This means that the government's measure of CPI (and thus inflation) is always a month behind, which is not good for making policy in a world where decades-old investment banks can collapse in a few days.

To speed the process up, researchers at MIT developed the Billion Prices Project, which, as the name implies, collects massive quantities of price data from across the Internet to get a more rapid estimate of CPI. The measure works, and is much more responsive than the government's measure. For example, in the wake of the Lehman collapse, the BPP detected deflationary movement almost immediately, while it took more than a month for those changes to show up in the government's numbers.
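To make the idea concrete, here is a minimal sketch of what computing an index from scraped prices could look like. The goods, the prices, and the equal weighting are all invented for illustration; the actual BPP methodology is far more sophisticated.

```python
def price_index(base_prices, today_prices):
    """Toy price index: average of today's price relative to a base
    period, over goods present in both samples (equal weights)."""
    common = base_prices.keys() & today_prices.keys()
    relatives = [today_prices[good] / base_prices[good] for good in common]
    return 100 * sum(relatives) / len(relatives)

# Hypothetical scraped prices for the same goods on two days.
base = {"milk": 2.00, "bread": 1.00, "coffee": 8.00}
today = {"milk": 2.20, "bread": 1.00, "coffee": 8.40}

print(round(price_index(base, today), 2))  # 105.0: prices up 5% on average
```

Because the prices are scraped daily, an index like this can be recomputed every day instead of once a month, which is the whole point of "predicting the present."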

Why Google Ripples will be a lot less cool than it sounds.

Google+ now has a new feature, Ripples, that allows you to see a network visualization of the diffusion of a post (see the Gizmodo article here).  The pictures are cool, but the original post has to be public, and then it has to be shared by one Google+ user to other Google+ users.  But the chance of interesting ripples happening very often is pretty slim; here's why.

Bakshy, Hofman, Mason, and Watts looked at exactly this kind of cascade on Twitter, which is a great platform for this kind of research for several reasons.  First, everything is effectively public, so there are none of the privacy issues of Facebook, and we don't have to limit ourselves to looking at just the messages that people choose to make public, like we do on Google+.  Second, "retweeting" messages is an established part of Twitter culture, so we expect to find cascades.  Finally, since tweets are limited to 140 characters, links are often shortened using URL-shortening services.  This means that if I create a shortened link to a New York Times article and you create a shortened link to the same page independently, those links will be different, so the researchers can tell the difference between a cascade that my post creates and one that yours creates.

Some of the cascades that Bakshy et al. found are shown in this figure.

They looked at 74 million chains like these, initiated by more than 1.6 million Twitter users during two months in 2009.  A lot of interesting things came out of the study, but the most important one for Google Ripples is that 98 percent of the URLs were never reposted.  That's not good for Ripples.  The latest number puts the entire Google+ user population at only 43.6 million users, and since only a small fraction of these users' posts will be public, even if people share other people's posts on Google+ as frequently as people retweet links on Twitter (which is unlikely), we still can't expect to see many Ripples that look like anything but a lonely circle.

Detecting Illicit Activity by Examining Communication Network Structure

This article from The Atlantic's website describes some fascinating research by Brandy Aven at CMU's Tepper School that demonstrates how communication networks discussing illicit activity differ from those discussing routine matters by examining the Enron email archives. It's a great example of how the structure of a network can reveal information about the process that generated it.

Exploration versus Exploitation in Google's Think Quarterly

In Google's first issue of "Think Quarterly," its new business-to-business publication, Susan Wojcicki, Google's employee number 16, sums up the classic exploration versus exploitation tradeoff, writing, "We face the classic innovator’s dilemma: should we invest in brand new products, or should we improve existing ones?"

James March laid out this ubiquitous dilemma, which every organization faces in one form or another, in his now classic paper, "Exploration and Exploitation in Organizational Learning."  Each summer at the University of Michigan's ICPSR Summer Program on Quantitative Methods I co-teach a course on complex systems models in the social sciences in which I often discuss March's famous paper (in fact, we just discussed the paper today).  In going over the paper this summer I was struck again by the continuing relevance of his insights.

The quote that grabbed me today was, "... adaptive processes characteristically improve exploitation more rapidly than exploration ... these tendencies to increase exploitation and reduce exploration make adaptive processes potentially self-destructive."  Here, March says we have to be constantly on guard to preserve exploration in our organizations.  Our natural tendency, just by doing what's best for us in the short run, is to gradually scale back exploration in favor of exploitation, until all we do is exploit.  But, in doing so, we ultimately doom our organization to failure because we're no longer able to adapt to a changing environment, or we lock into a suboptimal solution and eventually our competitors surpass us (see the earlier post on Borders).  March issued this warning to all organizations long before Clayton Christensen's Innovator's Dilemma.  The process of adaptation that makes us good at what we do now will destroy us down the road if we don't actively work to preserve exploration in our organization.  Which brings us back to Google.  Google is famous for so-called "20 percent time," in which engineers are asked to dedicate a full day a week to things "not necessarily in their job description."  This is Google's way of actively maintaining exploration in its organization.  So far, it seems to be working.

The Christakis and Fowler Social Networks Influence Brouhaha

Recently there has been a spirited conversation kicked off by the publication of an article, "The Spread of Evidence-Poor Medicine via Flawed Social-Network Analysis," by Russell Lyons regarding the well-publicized work of Nicholas Christakis and James Fowler on social contagion of obesity, smoking, happiness, and divorce.  The discussion has been primarily confined to the specialized circle of social network scholars, but now that conversation has spilled out into the public arena in the form of an article by Dave Johns in Slate (Dave Johns has written about the Christakis-Fowler work in Slate before).  Christakis and Fowler's work has received a huge amount of attention, appearing on the cover of the New York Times magazine, on the Colbert Report TV program, and in a ton of other places (see James's website for more links).  Many others have made detailed comments on Lyons's article and on the original Christakis-Fowler papers.  I wish to address some of the related issues raised in Slate about scientists in the media.

The article seems to criticize Christakis and Fowler for their media appearances, as though this publicity is inappropriate for scientists who should be diligently but silently working in the background, leaving it up to policy makers and the media to make public commentary and recommendations.  I think this criticism is not only wrong, but dangerous.  Many if not most researchers do work silently in the background, shunning the spotlight and scrutiny of the media, not out of shyness or fear of embarrassment, but because of a pervasive misunderstanding of scientific uncertainty.  Hard science is simply much softer than many people realize.

ALL scientific conclusions, from physics to sociology, come with uncertainty (this does not apply to mathematics, which is actually not a science).  A "scientific truth" is actually something that we're only pretty sure is true.  But we'll never be definitively 100% sure; that's just how science works.  When one scientist says to another, "we have observed that X causes Y," it is understood that what is meant is, "the probability that we would observe a relationship this strong between X and Y by chance alone is very small."  But statements like that don't make for good news stories.  Not only are they uninteresting, but for most people they're unintelligible (which is not to say that the public is stupid — the concepts of uncertainty and statistical significance are extremely subtle and often misunderstood even by well-trained scientists).  So, many scientists avoid the media because we're asked to make definitive statements where no definitive statements are possible, or we make statements that include uncertainty that are ignored or misunderstood.

But we need scientists in the media.  Only a fraction of Americans believe the planet is warming and 40% of Americans believe in creationism.  Scientists in the media can help correct these misperceptions and guide public policy.  And, maybe even more importantly, scientists in the media can make science sexy.  We already live in a world where science and politics are often at odds, and in which scientists that avoid the media are often overruled by politicians that seek it out.  Scientists are already wary of making public statements that implicitly contain uncertainty for fear of them being interpreted as definitive. Christakis and Fowler have done us a great service by taking the risk of making statements and recommendations in the public arena based on the best of their knowledge, by raising public awareness of the science of networks, and by making science fun, interesting, and relevant.

Social Networks in the Classroom

Today's New York Times has an article on an educational software start-up that "has a social-networking twist."  The company, Piazza, provides a course page where students can ask and answer questions with moderation from the instructor.  I'm not sure what the "social networking" component of this site is.  From the Times article, it sounds simply like a message board with a few bells and whistles.  A quick search for the company's website left me empty-handed, so we can only speculate that there is actually something more here.

In passing, the article raised another interesting point, though: "As in the case of Facebook, the wildly popular social network that sprang from a Harvard dorm room, the close-knit nature of college campuses has helped accelerate the adoption of Piazza."  The idea that close-knit communities lead to increased technology adoption is something that I prove in my recent paper, "Friendship-based Games."  The idea of closely knit communities is captured by the clustering coefficient of a network.  This metric measures the probability that two individuals who share a mutual friend are friends with one another.  In the paper, I show, using a game-theoretic model, that new (beneficial) technologies have an easier time breaking into a market in networks with high clustering.  The basic idea is that small communities of users can adopt the new technology and interact mainly with one another, protecting themselves from the incumbent.  This may be one of the reasons that college campuses, which probably exhibit higher clustering than many other social networks, prove to be such fertile ground for the adoption of new innovations.
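As a concrete illustration, the clustering coefficient can be computed directly from a friendship list. This is just a sketch on a made-up four-person network; the node names and the simple averaging over nodes are my own choices for the example.

```python
from itertools import combinations

def clustering_coefficient(graph):
    """Average, over nodes with at least two friends, of the fraction
    of pairs of a node's friends who are also friends with each other."""
    coeffs = []
    for node, friends in graph.items():
        if len(friends) < 2:
            continue  # clustering is undefined with fewer than two friends
        pairs = list(combinations(friends, 2))
        closed = sum(1 for u, v in pairs if v in graph[u])
        coeffs.append(closed / len(pairs))
    return sum(coeffs) / len(coeffs)

# A tiny undirected friendship network: Jack and Jill share a mutual
# friend ("me") and are also friends with each other, closing a triangle.
network = {
    "me":   {"jack", "jill"},
    "jack": {"me", "jill", "pat"},
    "jill": {"me", "jack"},
    "pat":  {"jack"},
}
print(clustering_coefficient(network))  # (1 + 1/3 + 1) / 3 ≈ 0.78
```

A clustering coefficient near 1, as here, is what we'd expect on a tight-knit campus; a large random network with the same number of links would score close to 0.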

One person, one vote?

An article in the New York Times describes recent research by economists Brian Knight and Nathan Schiff on the relative impact of votes from different states in the presidential primaries. They estimate that a vote in an Iowa or New Hampshire primary has the impact of five Super Tuesday voters. The focus of the Times article is on the policy implications of this impact inequality. One of the interesting things about this research is how "impact" is measured.  What the article doesn't mention is why there is any impact difference in the first place. After all, mathematically, a vote in Iowa or New Hampshire counts just as much as one in New Jersey or Montana.

The way Knight and Schiff estimated "impact" was to look at election polls before and after each primary.  They found that the polls shifted the most after early primaries.  Their theory is that voters are uncertain about the quality of different candidates, but learn (or infer) something about that quality by observing others.  This is kind of like noticing that a lot of people drive a certain type of car and inferring that it must therefore be a pretty good car.  But we could imagine several other stories.  For example, if voters who prefer candidate A perceive candidate B as a lock to win the nomination, then maybe they decide not to vote.  Prospective voters for candidate B, on the other hand, may continue to vote because they enjoy being on the winning side.  Some undecided voters may shift towards candidate B for the same reason.  A question that has perplexed political scientists and economists for decades is why anyone votes in the first place.  A more careful look at these results could shed light on that question.

Nicholas Christakis at WIDS@LIDS

Today and tomorrow I’m at the Interdisciplinary Workshop on Information and Decision in Social Networks at the MIT Laboratory for Information and Decision Systems (WIDS@LIDS).  Nicholas Christakis gave a thought-provoking talk this morning drawing on a lot of material from his book, Connected, written with James Fowler.  One of the first ideas he raised is that humans are unique in having a social pressure on our evolution.  Humans, like other species, face evolutionary pressures from the environment and from other species.  But, he argued, humans are unique in this social pressure because we live in close proximity to one another, and other human groups are one of the biggest threats that we face.  He went on to say that possibly this unique social pressure is responsible for humans evolving intelligence, because in order to navigate the complexities of social interactions, we need substantial intelligence.  I’m not sure that I buy this argument, though.  What about ants, bees, wolf packs, ... ?  All of these species work in groups, cooperate, and face competitive pressure from other groups, but none of them have evolved intelligence on a human scale.

Christakis ended his talk by asking why certain ideas are “sticky.”  I think this is a super interesting and super difficult question.  I’ve been talking with Adam Berinsky in the Political Science Department at MIT about this question in relation to political rumors.  Why does the rumor that Obama was born in another country stick around, but other rumors die out?  Christakis suggested that this might somehow be a tractable question, but I think it is much more subtle.  First of all, there are no natural metrics for judging ideas.  Second of all, we can’t just look at which ideas have actually taken off and which haven’t, because so many other chance factors come into play.  Because of the big positive feedbacks involved in the spread of ideas, this process is highly susceptible to chance tipping (see the work by Salganik, Dodds, and Watts).  It’s very easy to fall into the trap that Duncan Watts sums up in the title of his recent book, Everything is Obvious Once You Know the Answer.  Once an idea does “go viral,” like the Birther rumor, it is tempting to make up a narrative that says, well of course that rumor spread because it has attributes x, y, and z.  But, if the rumor had died, we could just as easily construct a different narrative explaining its failure.  Paul Lazarsfeld’s paper on The American Soldier gives a fantastic example of how we can trick ourselves into believing this kind of after-the-fact rationalization.

Diversity Trumps Accuracy in Large Groups

In a recent paper with Scott Page, forthcoming in Management Science, we show that when combining the forecasts of large numbers of individuals, it is more important to select forecasters who are different from one another than forecasters who are individually accurate.  In fact, as the group size goes to infinity, only diversity (covariance) matters.  The idea is that in large groups, even if the individuals are not that accurate, if they are diverse then their errors will cancel each other out.  In small groups, this law-of-large-numbers logic doesn’t hold, so it is more important that the forecasters are individually accurate.  We think this result is increasingly relevant as organizations turn to prediction markets and crowdsourced forecasts to inform their decisions.
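A quick simulation illustrates the intuition (the numbers here are mine, not from the paper): a large crowd of noisy but independent forecasters beats a crowd of individually accurate forecasters who all share the same small bias, because only independent errors cancel in the average.

```python
import random

random.seed(42)
truth = 100.0  # the quantity being forecast

def crowd_squared_error(forecasts):
    """Squared error of the simple average of the group's forecasts."""
    average = sum(forecasts) / len(forecasts)
    return (average - truth) ** 2

n = 1000
# Accurate but homogeneous: every forecaster errs high by about 2,
# so the shared error survives averaging.
accurate = [truth + 2.0 + random.gauss(0, 0.5) for _ in range(n)]
# Individually noisy but diverse: errors are independent draws,
# so in a large group they largely cancel.
diverse = [truth + random.gauss(0, 10) for _ in range(n)]

print(crowd_squared_error(accurate))  # close to 4 (the shared bias, squared)
print(crowd_squared_error(diverse))   # much smaller, despite noisier individuals
```

Shrink n to 5 or 10 and the comparison often flips, which is the small-group half of the result: without many independent draws, the noise doesn't cancel and individual accuracy dominates.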