Projecting a Bipartite Network in Gephi

A bipartite network is one in which the nodes can be split into two groups, A and B, such that all of the links join nodes from group A with nodes from group B. There are no edges connecting two group A nodes with each other or connecting two group B nodes with each other. This network of HBO shows and the actors that appeared in them is a nice example (originally posted here).

In this network the nodes are either actors or shows. Actors are connected to the shows they starred in, but there are no links connecting two actors to each other or two shows to each other because, of course, actors can't star in other actors and shows can't star in other shows. Other examples might include doctors and patients, with doctors connected to the patients that they see, or students and clubs, with students connected to the clubs that they are members of.

Every bipartite network can be projected to give two networks that have only one type of node. For example, our HBO network could be projected to give a network of just actors, where two actors are connected if they starred in the same show; and the bipartite HBO network can also be projected to a network of just shows, where two shows are connected if the same actor starred in both of them. Projecting a bipartite network loses information, but sometimes highlights specific features of a network that we want to focus on.
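To make the idea of a projection concrete, here is a minimal Python sketch. The actor and show names are made-up toy data (not the HBO network), and the `project` function is a hypothetical helper written for this illustration, not something Gephi provides. Two nodes of the kept type are linked if they share at least one neighbor of the other type, and the weight counts how many neighbors they share.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: (actor, show) pairs, not taken from the HBO network.
edges = [
    ("actor_a", "show_1"),
    ("actor_b", "show_1"),
    ("actor_b", "show_2"),
    ("actor_c", "show_2"),
]


def project(edges, keep="left"):
    """Project bipartite (left, right) edges onto one node type.

    Two kept nodes are linked if they share at least one neighbor of the
    other type; the weight is the number of neighbors they share.
    """
    # Group the kept nodes by the node of the other type they attach to.
    groups = defaultdict(set)
    for left, right in edges:
        if keep == "left":
            groups[right].add(left)
        else:
            groups[left].add(right)

    # Every pair of kept nodes inside a group gets one unit of weight.
    weights = defaultdict(int)
    for group in groups.values():
        for u, v in combinations(sorted(group), 2):
            weights[(u, v)] += 1
    return dict(weights)


actor_network = project(edges, keep="left")   # actors linked by shared shows
show_network = project(edges, keep="right")   # shows linked by shared actors
```

The lost information is visible here: the actor network records that actor_a and actor_b worked together, but no longer says on which show.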

If you have a bipartite network in Gephi, there is a tool for automatically creating a projection. First, you need to add an attribute to the nodes that describes what type each node is, e.g., whether it is an actor or a movie. You do this by importing a nodes table in which one column gives the node Ids and a second column gives the node type. You now have a new node attribute, maybe called nodeType, with values "actor" or "movie". At this point, I recommend saving a copy before proceeding.

The next thing you are going to do is install a Gephi plugin called MultiMode Networks Transformation. Under the Tools menu, choose Plugins. Then, under Available Plugins, select MultiModeNetworks TransformationPlugin. (If you have trouble installing the plugin this way, you can instead download the plugin here. Then in Gephi go to Tools... Plugins... Downloaded plugins, and select the downloaded file.)

Once you have the plugin installed, you should have a new window available under Window called MultiMode Projections. Open it and hit the Load attributes button. Select nodeType for your Attribute type and click the Remove Edges and Remove Nodes buttons.

Finally, you have to choose which projection you want to make. It's important that you have saved your work at this point, because Gephi does not have an undo button, and this next step will permanently change your network. You have to choose the left and right matrix to get the projection that you want. This works like matrix multiplication. If you want to project to an actor-to-actor network, choose "actor - movie" as your left matrix and "movie - actor" as your right matrix and hit Run. If you want a movie-to-movie network, choose "movie - actor" as your left matrix and "actor - movie" as the right matrix and hit Run. You should be left with the appropriate projected network.
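The left and right matrix choice can be read as a literal matrix product. The sketch below uses plain Python lists and made-up toy data; the `transpose` and `matmul` helpers are hypothetical names for this illustration, and this is only my reading of the matrix analogy, not a description of the plugin's internals. Rows of the actor - movie matrix are actors, columns are movies, and an entry is 1 if that actor appeared in that movie.

```python
# Hypothetical toy data: rows are actors, columns are movies.
actors = ["actor_a", "actor_b", "actor_c"]
movies = ["movie_1", "movie_2"]
actor_movie = [
    [1, 0],  # actor_a appeared in movie_1
    [1, 1],  # actor_b appeared in both movies
    [0, 1],  # actor_c appeared in movie_2
]


def transpose(matrix):
    """Swap rows and columns."""
    return [list(row) for row in zip(*matrix)]


def matmul(left, right):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a * b for a, b in zip(row, col)) for col in zip(*right)]
        for row in left
    ]


movie_actor = transpose(actor_movie)

# Left "actor - movie" times right "movie - actor" gives actor-to-actor.
actor_actor = matmul(actor_movie, movie_actor)

# Left "movie - actor" times right "actor - movie" gives movie-to-movie.
movie_movie = matmul(movie_actor, actor_movie)
```

In the product, an off-diagonal entry counts how many movies two actors share (or how many actors two movies share), while a diagonal entry just counts that node's own connections in the bipartite network.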


Scale-Free Network

I made this visualization of a scale-free network for a recent talk at the 2015 KIN Global conference. Scale-free networks have a power law degree distribution. This means that if you count the fraction of nodes in the network that have one connection, two connections, three connections, and so on, and plot that distribution, the graph looks roughly like this:

What this graph tells us is that most of the nodes have few connections (in this plot, around fifty percent of the nodes have four or fewer connections), but a few nodes have lots of connections. This distribution is called a power law and is described by the equation $f(x) = c x^{\alpha}$. It's very different from the more commonly known normal distribution. When something follows a normal distribution, most of the time that quantity is pretty close to the average value. For example, the height of the average American man is 5' 9", and most men are not that far from this average. Only 3.9% of men are 6' 2" or taller. With a power law distribution, most of the time our quantity is small (in our network most of the nodes have few connections), but occasionally we see really large values. Many quantities follow a power law distribution including wealth, the size of firms, the magnitude of earthquakes, the diameter of craters on the moon, and sales of books. Networks often have power law distributions for the number of connections.

You can generate a network with a power law degree distribution through a process known as preferential attachment. In this process, you add nodes to the network one at a time. Each time a node is added to the network it forms a connection with an existing node in the network. With some probability p the node chooses which of the existing nodes to connect to completely at random. With probability 1-p the new node selects which existing node to connect to in proportion to the nodes' existing number of connections, so it is more likely to connect to nodes that already have many connections.
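The process described above can be sketched in a few lines of Python. This is a minimal illustration rather than the igraph implementation: the function name `preferential_attachment`, the starting network of two connected nodes, and the parameter values are all assumptions made for the example.

```python
import random
from collections import Counter


def preferential_attachment(n_nodes, p, seed=None):
    """Grow a network one node at a time; return a list of (new, target) edges.

    With probability p the new node picks an existing node uniformly at
    random; with probability 1 - p it picks one in proportion to degree.
    Assumption: the network starts as two nodes joined by one edge.
    """
    rng = random.Random(seed)
    edges = [(0, 1)]
    # Each node appears in this list once per connection it has, so a
    # uniform draw from the list selects a node in proportion to its degree.
    endpoints = [0, 1]
    for new_node in range(2, n_nodes):
        if rng.random() < p:
            target = rng.randrange(new_node)  # uniform over existing nodes
        else:
            target = rng.choice(endpoints)    # proportional to degree
        edges.append((new_node, target))
        endpoints.extend((new_node, target))
    return edges


edges = preferential_attachment(n_nodes=10_000, p=0.1, seed=42)

# Degree of each node, then the fraction of nodes having each degree.
degree = Counter(node for edge in edges for node in edge)
distribution = Counter(degree.values())
fraction_with_degree = {
    k: count / len(degree) for k, count in sorted(distribution.items())
}
```

Plotting `fraction_with_degree` on log-log axes is one way to check whether the tail looks roughly like a straight line, which is what a power law would produce.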

The scale-free network below was generated using this process as implemented in the R package igraph (the R code is available here). I exported an edge list for the network (which you can download here), and then created the visualization using Gephi.

Inventing Abstraction: Visualizing the Social Network Behind Artistic Innovation

Check out this cool video on a visualization of the social network of artists behind the invention of abstraction. Thanks to Merwan Ben Lamine for sharing it with me. You can explore the network here.


Identifying the Origin of an Epidemic

My NICO colleague (and office mate) Dirk Brockmann has done some very cool work on identifying where an epidemic began by looking at how it has spread through airline networks. Centuries ago the spread of disease was constrained by geography (like the spread of obesity appears to be today). If you looked at a map of where the disease had spread over time, you could easily trace it back to its origin: the epidemic just spread out like ripples in a pond. But today, diseases hitch a ride on the air travel network, so where the disease spreads fastest has little to do with physical geography. What Dirk figured out, though, is that geography is just another sort of network. If we arrange the airline network in the right way, with the origin of an epidemic at the center, we will again see the contagion spreading out like ripples in a pond. Check out this article in The Atlantic explaining the research.

Scott Page Model Thinking Course — Now on Twitter

Scott Page's FREE online course on complex systems and modeling is starting up again today on Coursera. The first round of the course was awesome (highly recommended here), and I am sure this one will be too. This time around Scott has also added some new features to the Model Thinking course, a Twitter account and a Facebook page.

Northwestern's Defeat to the Illini as Seen on Twitter

The title says it all.  Here's the link.

The S&P credit downgrade, turmoil in the markets, and the 1973 toilet paper shortage

On Friday, August 5, Standard & Poor's downgraded the credit rating of U.S. long-term debt to AA+.  On Monday, the first day the markets were open after the downgrade, the Dow Jones Industrial Average dropped 5.6 percent and the S&P 500 fell 6.7 percent, the biggest single-day drops since the crisis in 2008.  A lot of people might be confused about this turmoil in the markets, since US debt is still considered one of the safest investments there is.  Jay Forrester, founder of the field of System Dynamics, calls puzzles like this the "counterintuitive behavior of social systems."

Undoubtedly, the world economy is incredibly complex, and no individual or organization has a complete picture of how it works or where it's headed.  Through pricing, the market is supposed to aggregate all of the pieces of partial information that we each hold and then converge to the "truth"; that is, prices should reflect true underlying value.  In some situations this can actually work.  Prediction markets have been shown to be valuable tools for businesses to harvest the "wisdom of the crowds" and assess the probabilities that future events occur.  But this mechanism works best when individuals place their trades independently based on their own private information. In the real world, market dynamics are fundamentally social dynamics, and as such they are subject to cascades of panic and the accumulation of overconfidence (what Alan Greenspan famously referred to as "irrational exuberance"; see also Robert Shiller).


The current panic illustrates how, even when there is no fundamental basis for a panic, social dynamics can amplify the signal of a panic to the point where an actual crisis ensues.  The gas shortages of 1979 are a classic example of this phenomenon.  The Iranian revolution sharply cut oil imports to the US from Iran.  Nervous consumers rushed to top off their tanks and even to hoard gasoline at home.  This drained the supply of gasoline at filling stations, leading to an actual gasoline shortage.  Word-of-mouth and media coverage reinforced consumer fears of shortages, leading to even more topping off and hoarding, as well as government policies such as odd/even day purchase rules that actually further incentivized consumers to top off frequently and store gasoline at home.  Surprisingly, despite the very real shortage of gasoline at filling stations, US oil imports for the year actually increased in 1979 compared to 1978.  The crisis was caused by social dynamics, not an actual drop in supply. (See Sterman, Business Dynamics p. 212).

A similar but more comical crisis occurred in 1973 when Johnny Carson made a joke saying, "You know what’s disappearing from the supermarket shelves?  Toilet paper.  There’s an acute shortage of toilet paper in the United States."  Consumers rushed out to stock up on toilet paper, leading to a real toilet paper shortage in the US that lasted several days.  Even though Carson tried to correct the joke a few days later, by that time toilet paper was in fact in short supply because people were hoarding it at home.

Scott Page elected to the American Academy of Arts and Sciences

My friend, collaborator, and mentor Scott E. Page was elected to the American Academy of Arts and Sciences this year.  Congratulations Scott!  Scott is best known for his work (with Lu Hong) on the benefits of diversity for problem solving.  Together, Scott and I have written papers on group forecasting, tipping points, and markets with positive feedbacks.  Scott’s Ph.D. is from the MEDS Department at Kellogg.

Did You Know?

Thanks to Brian Uzzi for sending me a copy of this thought-provoking video, Did You Know?  Interestingly, the success of the Did You Know? video itself is a product of the rapidly expanding global communication that the video describes.  According to the creator’s history of the presentation, the video was originally a PowerPoint deck made for a high school faculty meeting in 2006.  Soon thereafter the video “went viral” and by June 2007 had been viewed by over 5 million people online.  The most recent version (4.0) has been viewed over 2 million times on YouTube.


Some of the interesting stats from the video:

  • There are over a trillion web pages and 65,000 iPhone apps.
  • The average American teen sends 2,272 text messages a month.
  • Dell claims to have earned $3 million from Twitter posts since 2007.
  • In February 2008, Barack Obama raised $55 million without attending a single campaign fundraiser.


Sources for the stats are available here.