I use the network visualization software Gephi almost every day, especially when I am teaching Social Dynamics and Network Analytics at Kellogg. So I was pretty concerned when, after upgrading my Mac laptop and desktop to the new Mac OS X Mavericks, I realized that Gephi no longer worked on either. Luckily, there is a solution: installing this Java update from Apple seems to fix everything.
To get started, follow steps 0 and 1 here to set up a Twitter account and download the NodeXL software. Then, to download the network data, click on Import and select From Twitter Search Network… In the first dialog box, enter the search term that you want to look for. Any account that recently posted a tweet containing this phrase will end up being a node in your network. The book "Analyzing Social Media Networks with NodeXL" offers some good advice on choosing an appropriate trending topic:
"First, the search phrase has to concern a recent event. Though Twitter has been around for several years, the volume of information being produced every second is so huge that the search interface has limits on how many tweets it will return for a given query, or how old tweets can be. Searching for '2008 Election' may in theory produce a valuable set of tweets about the election cycle, but in practice those tweets are too far back in time for the search interface to collect them efficiently. The second criterion is that the search phrase has to relate to a piece of news, promotion, event, and so on that is 'contagious' (i.e., Twitter users who see the message will, at least in principle, want to pass it on to their followers). A search phrase like 'Thanksgiving' is a trending topic on Twitter (shortly before and on Thanksgiving) but lacks a contagious property: there is no need to pass on the message because a large fraction of the population already knows about it, so tweets about Thanksgiving are independent events rather than the sign of a 'Thanksgiving meme' spreading throughout the Twitter population."
One good way to do this is to look through the recent tweets of a popular user for something that you think is interesting enough that other people would retweet it. For example, in the network below, I gathered data on tweets containing the phrase "Who Googled You?" This Twitter meme originated with Pete Cashmore, of @mashable, and links to a Mashable article that describes a way to find out who has been searching for you on Google. The article generated a flurry of interest, and many other people tweeted links to it, generally repeating the original article title, "Who Googled You?" Since this meme spread from person to person, it was a good candidate for visualizing as a Twitter search network.
You can choose which relationships define the edges of your network by selecting any combination of the following:
Follows relationship — two accounts are connected if one account follows the other.
"Replies-to" relationship in tweet — two accounts are connected if one account replies to the other in its tweet.
"Mentions" relationship in tweet — two accounts are connected if one account mentions the other account in its tweet.
As discussed in the previous post, because of Twitter rate limits, it is advisable to limit your request to a fixed number of people. Unless you are especially patient, I recommend starting with just 300 people.
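NodeXL builds the edges for the three relationship types listed earlier for you, but the logic is easy to sketch. The Python below is a minimal illustration, not NodeXL's implementation: it assumes hypothetical tweet records with `author`, `text`, and optional `in_reply_to` fields, plus a separately collected list of (follower, followed) pairs, and turns them into typed, counted edges.

```python
import re
from collections import Counter

# Twitter screen names are 1-15 word characters; this pattern is an approximation.
MENTION_RE = re.compile(r"@(\w{1,15})")

def typed_edges(tweets, follows=()):
    """Return a Counter keyed by (source, target, relationship).

    tweets:  iterable of dicts with 'author', 'text', and optionally
             'in_reply_to' (hypothetical field names).
    follows: iterable of (follower, followed) pairs collected separately.
    """
    edges = Counter()
    for follower, followed in follows:
        edges[(follower.lower(), followed.lower(), "follows")] += 1
    for tweet in tweets:
        author = tweet["author"].lower()
        reply_to = (tweet.get("in_reply_to") or "").lower()
        if reply_to:
            edges[(author, reply_to, "replies-to")] += 1
        for name in MENTION_RE.findall(tweet["text"]):
            name = name.lower()
            # Skip self-mentions and the reply target (already counted as a reply).
            if name != author and name != reply_to:
                edges[(author, name, "mentions")] += 1
    return edges

# Hypothetical example data.
tweets = [
    {"author": "alice", "text": "Who Googled You? via @mashable"},
    {"author": "bob", "text": "@alice cool link, thanks", "in_reply_to": "alice"},
]
edges = typed_edges(tweets, follows=[("bob", "mashable")])
```

The sketch does not also count a reply target as a mention, because replies usually begin with the recipient's @handle; NodeXL's own rule may differ.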
Once you download the data using NodeXL, I like to export it as a GraphML file and then visualize it in Gephi. In this example, I did a few things to make the visualization more meaningful, which I describe below.
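If you assemble an edge list yourself rather than exporting from NodeXL, Python's standard library is enough to produce a GraphML file for Gephi. This is a minimal sketch under stated assumptions: edges arrive as a dict from hypothetical (source, target, relationship) tuples to counts, and the attribute names `relationship` and `weight` are illustrative choices, not something NodeXL or Gephi requires.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"

def build_graphml(edges):
    """Build a directed GraphML tree from {(source, target, relationship): count}."""
    root = ET.Element("graphml", {"xmlns": GRAPHML_NS})
    # Declare edge attributes before the graph so they can be imported as columns.
    for key_id, name, kind in [("rel", "relationship", "string"),
                               ("w", "weight", "double")]:
        ET.SubElement(root, "key", {"id": key_id, "for": "edge",
                                    "attr.name": name, "attr.type": kind})
    graph = ET.SubElement(root, "graph", {"id": "G", "edgedefault": "directed"})
    # Every account that appears as a source or target becomes a node.
    for name in sorted({n for (src, dst, _) in edges for n in (src, dst)}):
        ET.SubElement(graph, "node", {"id": name})
    for i, ((src, dst, rel), count) in enumerate(sorted(edges.items())):
        edge = ET.SubElement(graph, "edge", {"id": f"e{i}", "source": src, "target": dst})
        ET.SubElement(edge, "data", {"key": "rel"}).text = rel
        ET.SubElement(edge, "data", {"key": "w"}).text = str(float(count))
    return ET.ElementTree(root)

# Hypothetical example data.
edges = {("bob", "alice", "replies-to"): 1, ("alice", "mashable", "mentions"): 2}
tree = build_graphml(edges)
# To save it for Gephi (file name is hypothetical):
# tree.write("tweets.graphml", encoding="utf-8", xml_declaration=True)
```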
Before getting started with manipulating the network in Gephi, it is a good idea to go into the Data Laboratory and delete some of the columns that NodeXL created. You should delete anything having to do with the color or size of the nodes or edges, or centrality measures such as PageRank and eigenvector centrality. These columns are generally empty, but unless you delete them, Gephi won't overwrite them when you ask it to calculate these measures, so you won't be able to calculate and make use of them in your analysis. For some general tips on using Gephi, check out the FAQ here.
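The Data Laboratory is the straightforward way to delete these columns. As a scripted alternative, here is a sketch that strips matching attribute columns from the GraphML file before it reaches Gephi. It assumes the file uses the standard GraphML namespace, and the regular expression is only a guess at the kinds of column names involved (color, size, centrality scores); inspect the `<key>` declarations in your own export and adjust it before trusting the result.

```python
import re
import xml.etree.ElementTree as ET

NS = "http://graphml.graphdrawing.org/xmlns"
ET.register_namespace("", NS)  # write the default namespace without an "ns0:" prefix

# Hypothetical pattern: adjust it to the column names in your own export.
DROP = re.compile(r"color|size|pagerank|eigenvector|betweenness|closeness", re.I)

def strip_columns(root, pattern=DROP):
    """Remove <key> declarations whose attr.name matches `pattern`, and their <data>."""
    def q(tag):
        return f"{{{NS}}}{tag}"
    dropped = set()
    for key in root.findall(q("key")):
        if pattern.search(key.get("attr.name", "")):
            dropped.add(key.get("id"))
            root.remove(key)
    # Snapshot the element list so removals don't disturb the traversal.
    for parent in list(root.iter()):
        for data in parent.findall(q("data")):
            if data.get("key") in dropped:
                parent.remove(data)
    return dropped

# Hypothetical usage (file names are placeholders):
# tree = ET.parse("nodexl_export.graphml")
# strip_columns(tree.getroot())
# tree.write("cleaned.graphml", encoding="utf-8", xml_declaration=True)
```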
First, I filtered out all of the accounts except those that belong to the largest connected component of the network. This makes the network much more readable and lets us focus on the nodes involved in a large cascade. After trying a few options, I chose the Force Atlas layout algorithm to arrange the nodes; for Twitter networks, I have found that Force Atlas generally gives the best layout. Usually, I have to increase the repulsion strength from the default setting of 200 to 2000 or more.
Then I resized the nodes according to their degree to get a sense of who the most important nodes in the network are. For comparison, I also tried sizing the nodes by PageRank and eigenvector centrality. For the most part, the choice of centrality measure didn't make much difference, although one account, @darrenmcd, appears significantly more important according to PageRank or eigenvector centrality than according to degree centrality. The Twitter accounts @briansois and @armano stand out as the most influential in the network.
I colored the nodes according to the community they belong to, as identified by Gephi's modularity-based clustering algorithm (the Louvain method of Blondel et al., which maximizes the modularity measure introduced by Newman and Girvan), and I colored the edges according to the type of relationship between the Twitter accounts. Blue edges are "follows" relationships, green edges are "mentions," and purple edges are "replies to." Almost all of the links to @armano come from tweets that mention @armano explicitly, as do about half of those to @briansois.
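To make the filtering and sizing steps concrete, here is a plain-Python sketch of the same ideas: keep the largest connected component (treating edges as undirected), take each node's degree as its number of distinct neighbors, and compute PageRank by power iteration on the directed edges. It is an illustration, not Gephi's code; the damping factor of 0.85 and the fixed 100 iterations are conventional assumptions, and Gephi's own degree and PageRank settings may count things differently.

```python
from collections import defaultdict, deque

def undirected_adjacency(edges):
    """Map each node to its set of distinct neighbors, ignoring direction and self-loops."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj

def largest_component(adj):
    """Return the node set of the largest connected component (breadth-first search)."""
    seen, best = set(), set()
    for start in adj:
        if start in seen:
            continue
        comp, queue = {start}, deque([start])
        while queue:
            for nbr in adj[queue.popleft()]:
                if nbr not in comp:
                    comp.add(nbr)
                    queue.append(nbr)
        seen |= comp
        if len(comp) > len(best):
            best = comp
    return best

def pagerank(edges, nodes, damping=0.85, iters=100):
    """PageRank by power iteration on the directed edges among `nodes`."""
    out = {v: set() for v in nodes}
    for u, v in edges:
        if u in out and v in out and u != v:
            out[u].add(v)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Nodes with no out-links spread their rank evenly over all nodes.
        dangling = sum(rank[v] for v in nodes if not out[v])
        new = {v: (1 - damping + damping * dangling) / n for v in nodes}
        for u, targets in out.items():
            for v in targets:
                new[v] += damping * rank[u] / len(targets)
        rank = new
    return rank

# Toy example (hypothetical data): "x"-"y" is a separate, smaller component.
edges = [("a", "b"), ("c", "b"), ("d", "b"), ("b", "a"), ("x", "y")]
adj = undirected_adjacency(edges)
core = largest_component(adj)
degree = {v: len(adj[v]) for v in core}
pr = pagerank(edges, core)
```

On this toy graph, "b" has both the highest degree and the highest PageRank; on real data the two rankings can diverge, as they do for @darrenmcd above.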
This is a visualization of Twitter accounts that follow and are followed by @gephi, which I made using ... Gephi. I collected the data using NodeXL. Two accounts are linked in the network if one follows the other on Twitter, and nodes are sized according to their degree. The modularity clustering algorithm finds 8 different groups among the accounts. The blue group in the upper left, where I live, contains most of the network science crowd: @duncanjwatts, @ladadimc, @barabasi, @davidlazer, etc. The green group in the lower right seems to be data/visualization folks. Before filtering, there was a large contingent of accounts that followed @gephi but had no other connections in the network, so I filtered out all of the nodes with degree less than four.
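The degree filter is simple to express in code. This sketch computes each node's undirected degree once on the full graph, then keeps nodes with degree of at least four along with the edges among them; the toy data is hypothetical. One design choice to be aware of: dropping low-degree nodes can push some survivors below the threshold, and repeating the filter until nothing changes would give the k-core instead.

```python
from collections import defaultdict

def filter_by_degree(edges, min_degree=4):
    """Keep nodes with undirected degree >= min_degree, plus the edges among them.

    Degrees are computed once, on the full graph (a single pass, not a k-core).
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    keep = {v for v, nbrs in adj.items() if len(nbrs) >= min_degree}
    kept_edges = [(u, v) for u, v in edges if u in keep and v in keep]
    return keep, kept_edges

# Hypothetical toy data: five mutually linked accounts, plus two accounts
# whose only connection is to "gephi".
group = ["gephi", "a", "b", "c", "d"]
edges = [(u, v) for i, u in enumerate(group) for v in group[i + 1:]]
edges += [("f1", "gephi"), ("f2", "gephi")]
keep, kept_edges = filter_by_degree(edges)
```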
This is a visualization of my own Facebook network that I made using the (free) software Gephi and the Facebook application netvizz. Each node in the network is one of my Facebook friends, and two friends are connected to one another if they are Facebook friends with each other. The size of the node corresponds to the "degree" of the node, which means how many connections it has. In this case, that means how many of my Facebook friends that person is Facebook friends with. (Note: I deleted the names from the nodes to protect my Facebook friends' privacy).
The colors of the nodes indicate communities of friends found using a clustering algorithm based on the "modularity" of the network. Basically, the algorithm tries to group the nodes into communities with lots of connections within each community and not too many connections between communities. Even though the algorithm knows nothing about my friends other than the web of connections (it doesn't even know they're people), it does a good job of identifying groups of my friends who belong to the same communities in real life. For example, the purple cluster in the upper right consists of people I know from graduate school, and the little green cluster in the lower right consists of people from the Northwestern Institute on Complex Systems. The big bunch in the middle are people I know from high school, with the people from the band (or band groupies) in green on the right side. My wife is the purple node that bridges the gap between my graduate school friends and my high school friends.
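The quantity this kind of algorithm tries to make large is modularity. For an undirected graph with m edges, `Q = sum over communities c of [L_c/m - (d_c/(2m))^2]`, where L_c is the number of edges inside c and d_c is the total degree of its nodes: the fraction of edges that fall within communities, minus the fraction expected if edges were wired at random while preserving degrees. The sketch below only scores a partition you give it; it does not search for the best one, which is the job of the clustering algorithm in Gephi. The two-triangle example is hypothetical.

```python
from collections import defaultdict

def modularity(edges, community):
    """Newman-Girvan modularity of an undirected, unweighted graph.

    edges:     list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping node -> community label.
    """
    m = len(edges)
    internal = defaultdict(int)    # L_c: edges with both ends in community c
    degree_sum = defaultdict(int)  # d_c: total degree of the nodes in c
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)

# Two triangles joined by a single bridging edge (3, 4).
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
split = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
one_group = {v: "A" for v in range(1, 7)}
q_split = modularity(edges, split)        # 5/14, about 0.357
q_one = modularity(edges, one_group)      # 0: a single community scores nothing
```

Splitting at the bridge scores higher than lumping everyone together, which is the same intuition behind the clusters in the picture: dense groups joined by a few bridging ties.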