Austin Improv Community Network
(Spoiler: just click on this)
Networks are a pervasive notion even in our everyday lives. As examples, consider the network of all air travel across the world, or your favorite online social network. These common examples highlight some of the basic components of networks and begin to hint at why they're interesting to study. Networks are composed of nodes (such as airports or social media users) connected by edges (such as flight paths or social media relationships). That's really it. Yet that simple framework can be used to think about many kinds of phenomena. Below, I'm going to construct a very simple one (a social network) which allows us to visualize relationships and groupings among people.
We're going to take a look at the community of improvisers in Austin, TX. Based on the troupes each improviser has performed with, we'll construct a visual representation of the Austin Improv Network. Luckily, nearly all the performances, troupes, and shows in the Austin improv community have been fantastically documented in the Austin Improv Wiki, so it'll be very little work for us to extract the relevant information about each performer and to build this network.
So how exactly can I construct a network out of improv troupes or shows? Let's walk through a real simple example. First, we'll pick a troupe to start with - let's go with Pgraph. Pgraph has four members (Kaci, Valerie, Roy, and Kareem) and these will be the first nodes in our network. So below we have a network with four nodes in it (hover over the nodes so you can see who is who; drag em around if you want, they won't mind).
Now how are we going to connect the nodes of our Austin Improv Network? Well I've decided that two nodes (improvisers) will be connected by an edge if they appear in a troupe together. So with our first four nodes, they all appear in a troupe together (Pgraph), so there will be an edge between each of them.
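If you'd like to see that rule in code, here's a minimal sketch in base R. It uses the four Pgraph members named above; the variable names (pgraph, pairs, edges) are just ones I picked for illustration.

```r
# Members of Pgraph, as named in the post.
pgraph <- c("Kaci", "Valerie", "Roy", "Kareem")

# combn() returns every unordered pair as a column of a 2-row matrix.
pairs <- combn(pgraph, 2)

# One row per edge: four members give choose(4, 2) = 6 edges.
edges <- data.frame(from = pairs[1, ], to = pairs[2, ],
                    stringsAsFactors = FALSE)
print(edges)
```

Every member ends up paired with every other member exactly once, which is why a single troupe always shows up as a fully connected clump.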
Now we can move on to another troupe and add nodes and edges in this same manner. So if we look at Squirrel Buddies, we'll see that we need to add another node for Jon. And then we would put an edge between Roy and Jon because they're both in Squirrel Buddies. Similarly for The Amazon and the Milksop, we'll add Curtis to the network and put an edge between Kaci and Curtis.
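Here's a rough sketch of that growing process in base R. One loud assumption: I've only listed the members mentioned in this walkthrough for Squirrel Buddies and The Amazon and the Milksop, so these are not the full casts. The names troupes, edge_rows, edges, and nodes are illustrative.

```r
# Toy membership lists: only the people named in the walkthrough,
# not the troupes' complete casts.
troupes <- list(
  "Pgraph" = c("Kaci", "Valerie", "Roy", "Kareem"),
  "Squirrel Buddies" = c("Roy", "Jon"),
  "The Amazon and the Milksop" = c("Kaci", "Curtis")
)

# Build one data frame of member pairs per troupe, then stack them.
edge_rows <- lapply(names(troupes), function(name) {
  pairs <- combn(sort(troupes[[name]]), 2)
  data.frame(from = pairs[1, ], to = pairs[2, ], troupe = name,
             stringsAsFactors = FALSE)
})
edges <- do.call(rbind, edge_rows)

# The node list is simply everyone who appears in any troupe.
nodes <- sort(unique(unlist(troupes)))

print(nodes)
print(edges)
```

With these three toy troupes we get 6 nodes and 8 edges: the 6 edges inside Pgraph, plus Roy-Jon and Kaci-Curtis.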
I've scraped the Austin Improv Wiki and extracted this information for every troupe and every improviser. In the same way I just described, I've put together the network of all improvisers, where an edge means that two improvisers have been in at least one troupe together. That network is shown below. It's quite big and complex, so you should view it in a new page instead of just embedded in this page.
To visualize these networks, I'm using the awesome JavaScript library VivaGraph and a common layout technique called 'force-directed layout'. The strategy is to pretend that the nodes repel each other yet are chained to one another by their edges. As the nodes move around, densely connected subsets of the network will remain near each other, while unconnected parts of the network will drift far apart. With this type of layout, we can see lots of cool structure and complexity in this network.
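To make the "repel, but chained together" idea concrete, here's a tiny force-directed loop in base R. This is only a sketch of the general technique, not how VivaGraph actually implements it, and the constants (repulsion, spring, rest_length, step) are values I made up for this toy graph.

```r
set.seed(1)  # reproducible random starting positions

# Toy graph: the four Pgraph members plus Jon and Curtis.
nodes <- c("Kaci", "Valerie", "Roy", "Kareem", "Jon", "Curtis")
edges <- rbind(c(1, 2), c(1, 3), c(1, 4), c(2, 3), c(2, 4), c(3, 4),
               c(3, 5), c(1, 6))

n <- length(nodes)
pos <- matrix(runif(2 * n, -1, 1), ncol = 2,
              dimnames = list(nodes, c("x", "y")))

repulsion <- 0.05    # assumed strength of node-node repulsion
spring <- 0.1        # assumed stiffness of each edge
rest_length <- 1     # assumed preferred edge length
step <- 0.1          # how far nodes move per iteration

for (iter in 1:500) {
  force <- matrix(0, n, 2)

  # Every pair of nodes pushes apart, more strongly when close.
  for (i in 1:n) for (j in 1:n) if (i != j) {
    d <- pos[i, ] - pos[j, ]
    dist <- max(sqrt(sum(d^2)), 0.1)  # floor avoids huge forces
    force[i, ] <- force[i, ] + repulsion * d / dist^3
  }

  # Every edge acts like a spring pulling its endpoints toward rest_length.
  for (e in 1:nrow(edges)) {
    i <- edges[e, 1]
    j <- edges[e, 2]
    d <- pos[j, ] - pos[i, ]
    dist <- max(sqrt(sum(d^2)), 0.1)
    pull <- spring * (dist - rest_length) * d / dist
    force[i, ] <- force[i, ] + pull
    force[j, ] <- force[j, ] - pull
  }

  pos <- pos + step * force
}

print(round(pos, 2))
```

Passing pos to plot() and drawing the edges with segments() gives a static version of the same kind of picture.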
You might be wondering why the nodes are different colors and what that means. With any network (and this one especially), there are often subsets or clusters of nodes which are more highly connected amongst themselves than to the rest of the network. Because of this, we can imagine that the whole network is actually composed of various communities that are all connected together. We don't know what these communities are, but from the edges we can use a community detection algorithm to calculate which communities exist and their memberships.
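I won't go into the particular algorithm here, but many community detection methods work by searching for a grouping that maximizes a score called modularity, which rewards groups with more internal edges than you'd expect by chance. Here's a sketch in base R that just computes that score for two candidate groupings of a made-up six-node graph. The graph and the modularity() helper are illustrative assumptions, not the improv data or the exact method behind the colors.

```r
# Made-up toy graph: two tight groups of three joined by one bridging edge.
edges <- rbind(c(1, 2), c(1, 3), c(2, 3),   # group A
               c(4, 5), c(4, 6), c(5, 6),   # group B
               c(3, 4))                     # bridge
n <- 6

# Symmetric adjacency matrix.
A <- matrix(0, n, n)
A[edges] <- 1
A[edges[, 2:1]] <- 1

# Modularity: observed minus expected within-group edge weight,
# summed over all same-group pairs and normalized by 2m.
modularity <- function(A, membership) {
  k <- rowSums(A)        # degree of each node
  two_m <- sum(A)        # twice the number of edges
  same <- outer(membership, membership, "==")
  sum((A - outer(k, k) / two_m) * same) / two_m
}

split_in_two <- modularity(A, c(1, 1, 1, 2, 2, 2))
all_together <- modularity(A, rep(1, n))

print(split_in_two)   # about 0.357
print(all_together)   # essentially 0
```

The two-group split scores higher than lumping everyone together, which matches the intuition that the graph really is two clusters with a bridge.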
The colors of the nodes in the improv network correspond to which community each node belongs to. We can see there are several large communities and a bunch of smaller communities. If we explore the nodes and the communities, we find that the detected communities have a slight correspondence to the various improv theaters in Austin (from which people are more likely to form troupes). There are lots of notable exceptions however, especially with highly connected nodes, who are more likely to form troupes with people from all over the network.
From this network information, we could start to ask questions like: who is the most highly connected? Who is connected to the most 'important' people? Who acts as a 'bridge' between communities? There are lots of cool methods from network theory that let us calculate these kinds of notions from this network data. I'll get around to that another day...
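As a small taste, the simplest version of "most highly connected" is just counting each node's edges (its degree). Here's a base R sketch on the toy network from the walkthrough, not the full scraped data; the edge table is typed in by hand for illustration.

```r
# Toy edge list from the walkthrough: Pgraph plus Jon and Curtis.
edges <- data.frame(
  from = c("Kaci", "Kaci", "Kaci", "Valerie", "Valerie", "Roy", "Roy", "Kaci"),
  to   = c("Valerie", "Roy", "Kareem", "Roy", "Kareem", "Kareem", "Jon", "Curtis"),
  stringsAsFactors = FALSE
)

# A node's degree is the number of edges it appears in, at either end.
counts <- table(c(edges$from, edges$to))
degree <- setNames(as.vector(counts), names(counts))
degree <- sort(degree, decreasing = TRUE)

print(degree)
```

Kaci and Roy come out on top with 4 connections each, since they each have one troupe-mate outside Pgraph. Notions like 'importance' and 'bridging' need fancier measures than this.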
I've also done this same thing with improvisers that co-occur in mainstage shows, and this network is much more densely interconnected. A compelling visualization of the fruitful overlap and collaboration in the Austin Improv Community!
In case you're into this kind of thing, here is the R code I used for scraping the Austin Improv Wiki. This is the first time I've tried using R for web scraping, and thanks to the rvest package it was pretty painless.
# Scraping data from AIC Wiki and building a network
library(rvest)

# read_html() replaces html(), which was deprecated in later rvest releases
troupes_page <- read_html('http://wiki.austinimprov.com/wiki/List_of_Austin_Improv_Troupes')
troupes_list <- troupes_page %>%
  html_nodes('.DPLTest:nth-child(5) li') %>%
  html_text() %>%
  unique()

nodes <- c()
edge_list <- list()

for (troupe in troupes_list) {
  print(troupe)

  # parse special characters and format URL
  troupe_string <- gsub(' ', '_', troupe)
  troupe_string <- gsub('\\?', '%3F', troupe_string)
  troupe_string <- gsub('&', '%26', troupe_string)

  # fetch the page for each troupe and try to extract the cast
  troupe_page <- read_html(paste('http://wiki.austinimprov.com/wiki/', troupe_string, sep = ''))
  cast <- character(0)  # reset so a failed extraction cannot reuse the previous troupe's cast
  try(
    cast <- troupe_page %>% html_nodes('.plainlist a') %>% html_text()
  )

  if (length(cast) > 1) {
    # update node list
    nodes <- unique(c(nodes, cast))

    # update edge_list
    all_pairs <- combn(cast, 2)
    for (i in 1:ncol(all_pairs)) {
      # an edge may already be stored under either ordering of the two names
      possible_names <- c(paste(all_pairs[, i], collapse = ' & '),
                          paste(rev(all_pairs[, i]), collapse = ' & '))
      if (any(possible_names %in% names(edge_list))) {
        # seen before: bump the co-occurrence count
        edge_name <- possible_names[possible_names %in% names(edge_list)]
        edge_list[edge_name] = (edge_list[edge_name][[1]] + 1)
      } else {
        # first time this pair shares a troupe
        edge_list[paste(all_pairs[, i], collapse = ' & ')] = 1
      }
    }
  }
}
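The script leaves you with edge_list, a named list whose names look like 'A & B' and whose values count shared troupes. Here's one way you might turn that into a tidy table for whatever visualization tool you like. It's a sketch: the edge_list below is a made-up stand-in rather than real scraped counts, and it assumes no performer's name itself contains ' & '.

```r
# Stand-in for the scraper's output (illustrative values, not real data).
edge_list <- list("Kaci & Roy" = 2, "Roy & Jon" = 1, "Kaci & Curtis" = 1)

# Split each "A & B" name back into its two endpoints.
ends <- do.call(rbind, strsplit(names(edge_list), " & ", fixed = TRUE))

# One row per edge, with the co-occurrence count as a weight.
edges <- data.frame(
  from = ends[, 1],
  to = ends[, 2],
  weight = unlist(edge_list, use.names = FALSE),
  stringsAsFactors = FALSE
)

print(edges)
```

From there, write.csv() or a JSON package can export the table for a JavaScript front end.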