Keegan Hines

Baseball Dynasties

I really like visualizations like this, which accompany the Wikipedia pages for most Premier League teams. The idea is to plot the year-end ranking of a team over its entire history. It hit me that I've never seen such a thing for any American sports, so I wanted to give it a try and also make use of d3 for some interactive graphics.

I'm going to use baseball teams, since it's playoff season. I've grabbed data from BaseballReference.com which provides lots of information about each team for each year, though we're only interested in the rankings. In this case, the rankings are due to overall win-loss percentage and not due (exclusively) to outcomes in the post-season. I've put together the results for each team with help from d3 and rCharts.
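To make "ranked by overall win-loss percentage" concrete, here is a minimal sketch of how such a ranking could be computed from raw records. The function name `rank_by_win_pct` and the team records are hypothetical placeholders, not real standings, and the site's own tie-breaking rules are not modeled.

```python
def rank_by_win_pct(records):
    """Return {team: rank}, where rank 1 is the best win-loss percentage.

    `records` maps a team name to a (wins, losses) tuple.
    """
    pct = {team: wins / (wins + losses) for team, (wins, losses) in records.items()}
    # Sort team names from highest to lowest winning percentage.
    ordered = sorted(pct, key=pct.get, reverse=True)
    return {team: position for position, team in enumerate(ordered, start=1)}


# Hypothetical records for illustration only.
example_records = {"Team A": (90, 72), "Team B": (101, 61), "Team C": (75, 87)}
example_ranks = rank_by_win_pct(example_records)
```

A rank built this way depends only on regular-season wins and losses, which is why a team can win the World Series without finishing first in this ordering.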

For simplicity, I've just grabbed data since 1950, and I've kept only the teams that haven't changed names or moved markets in that time (so the Nats and the Marlins are out, for example). For the remaining teams, we can visualize their rank at the end of each year. With the legend at the top, you can double-click on teams to select or deselect them. As expected, we can use this tool to reveal the fairly consistent high rankings for the Yankees and, well, the opposite for the Cubs.
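One way to do that kind of filtering is to keep only the team names that show up in every single season. The sketch below assumes the rankings are stored as a dict of `{year: {team name: rank}}`; the helper names and the toy data are hypothetical, and this is not necessarily how I did the selection for the plot.

```python
def teams_in_every_year(rankings_by_year):
    """Return the set of team names that appear in every year's rankings."""
    yearly_teams = [set(ranks) for ranks in rankings_by_year.values()]
    if not yearly_teams:
        return set()
    return set.intersection(*yearly_teams)


def keep_teams(rankings_by_year, teams):
    """Drop every team not in `teams` from each year's rankings."""
    return {
        year: {team: rank for team, rank in ranks.items() if team in teams}
        for year, ranks in rankings_by_year.items()
    }


# Hypothetical toy data: one franchise changes its name between seasons.
toy = {
    1950: {"Team A": 1, "Team B": 2, "Old Name": 3},
    1951: {"Team B": 1, "Team A": 2, "New Name": 3},
}
stable = teams_in_every_year(toy)
filtered = keep_teams(toy, stable)
```

Note that the surviving teams keep their original league-wide ranks, so the filtered rankings can have gaps where the dropped teams used to be.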

My Python code for scraping the data from BaseballReference.com is below.

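import urllib.request
import lxml.html  # cssselect() also requires the separate cssselect package

years = range(1950, 2013)  # seasons 1950 through 2012
allRankings = {}
for yr in years:
    url = 'http://www.baseball-reference.com/leagues/MLB/%s-standings.shtml' % yr
    page = urllib.request.urlopen(url).read()
    doc = lxml.html.fromstring(page)
    # Take the last table on the page.
    table = doc.cssselect('table')[-1]
    # The third child of that table holds the team links in rank order.
    row = table[2]

    # Map each team name to its position in the row, starting at 1.
    yr_rankings = {}
    for rank, team in enumerate(row.cssselect('a'), start=1):
        yr_rankings[team.get('title')] = rank
    allRankings[yr] = yr_rankings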
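Once the nested dict is built, it usually needs to be flattened into one row per team per year before a plotting library can use it. Here is a rough sketch of that step, assuming the same `{year: {team: rank}}` shape as `allRankings`; the function names, the CSV column names, and the sample data are all hypothetical, and CSV is just one convenient hand-off format.

```python
import csv
import io


def rankings_to_rows(rankings_by_year):
    """Flatten {year: {team: rank}} into (year, team, rank) tuples.

    Rows are ordered by year, then by rank within each year.
    """
    rows = []
    for year in sorted(rankings_by_year):
        ranks = rankings_by_year[year]
        for team in sorted(ranks, key=ranks.get):
            rows.append((year, team, ranks[team]))
    return rows


def rows_to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["year", "team", "rank"])
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample in the same shape as allRankings.
sample = {1951: {"Team B": 1, "Team A": 2}, 1950: {"Team A": 1, "Team B": 2}}
sample_rows = rankings_to_rows(sample)
sample_csv = rows_to_csv(sample_rows)
```

With the real data, `rows_to_csv(rankings_to_rows(allRankings))` would give a long-format table with one line per team per season, which is the shape most line-chart tools expect.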