Baseball SeRies: Plotting the BbRef Game Results graph.

We have all visited the Baseball-Reference site, and some of us analyze its data on a daily basis. At least in my case, that’s true. As some of you may already know, I’m writing a thesis on baseball as part of the requirements for my Master’s degree in Computer Science.

Even though I cannot say much about my thesis, I can tell you that I’ve written a lot of code to compute a bunch of metrics (probably 100+ stats in total) for pitchers, batters, fielders and even parks from Retrosheet’s files. And here is where BbRef comes into play. I’ve compared most of the metrics computed by my programs (e.g. HBP, H, BB, GDP) to the site’s values and, to be honest, if it hadn’t been for BbRef, my code would have been buggier than Windows 98 (badum tss).

But really, BbRef is a very good site with a lot of good information about players and franchises, and it even has the logs for thousands of games. One cool feature of the site (apart from those already mentioned) is that it displays a bar graph (like the one below) for every team in every season. This graph shows basic information about every game the team played that season, such as game date, runs scored, runs allowed, etc.

[Figure: example of a BbRef Game Results bar graph]

So apart from thanking the BbRef team in my thesis for maintaining such a great site, I also wanted to show my gratitude by teaching you how to create the Game Results graph in R, using Retrosheet’s game files as the input data.

The 2002 Oakland Athletics

The next graph shows the game results for the Oakland Athletics’ 2002 season. As you may already know (either because you saw the Moneyball movie or because you’re a baseball nerd), the A’s won 20 games in a row during that season. That was the longest winning streak since the Chicago Cubs won 21 games in a row in 1935.

[Figure: Game Results graph for the 2002 Oakland Athletics]

The Code

The graph above was made with the ggplot2 package, so if it is not installed on your system yet, install it first and then run the following code. As usual, I’m making use of the dplyr and data.table packages. Feel free to get the 2002 season game file from here.

library( package = 'dplyr' )
library( package = 'ggplot2' )
library( package = 'data.table' )
# Column types. 'NULL' tells fread to skip a column, so only the five
# fields we need are read from the game file.
l_g_cols <- c( 'NULL'
             , 'character' # GAME_DT
             , rep( x = 'NULL', times = 5 )
             , 'character' # AWAY_TEAM_ID
             , 'character' # HOME_TEAM_ID
             , rep( x = 'NULL', times = 25 )
             , 'numeric' # AWAY_SCORE_CT
             , 'numeric' # HOME_SCORE_CT
             , rep( x = 'NULL', times = 143 )
             )
# Column names
l_g_names <- c( 'GAME_DATE','AWAY_TEAM','HOME_TEAM','AWAY_SCORE','HOME_SCORE' )
# Read the 2002 season file, keeping only the selected columns.
d_games <- fread( input = '2002.csv'
                , header = TRUE
                , colClasses = l_g_cols
                , col.names = l_g_names
                )
# Build the data set for the plot:
# - Keep only the games played by the Athletics.
# - Convert GAME_DATE to a Date variable.
# - Get the outcome of each game: did the Athletics win or lose?
# - Get the run difference: positive if the Athletics won, negative if they lost.
d_games <- filter( .data = d_games, HOME_TEAM == 'OAK' | AWAY_TEAM == 'OAK' ) %>%
  mutate( GAME_DATE = as.Date( x = GAME_DATE, format = '%Y%m%d' )
        , OUTCOME = ifelse( test = ( HOME_TEAM == 'OAK' & HOME_SCORE > AWAY_SCORE )
                                 | ( AWAY_TEAM == 'OAK' & AWAY_SCORE > HOME_SCORE )
                          , yes = 'WIN'
                          , no = 'LOSE'
                          )
        , SCORE_DIFF = ifelse( test = OUTCOME == 'WIN'
                             , yes = abs( x = HOME_SCORE - AWAY_SCORE )
                             , no = - abs( x = HOME_SCORE - AWAY_SCORE )
                             )
        )
# Create the plot.
# The fill of each bar depends on the outcome of the game: win = green, loss = red.
# Y = difference between the team scores.
# The colors are named so they do not depend on the ordering of the OUTCOME values.
l_colors <- c( LOSE = 'red2', WIN = 'green4' )
g_plot <- ( ggplot( data = d_games
                  , aes( x = GAME_DATE
                       , y = SCORE_DIFF
                       , fill = OUTCOME
                       , color = OUTCOME
                       )
                  )
          + geom_bar( stat = 'identity'
                    , width = 0.7
                    , alpha = 0.5
                    )
          + theme( axis.ticks = element_blank()
                 , axis.text = element_blank()
                 , title = element_blank()
                 , panel.background = element_blank()
                 , legend.position = 'none'
                 )
          + scale_fill_manual( values = l_colors )
          + scale_color_manual( values = l_colors )
          )
# Display the plot
print( g_plot )
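
As a quick sanity check on the winning streak mentioned earlier, here is a minimal sketch of how the longest run of wins could be counted with base R's rle() function. The longest_streak helper and the toy outcomes vector are my own illustrations (they are not part of the original gist), and the sketch assumes the outcomes are already in chronological order.

```r
# Toy outcomes in chronological order (made-up data, not the real 2002 log).
outcomes <- c( 'WIN', 'LOSE', 'WIN', 'WIN', 'WIN', 'LOSE', 'WIN', 'WIN' )

# Hypothetical helper: length of the longest run of `target` values in `x`.
longest_streak <- function( x, target = 'WIN' ) {
  runs <- rle( x )                               # run-length encoding of consecutive values
  hits <- runs$lengths[ runs$values == target ]  # lengths of the runs that match `target`
  if ( length( hits ) == 0 ) 0L else max( hits ) # 0 when `target` never appears
}

longest_streak( x = outcomes ) # 3
```

With the data from this post, something like longest_streak( x = d_games$OUTCOME ) should do the job, assuming the rows are still in the chronological order of the game file. Keep in mind that both games of a doubleheader share the same GAME_DATE, so sorting by date alone does not settle their order.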