A few days ago I was wondering how I could plot some data on a map using R. Luckily, I found some good information by searching Google and Stack Overflow. I combined what I learned from those sites with some baseball data, and now I’m showing you the results!
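First of all, make sure the required packages are installed and loaded. The code below also calls stringi and dplyr through their namespace prefixes, so those two need to be installed as well:
library( package = "ggplot2" )
library( package = "rgeos" )
library( package = "maptools" )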
After that, go to the Global Administrative Areas webpage and download a map from the GADM spatial database. Since I’m doing some MLB stuff in R, I downloaded the level 1 USA map in R SpatialPolygonsDataFrame format. If you need a more (or less) detailed map, check the different map levels offered on the website.
Once you have downloaded the file, load it into the R environment using readRDS and transform it into a data.frame using ggplot2::fortify, which converts the spatial object into a data.frame that ggplot2 can understand. Also, take a close look at the stri_trans_general call. There I’m creating a new column in the map dataset that contains the state names with accents and other non-ASCII characters transliterated to plain ASCII. This is helpful because eventually you will want to join data by region (states, cities, provinces, etc.), and joins in R might not match names properly when the two sources encode those characters differently. The last statement just removes Alaska and Hawaii from my map.
# Read GADM file.
map <- readRDS( file = "USA_adm1.rds" )
# Create Data Frame from spatial object.
map <- fortify( map, region = "NAME_1" )
# Create new state column without any special characters.
map$state <- stringi::stri_trans_general( str = map$id
                                        , id = "Latin-ASCII"
                                        )
# Remove Alaska and Hawaii
map <- map[ ! map$state %in% c( "Alaska", "Hawaii" ), ]
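To make the transliteration step concrete, here is a minimal sketch of what the "Latin-ASCII" transform does. The region names are made up for illustration (they are not taken from the GADM file), and the accented letters are written as Unicode escapes so the snippet does not depend on the file encoding.

```r
# Illustrative names only; "\u00e9" is e-acute and "\u00f3" is o-acute.
raw_names <- c( "Qu\u00e9bec", "Nuevo Le\u00f3n", "Texas" )

# Transliterate accented Latin letters to their closest ASCII equivalents.
clean_names <- stringi::stri_trans_general( str = raw_names
                                          , id = "Latin-ASCII"
                                          )

print( clean_names )
```

Names that are already plain ASCII, such as "Texas", pass through unchanged. Note that this transform does not strip ASCII punctuation such as periods or hyphens.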
Once your map data is ready, it’s time to load the other data you would like to show in your plot. In this case I got the MLB players’ place of birth data from Baseball-Reference and loaded it into R. Note that the stri_trans_general call does the same string conversion I did before for the map data.
# Read players file
players <- read.csv( file = "players.csv"
                   , sep = ","
                   , stringsAsFactors = FALSE
                   , na.strings = ""
                   )
# Remove accents, symbols, etc
players$state <- stringi::stri_trans_general( str = players$state
                                            , id = "Latin-ASCII"
                                            )
Now that all the data is ready, we can join it:
# Join map and players
data <- dplyr::left_join( x = map
                        , y = players
                        , by = "state"
                        )
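A left join silently leaves NA for any state that has no match, which shows up as a grey polygon on the map. A quick way to check for mismatched names is to compare the two sets of state names. The sketch below uses small made-up vectors in place of the real `map` and `players` objects, and plain base R `setdiff` rather than anything from dplyr.

```r
# Toy stand-ins for unique( map$state ) and the players data.
map_states  <- c( "Texas", "Ohio", "New York" )
players_toy <- data.frame( state   = c( "Texas", "Ohio", "Puerto Rico" )
                         , players = c( 10L, 5L, 3L )
                         , stringsAsFactors = FALSE
                         )

# States drawn on the map that have no players row (will be NA after the join).
missing_from_players <- setdiff( map_states, players_toy$state )

# Players rows whose state does not exist on the map (dropped by the left join).
missing_from_map <- setdiff( players_toy$state, map_states )

print( missing_from_players )
print( missing_from_map )
```

With the real objects, the same check would be `setdiff( unique( map$state ), players$state )` and the reverse.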
And plot it:
# Create map
( ggplot()
  + geom_polygon( data = data
                , mapping = aes( x = long
                               , y = lat
                               , group = group
                               , fill = players # assumes players.csv has a `players` count column
                               )
                , color = "white"
                )
  + labs( fill = "Players born" )
  + coord_map() # requires the mapproj package to be installed
  + theme( panel.grid.minor = element_blank()
         , panel.grid.major = element_blank()
         , panel.background = element_blank()
         , panel.border = element_blank()
         , axis.ticks = element_blank()
         , axis.text.x = element_blank()
         , axis.text.y = element_blank()
         , axis.title = element_blank()
         )
)
You can access the full code from here.