Keywords: Sport, Performance, Data Wrangling, Graphics, Web Scraping

This tutorial is a crash course in using R for sports data science. It is interactive: participants will work with real sports data, mostly from tennis. Activities range from exploratory graphics to formal modelling. Substantive topics include rally length distributions in tennis and how pressure influences player performance. After the course, participants will be able to scrape sports data from the Web, explore data with graphics, apply statistical and machine learning models to questions in sport, and publish their findings to the Web.

Prerequisites: Basic proficiency in R and familiarity with the dplyr and ggplot2 packages.

Requirements: A laptop with the latest version of R installed. The list of packages to pre-install will be provided before the tutorial.

Outline:

  1. Getting the data: Web scraping and wrangling with sports data
  2. Exploratory data analysis in 2D and 3D
  3. Advanced programming to define new classes and methods for unique data types in sport
  4. Introduction to common ranking methods and prediction models in sport
  5. Sports blogging with Markdown, GitHub and interactive graphics

Instructor:

Stephanie Kovalchik is the lead data scientist in the Game Insight Group at Tennis Australia, the governing body of tennis in Australia, and a Research Fellow in sports analytics at the Institute of Sport, Exercise and Active Living at Victoria University. She is the author of several R packages, including deuce, a package for tennis statistics, and an Associate Editor for the Journal of Statistical Software. She also created and writes the tennis analytics blog ‘On The T’ at www.on-the-t.com.

References:

www.github.com/skoval/deuce