Other Resources

Datasets

For Teaching and Training

See the comment lines (starting with #) at the top of each file for a detailed description of each dataset.

Own

nycflights13

Hadley Wickham's nycflights13-0.2.1 (licensed under CC0, gzipped) – on-time data for all flights that departed NYC (i.e., JFK, LGA, or EWR) in 2013:

All the logs are available on the US Department of Transportation website. Arunkumar Srinivasan’s GitHub repository provides R code for accessing the 2014 data.

babynames

Hadley Wickham's babynames-0.2.1 (licensed under CC0, gzipped) – US Baby Names 1880-2014:

fueleconomy

Hadley Wickham's fueleconomy-0.1 (licensed under CC0, gzipped) – fuel economy data from the EPA, 1985-2015:

nasaweather

Hadley Wickham's nasaweather-0.1 (licensed under GPL-3, gzipped):

R Built-ins

The following datasets are included in the datasets package for GNU R:

From Other Sources

How To Access

In R:

# read.csv() reads .gz files transparently; comment.char="#" skips the description lines
airlines <- read.csv("nycflights13_airlines.csv.gz", comment.char="#")
head(airlines)
##   carrier                     name
## 1      9E        Endeavor Air Inc.
## 2      AA   American Airlines Inc.
## 3      AS     Alaska Airlines Inc.
## 4      B6          JetBlue Airways
## 5      DL     Delta Air Lines Inc.
## 6      EV ExpressJet Airlines Inc.

In Python:

import pandas as pd
# compression is inferred from the .gz extension; comment="#" skips the description lines
airlines = pd.read_csv("nycflights13_airlines.csv.gz", comment="#")
airlines.head()

To print the comment lines in Python, run, e.g.:

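import gzip

# Print the leading comment lines (those starting with "#") of a gzipped CSV file
with gzip.open("nycflights13_airlines.csv.gz", "rt") as f:
    for line in f:
        line = line.strip()
        if not line.startswith("#"):
            break  # the CSV header row starts here
        print(line)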
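The loop above only prints the comments. If you also need the data, both can be read in a single pass. Below is a minimal sketch using only the Python standard library; the function name read_commented_csv is hypothetical (it is not part of these datasets or of any library), and it assumes, as in the files above, that all comment lines come before the CSV header row.

```python
import csv
import gzip
import itertools


def read_commented_csv(path):
    """Hypothetical helper: return (comment_lines, rows) for a gzipped CSV file.

    Assumes every comment line starts with "#" and precedes the header row.
    Each row is returned as a dict mapping column names to string values.
    """
    comments = []
    with gzip.open(path, "rt", newline="") as f:
        line = f.readline()
        # Collect the leading comment lines, without their line endings.
        while line.startswith("#"):
            comments.append(line.rstrip("\r\n"))
            line = f.readline()
        # Put the first non-comment line (the header) back in front of the
        # remaining lines and parse everything as ordinary CSV.
        rows = list(csv.DictReader(itertools.chain([line], f)))
    return comments, rows
```

For example, comments, rows = read_commented_csv("nycflights13_airlines.csv.gz") gives the description lines in comments and one dict per airline in rows. Because only the leading lines are treated as comments, a "#" occurring inside a data value is kept; with comment="#", pandas would instead ignore the remainder of any line containing an unquoted "#".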

Other