wikiweather Dataset

Description

Monthly climate data for 867 cities around the world, web-scraped from the English Wikipedia by Anna Cena and Marek Gagolewski on January 12, 2016.

If used in publications, please cite this dataset as: Cena A., Gagolewski M., Fuzzy k-minpen clustering and k-nearest-minpen classification procedures incorporating generic distance-based penalty minimizers, In: Carvalho J.P. et al. (Eds.), Information Processing and Management of Uncertainty in Knowledge-Based Systems, Part II (Communications in Computer and Information Science 611), Springer, 2016, pp. 445-456, doi:10.1007/978-3-319-40581-0_36.

Each of the 12 CSV files consists of the following columns (missing values are allowed):

  • City,
  • Average high temperature [°C],
  • Average low temperature [°C],
  • Daily mean temperature [°C],
  • Record high temperature [°C],
  • Record low temperature [°C],
  • Average afternoon relative humidity (%),
  • Average precipitation days,
  • Average precipitation [mm],
  • Average rainfall [mm],
  • Average rainy days,
  • Average relative humidity [%],
  • Average relative humidity [%] at 15:00 LST,
  • Average relative humidity [%] at 9am,
  • Average relative humidity [%] at Daytime,
  • Average relative humidity [%] daily average,
  • Average snowfall [cm],
  • Mean daily sunshine hours,
  • Mean monthly sunshine hours,
  • Percent possible sunshine,
  • Record low wind chill
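A minimal sketch of reading one such table with missing values follows. It assumes comma-separated files with a header row that uses the column names listed above, and empty fields for missing values; none of this is confirmed by the description, so check it against the actual files. The sample text and the helper names `to_float` and `read_month` are made up for illustration.

```python
import csv
import io

# Made-up two-row sample in the assumed layout; not real values from the dataset.
SAMPLE = """City,Average high temperature [°C],Average low temperature [°C],Average snowfall [cm]
City A,10.5,2.0,
City B,,-3.5,12.0
"""

def to_float(text):
    """Return a float, or None when the field is empty or not numeric."""
    # Replace a Unicode minus sign, in case the scraped text kept one (assumption).
    text = (text or "").strip().replace("\u2212", "-")
    try:
        return float(text)
    except ValueError:
        return None

def read_month(handle):
    """Read one table into a list of dicts, keeping City as text."""
    rows = []
    for record in csv.DictReader(handle):
        row = {"City": record["City"]}
        for name, value in record.items():
            if name in (None, "City"):
                continue  # skip the City key and any surplus unnamed fields
            row[name] = to_float(value)
        rows.append(row)
    return rows

rows = read_month(io.StringIO(SAMPLE))
```

For a real file, an open file handle (for example `open(path, newline="", encoding="utf-8")`, where the path and encoding are again assumptions) can be passed in place of the `io.StringIO` object.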

The data sets require some cleansing before use: missing values occur, and several columns overlap (for example, the six relative humidity variants).
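One possible cleansing step, shown only as a sketch, is to collapse the overlapping relative humidity columns into a single value per city by taking the first one that is not missing. The preference order, the helper name `coalesce`, and the sample rows are assumptions made for illustration; this is not a procedure described by the dataset authors.

```python
# Humidity variants from the column list, in an arbitrary preference order (assumption).
HUMIDITY_COLUMNS = [
    "Average relative humidity [%]",
    "Average relative humidity [%] daily average",
    "Average relative humidity [%] at Daytime",
    "Average afternoon relative humidity (%)",
    "Average relative humidity [%] at 15:00 LST",
    "Average relative humidity [%] at 9am",
]

def coalesce(row, columns):
    """Return the first non-missing value among the given columns, else None."""
    for name in columns:
        value = row.get(name)
        if value is not None:
            return value
    return None

# Made-up rows (already parsed to float/None); not real values from the dataset.
rows = [
    {"City": "City A", "Average relative humidity [%]": 71.0},
    {"City": "City B", "Average relative humidity [%] at 9am": 64.0},
    {"City": "City C"},
]
humidity = {row["City"]: coalesce(row, HUMIDITY_COLUMNS) for row in rows}
```

The variants measure humidity at different times of day, so merging them loses information; whether that is acceptable depends on the analysis.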

Example

An example clustering based on average low and high temperatures in January and July:

[Figure: wikiweather]
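A clustering of this kind can be sketched as follows. Each city is described by four features (average low and high temperatures in January and July), and plain k-means is used as a simple stand-in; it is not the fuzzy k-minpen procedure from the cited paper. The feature values and the function name `kmeans` are made up for illustration, and cities with any missing feature would need to be dropped or imputed first.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means; the first k points serve as initial centers."""
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest center by Euclidean distance.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centers

# Made-up feature vectors: (Jan low, Jan high, Jul low, Jul high) in °C.
cities = {
    "City A": (-8.0, -1.0, 14.0, 24.0),
    "City B": (20.0, 29.0, 11.0, 19.0),
    "City C": (-6.0, 1.0, 15.0, 26.0),
    "City D": (19.0, 30.0, 9.0, 18.0),
}
labels, centers = kmeans(list(cities.values()), k=2)
clusters = dict(zip(cities, labels))
```

Taking the first k points as initial centers keeps the sketch deterministic; on real data a randomized or k-means++ initialization with several restarts is usually preferable.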