Computational Data Science

MSc studies in Data Science, Faculty of Mathematics and Information Science, Warsaw University of Technology

This course covers methods for data analysis and mining, statistics, machine learning, and artificial intelligence from a computational perspective. We will introduce, analyse, and implement key algorithms and data structures used for finding groups of similar objects, learning parameters of supervised machine learning models, conducting simulation experiments, etc. We will focus not only on implementing re-usable algorithms from scratch (acquiring the programming skills needed when a ready-to-use implementation of a method is unavailable, its high-level R/Python prototype is too slow to run, or a customised modification thereof is required), but also on calling methods from existing C/Fortran libraries (standing on the shoulders of giants, appreciating the effort of the open-source community). As a by-product, we will better understand how statistical software packages (R, Python with NumPy and Pandas, etc.) are implemented.

Prerequisites:

  • Structured data processing (R and Python),

  • Data structures and algorithms (sorting and searching),

  • Numerical methods,

  • Introduction to machine learning.

Schedule

Summer semester 2025/2026:

  • Lectures+Pracs (“Workshops”): Tuesdays, 16:15–20:00, 216 MiNI

    • classical, blackboard-based lectures; students are expected to participate actively in the classes, take notes, discuss/brainstorm all ideas, and overall happily relish all the food for thought mindfully seasoned and served by yours truly;

    • bring your own laptop, as we will also be implementing, testing, and applying the algorithms presented in the lectures!

  • Office hours: 550 MiNI; I’m there on most days

  • Written assignments: Weeks 8 and 14

  • Project delivery and presentation: Week 15

TBA

Assessment Methods and Regulations

There are two written assignments (33+33 points) and a programming project+presentation (33 points).

During the written assignments, electronic devices must not be used. However, a single A4-sized sheet of hand-written notes can be brought along.

AI tools (ChatGPT, Copilot, Claude Code, etc.) are not allowed in the project part. The use of non-permitted materials results in a failing grade as per the University’s Study Regulations.

Class attendance is compulsory. Five points will be deducted from the final result for each unjustified absence.

The final grade will reflect the extent to which your written and programming assessment tasks met the prescribed quality criteria.

A total result of ≤50 points translates to the grade 2.0; (50,60] – 3.0; (60,70] – 3.5; (70,80] – 4.0; (80,90] – 4.5; and >90 – 5.0.
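The grading scale above can be read as a simple lookup over right-closed intervals. The following Python sketch is for illustration only and is not an official grade calculator: the function names grade_from_points and total_points are hypothetical, and it assumes that the absence penalty is subtracted from the sum of the three components before the grade is looked up.

```python
def total_points(written1, written2, project, unjustified_absences=0):
    """Sum the three components and subtract 5 points per unjustified absence.

    Assumption: the deduction applies to the overall sum of points.
    """
    return written1 + written2 + project - 5 * unjustified_absences


def grade_from_points(points):
    """Map a total score to a grade using the intervals stated above."""
    # Each pair is (inclusive upper bound, grade), e.g. (50, 60] -> 3.0.
    thresholds = [(50, 2.0), (60, 3.0), (70, 3.5), (80, 4.0), (90, 4.5)]
    for upper, grade in thresholds:
        if points <= upper:
            return grade
    return 5.0  # more than 90 points


if __name__ == "__main__":
    # Hypothetical scores: 30 and 28 on the written assignments,
    # 33 on the project, and one unjustified absence.
    total = total_points(30, 28, 33, unjustified_absences=1)
    print(total, grade_from_points(total))
```

Note that the interval ends matter: exactly 60 points still yields 3.0, whereas any score strictly above 60 (and up to 70) yields 3.5.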

References

The list is non-exhaustive. More references will be provided in the lectures (also: see above).

  1. T.H. Cormen, C.E. Leiserson, R.L. Rivest, C. Stein, Introduction to Algorithms, MIT Press and McGraw-Hill, 2022

  2. W.H. Press, S.A. Teukolsky, W.T. Vetterling, B.P. Flannery, Numerical Recipes. The Art of Scientific Computing, 3rd ed., Cambridge University Press, 2007

  3. S. Marsland, Machine Learning: An Algorithmic Perspective, Chapman&Hall/CRC, 2015

  4. M. Gagolewski, Deep R Programming, 2026

  5. M. Gagolewski, Minimalist Data Wrangling with Python, 2026

  6. D. Goldberg, What every computer scientist should know about floating-point arithmetic, ACM Computing Surveys 23(1), 1991, 5–48

  7. N.J. Higham, Accuracy and Stability of Numerical Algorithms, SIAM, 2002

  8. G.H. Golub, C.F. Van Loan, Matrix Computations, The Johns Hopkins University Press, 2013

  9. NIST Digital Library of Mathematical Functions

  10. (*) D.E. Knuth, The Art of Computer Programming, Vols. 1–4B, Addison-Wesley, 2023

  11. T. Rothwell, J. Youngman, and others, The GNU C Reference Manual (skip the parts devoted to the “GNU extensions”)

  12. The GNU C Language Manual (skip the parts devoted to the “GNU extensions”)

  13. The GNU C Library

  14. J. Gustedt, Modern C, Manning, 2019

  15. R.J. Hyndman, Y. Fan, Sample quantiles in statistical packages, The American Statistician 50(4), 1996, 361–365, DOI: 10.2307/2684934

  16. (*) B.W. Kernighan, D.M. Ritchie, The C Programming Language, Prentice Hall, 1988

  17. (*) J. Arndt, Matters Computational: Ideas, Algorithms, Source Code, Springer, 2011

  18. (*) Programming Languages – C. International Standard ISO/IEC 9899:2023, draft

Source code of:

  1. R (mirror)

  2. Python

  3. NumPy

  4. SciPy

  5. Pandas

  6. scikit-learn

  7. data.table

  8. dplyr

  9. GNU GSL

Software

Programming languages: C, C++ (gcc/clang), R, Python (CPython, Cython), Fortran

A GNU/Linux operating system with root access. Install, e.g., Ubuntu/Kubuntu on a virtual machine (e.g., VirtualBox) if you use a different operating system family.