Computational data science 2026
MSc studies in Data Science, Faculty of Mathematics and Information Science, Warsaw University of Technology
This course covers methods for data analysis and mining, statistics, machine learning, and artificial intelligence from a computational perspective. We will introduce, analyse, and implement key algorithms and data structures used for finding similar groups of objects, learning parameters of supervised machine learning models, conducting simulation experiments, etc. We will focus not only on implementing re-usable algorithms from scratch (acquiring programming skills needed to implement any algorithm when a ready-to-use implementation of a method is unavailable, its high-level R/Python prototype is too slow to run, or a customised modification thereof is required), but also on calling methods from existing C/Fortran libraries (standing on the shoulders of giants, appreciating the effort of the open-source community). As a by-product, we will better understand the implementation of statistical software packages (R, Python with NumPy and Pandas, etc.).
Course prerequisites: Structured data processing (R and Python), Data structures and algorithms (sorting and searching), Numerical methods, Introduction to machine learning.
Schedule – Summer semester 2026
Lectures+Pracs (“Workshops”): Tuesdays, 16:15–20:00, 216 MiNI
classical, blackboard-based lectures; students are expected to participate actively in the classes, take notes, discuss/brainstorm all ideas, and overall happily relish all the food for thought mindfully seasoned and served by yours truly;
bring your own laptop, for we will also be implementing, testing, and applying the algorithms presented in the lectures!
Office hours: 550@MiNI; I’m there on most days
Written assignments: Weeks 8 and 14
Project delivery and presentation: Week 15
1. 2026-02-24
- 🤔 Topics
Introduction to the course: why lower-level programming for data science?
Introduction to the C programming language (a minimalist subset of C17, as C23 is not yet universally supported):
variable declarations, common scalar types
arithmetic, comparison, and logical operators
if, switch, goto
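To give a taste of the constructs listed above, here is a minimal sketch. It is not taken from the lecture materials: the function names days_in_month, truncated_mean, and mean_of_two are made up for illustration. It combines scalar declarations, arithmetic, comparison, and logical operators, if, switch, and a forward goto used as a single error exit.

```c
#include <stdbool.h>

/* Hypothetical example: the number of days in a given month (1..12)
   of a Gregorian year; returns -1 for an invalid month. */
int days_in_month(int month, int year)
{
    /* logical and comparison operators on int operands */
    bool leap = (year % 4 == 0 && year % 100 != 0) || (year % 400 == 0);
    int days;

    if (month < 1 || month > 12)
        goto invalid;               /* forward jump to the error exit */

    switch (month) {
        case 2:
            days = leap ? 29 : 28;
            break;
        case 4: case 6: case 9: case 11:
            days = 30;              /* several labels share one branch */
            break;
        default:
            days = 31;
    }
    return days;

invalid:
    return -1;
}

/* Integer division truncates towards zero... */
int truncated_mean(int a, int b)
{
    return (a + b) / 2;
}

/* ...whereas dividing by a double constant promotes the sum to double. */
double mean_of_two(int a, int b)
{
    return (a + b) / 2.0;
}
```

For instance, days_in_month(2, 2024) gives 29, whereas truncated_mean(1, 2) gives 1 and mean_of_two(1, 2) gives 1.5; the last two differ only in the type of the divisor.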
- 📚 References
Recommended reading (choose either; the latter is an extended version of the former; however, sometimes it does not mark the GNU extensions to the C language clearly enough):
(!) T. Rothwell, J. Youngman, and others, The GNU C Reference Manual (skip the parts devoted to the GNU C extensions)
The GNU C Language Manual (skip the parts devoted to the GNU C extensions)
Further reading:
J. Gustedt, Modern C, Manning, 2019
(*) Programming Languages – C. International Standard ISO/IEC 9899:2018
Extras:
In praise of basic research and general curiosity: How did places like Bell Labs know how to ask the right questions?
I want you to be curious: Hacker News
- 🏠 Homework
On your laptop, ensure you have access to a GNU/Linux environment with gcc, Python, and R installed; see the Software section below
Recall the following terminal/Bash commands:
cd, pwd, ls, ls -l, mkdir, cp, cp -i, mv, rm, rm -rf, rm --help, man
echo, touch, cat, cat > file, less, head, tail, nano, how to quit vi
chmod, chown, whereis, ln, top/htop, df, du, tar, zip/gzip/bzip2/xz, time, sleep, bg, fg, ps, kill, nice
diff, grep (also: rg – ripgrep), find, sed, rename, uniq, sort
2. 2026-03-03
- 🤔 Topics
- 📚 References
see Week 1
Extras:
- 🏠 Homework
TBA
3. 2026-03-10
- 🤔 Topics
Introduction to the C programming language (cont’d):
benchmarking Python, Cython vs C cont’d
the C preprocessor cont’d
one program, many source and header files; the linker
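The preprocessor and benchmarking topics above can be illustrated with a small sketch. Everything named here (SQUARE, N_REPEATS, sink, sum_of_squares, best_cpu_time) is a hypothetical choice made for this example, not the code used in class. It relies only on clock() and CLOCKS_PER_SEC from the standard <time.h> header and measures CPU time, not wall-clock time.

```c
#include <assert.h>
#include <time.h>

/* Function-like macro: the parentheses guard against operator-precedence
   surprises, but the argument is still evaluated twice. */
#define SQUARE(x) ((x) * (x))

/* Object-like macro with a default that can be overridden at compile
   time, e.g., with gcc -DN_REPEATS=10. */
#ifndef N_REPEATS
#define N_REPEATS 3
#endif

/* Writing the result to a volatile variable discourages the compiler
   from optimising the benchmarked call away. */
static volatile double sink;

/* Hypothetical workload: 1^2 + 2^2 + ... + n^2 computed in a loop. */
double sum_of_squares(long n)
{
    assert(n >= 0);
    double s = 0.0;
    for (long i = 1; i <= n; ++i)
        s += SQUARE((double)i);
    return s;
}

/* Returns the smallest CPU time (in seconds) over N_REPEATS runs. */
double best_cpu_time(long n)
{
    double best = -1.0;
    for (int r = 0; r < N_REPEATS; ++r) {
        clock_t t0 = clock();
        sink = sum_of_squares(n);
        clock_t t1 = clock();
        double elapsed = (double)(t1 - t0) / CLOCKS_PER_SEC;
        if (best < 0.0 || elapsed < best)
            best = elapsed;
    }
    return best;
}
```

Reporting the minimum over a few repetitions is one common convention, as it is less sensitive to other processes competing for the CPU; medians or means are equally defensible. In a multi-file program, the macro definitions and function prototypes would typically move to a header file protected by an include guard.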
- 📚 References
see Week 1
- 🏠 Homework
TBA
4. 2026-03-17
- 🤔 Topics
Introduction to the C programming language (cont’d):
record types (structures), typedef, unions and enums
static arrays
pointers
dynamic memory allocation
…TODO…
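As a rough illustration of how structures, typedef, enums, pointers, and dynamic memory allocation fit together, consider the following sketch. The type and function names (vec, vec_status, vec_init, vec_free, vec_mean) are invented for this example and are an assumption, not the interface developed in the lectures.

```c
#include <assert.h>
#include <stdlib.h>

/* Enumerated status codes returned by the allocating function. */
typedef enum { VEC_OK = 0, VEC_ALLOC_FAILED = 1 } vec_status;

/* Hypothetical record type: a heap-allocated vector of doubles. */
typedef struct {
    size_t n;        /* number of elements */
    double *data;    /* pointer to the first element (or NULL if n == 0) */
} vec;

/* Allocates n elements and sets them to 0.0. */
vec_status vec_init(vec *v, size_t n)
{
    assert(v != NULL);
    v->n = 0;
    v->data = NULL;
    if (n == 0)
        return VEC_OK;

    double *p = malloc(n * sizeof *p);
    if (p == NULL)
        return VEC_ALLOC_FAILED;

    for (size_t i = 0; i < n; ++i)
        p[i] = 0.0;
    v->data = p;
    v->n = n;
    return VEC_OK;
}

/* Releases the memory; safe to call on an empty vector. */
void vec_free(vec *v)
{
    assert(v != NULL);
    free(v->data);   /* free(NULL) is a no-op */
    v->data = NULL;
    v->n = 0;
}

/* Arithmetic mean of the elements; 0.0 for an empty vector. */
double vec_mean(const vec *v)
{
    assert(v != NULL);
    double s = 0.0;
    for (size_t i = 0; i < v->n; ++i)
        s += v->data[i];
    return v->n > 0 ? s / (double)v->n : 0.0;
}
```

A caller would declare a vec, pass its address to vec_init, check the returned status, fill v.data[0], ..., v.data[v.n-1], and finally call vec_free. Returning 0.0 for an empty vector is a simplification; returning NaN would be another reasonable choice.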
- 📚 References
TBA
- 🏠 Homework
TBA
5. 2026-03-24
- 🤔 Topics
TBA
- 📚 References
TBA
- 🏠 Homework
TBA
6. 2026-03-31
- 🤔 Topics
TBA
- 📚 References
TBA
- 🏠 Homework
TBA
7. 2026-04-14
- 🤔 Topics
TBA
- 📚 References
TBA
- 🏠 Homework
TBA
8. 2026-04-21
- 🤯 Written Assignment 1
TBA
- 🤔 Topics
TBA
- 📚 References
TBA
- 🏠 Homework
TBA
9. 2026-04-28
- 🤔 Topics
TBA
- 📚 References
TBA
- 🏠 Homework
TBA
10. 2026-05-05
- 🤔 Topics
TBA
- 📚 References
TBA
- 🏠 Homework
TBA
11. 2026-05-19
- 🤔 Topics
TBA
- 📚 References
TBA
- 🏠 Homework
TBA
12. 2026-05-26
- 🤔 Topics
TBA
- 📚 References
TBA
- 🏠 Homework
TBA
13. 2026-06-02
- 🤔 Topics
TBA
- 📚 References
TBA
- 🏠 Homework
TBA
14. 2026-06-09
- 🤯 Written Assignment 2
TBA
- 🤔 Topics
TBA
- 📚 References
TBA
- 🏠 Homework
TBA
15. 2026-06-16
- 🤯 Project delivery
TBA
- 🤔 Topics
TBA
- 📚 References
TBA
- 🏠 Homework
TBA
Assessment methods and regulations
There are two written assignments (33+33 points) and an individual programming project+presentation (33 points).
During the written assignments, electronic devices must not be used. However, a single A4-sized sheet of hand-written notes can be brought along.
AI tools (ChatGPT, Copilot, Claude Code, etc.) are not allowed in the project part. The use of non-permitted materials results in a failing grade as per the University’s Study Regulations.
Class attendance is compulsory. Five points will be deducted from the final result for each unjustified absence.
The final grade will reflect the extent to which your written and programming assessment tasks met the prescribed quality criteria.
The total result of ≤50 pts translates to the 2.0 grade; (50, 60] – 3.0; (60, 70] – 3.5; (70, 80] – 4.0; (80, 90] – 4.5; and >90 – 5.0.
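The scoring rules above amount to simple arithmetic, which could be sketched in C as follows. The function names final_points and grade_from_points are hypothetical, and the sketch assumes that the deduction for absences is applied to the sum of the three components without any lower bound, which the regulations above do not specify.

```c
#include <assert.h>

/* Sum of the three components (at most 33 points each) minus 5 points
   per unjustified absence. ASSUMPTION: the result is not clamped at 0. */
double final_points(double assignment1, double assignment2,
                    double project, int unjustified_absences)
{
    assert(unjustified_absences >= 0);
    return assignment1 + assignment2 + project
        - 5.0 * unjustified_absences;
}

/* Maps the total to a grade; each interval is closed on the right,
   as in the thresholds stated above. */
double grade_from_points(double points)
{
    if (points <= 50.0) return 2.0;
    if (points <= 60.0) return 3.0;
    if (points <= 70.0) return 3.5;
    if (points <= 80.0) return 4.0;
    if (points <= 90.0) return 4.5;
    return 5.0;
}
```

For example, scoring 30 points on each component with one unjustified absence yields 85 points in total, which falls into (80, 90] and hence corresponds to the 4.5 grade.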
Software
We’ll need a GNU/Linux operating system with root access, e.g., Ubuntu, Kubuntu, Lubuntu, or Linux Mint. Pick whichever you find most appealing; it’s a matter of taste (I’m on openSUSE Tumbleweed, but it’s less beginner-friendly).
If you use a different operating system family, you can run Linux on a virtual machine; see, e.g., how to run Ubuntu using VirtualBox.
Programming languages: C, C++ (gcc/clang), R, Python (CPython, Cython), Fortran; on Ubuntu, run in the terminal:
sudo apt-get update
sudo apt-get -y upgrade
sudo apt-get -y install r-base-dev python3-dev pandoc
sudo apt-get -y install cython
sudo apt-get -y install jupyter-notebook
References
The list is non-exhaustive. More references will be provided in the lectures (also: see above).
T.H. Cormen, C.E. Leiserson, R.L. Rivest, C. Stein, Introduction to Algorithms, MIT Press and McGraw-Hill, 2022
W.H. Press, S.A. Teukolsky, W.T. Vetterling, B.P. Flannery, Numerical Recipes. The Art of Scientific Computing, 3rd ed., Cambridge University Press, 2007
S. Marsland, Machine Learning: An Algorithmic Perspective, Chapman&Hall/CRC, 2015
A. Blum, J. Hopcroft, R. Kannan, Foundations of Data Science, 2018
R.A. van de Geijn, E.S. Quintana-Orti, The Science of Programming Matrix Computations
V. Eijkhout, The Art of HPC, 2026
J.E. Gentle, Matrix Algebra: Theory, Computations and Applications in Statistics, Springer, 2024
M. Gagolewski, Deep R Programming, 2026
M. Gagolewski, Minimalist Data Wrangling with Python, 2026
D. Goldberg, What every computer scientist should know about floating-point arithmetic, ACM Computing Surveys 21(1), 1991, 5–48
N.J. Higham, Accuracy and Stability of Numerical Algorithms, SIAM, 2002
G.H. Golub, C.F. Van Loan, Matrix Computations, The Johns Hopkins University Press, 2013
(*) D.E. Knuth, The Art of Computer Programming, Vols. 1–4B, Addison-Wesley, 2023
(*) B.W. Kernighan, D.M. Ritchie, The C Programming Language, Prentice Hall, 1988
J. Gustedt, Modern C, Manning, 2019
G. Barlas, Multicore and GPU Programming, Morgan Kaufmann, 2022
N. Matloff, Parallel Computing for Data Science: With Examples in R, C++ and CUDA, CRC Press, 2016
T. Rothwell, J. Youngman, and others, The GNU C Reference Manual (skip the parts devoted to the “GNU extensions”)
(*) J. Arndt, Matters Computational: Ideas, Algorithms, Source Code, Springer, 2011
The GNU C Language Manual (skip the parts devoted to the “GNU extensions”)
(*) Programming Languages – C. International Standard ISO/IEC 9899:2018
R Core Team, Writing R Extensions, 2026
R Core Team, R Internals, 2026
Source code of: