5 Makefile Essentials
This chapter introduces Make, a lightweight automation tool used to define and run repeatable tasks.
Makefiles help streamline workflows by turning multi-step processes into simple, declarative commands such as `make data` or `make book`. This approach improves reproducibility, reduces manual errors, and keeps your project organized.
5.1 A Simple QA (Quality Assurance) Target
This example Makefile target performs a quick data check by reporting the number of rows in a processed CSV and previewing its contents.
```makefile
.PHONY: qa
qa:
	@echo "Rows:" && wc -l data/processed/prices_with_vol.csv
	@echo "Sample:" && head -n 5 data/processed/prices_with_vol.csv
```
Run it from the terminal:

```bash
make qa
```

5.2 Explanation
This QA target demonstrates key Makefile concepts:
- `.PHONY` declares `qa` as a command to run, not a file to build.
- `wc -l` counts the rows in the processed CSV (a quick validation).
- `head -n 5` previews the file structure and dataset formatting.
- The leading `@` suppresses echoing of the command itself, for cleaner output.
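To see what these commands print, here is a self-contained sketch that builds a tiny stand-in CSV (the `/tmp` path and columns are made up for illustration) and runs the same two checks:

```shell
# Create a small stand-in CSV (hypothetical path and columns).
mkdir -p /tmp/qa_demo
printf 'date,price,vol\n2024-01-02,100.5,0.12\n2024-01-03,101.2,0.15\n' \
  > /tmp/qa_demo/sample.csv

# The same checks the qa target performs:
echo "Rows:" && wc -l /tmp/qa_demo/sample.csv     # 3 lines: header + 2 rows
echo "Sample:" && head -n 5 /tmp/qa_demo/sample.csv
```

Note that `wc -l` counts the header line too, so a file with two data rows reports three lines.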
Make allows you to bundle commonly repeated actions into simple targets to improve efficiency and consistency.
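Targets can also list other targets as prerequisites, so a single command drives a whole chain. A hedged sketch (the rule bodies reuse the scripts from the full Makefile later in this chapter):

```makefile
# Sketch: `make features` first runs `data` if it has not run yet.
.PHONY: data features
data:
	python scripts/make_synth_data.py
features: data
	python scripts/build_features.py
```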
5.3 A Full Project Makefile
Below is an example Makefile that reflects a typical data-science workflow used in this course:
```makefile
.PHONY: env data db features book test clean

env:
	pip install -r requirements.txt

data:
	python scripts/make_synth_data.py

db:
	python scripts/make_sqlite.py

features:
	python scripts/build_features.py

book:
	quarto render book

test:
	pytest -q

clean:
	rm -rf db/*.db data/processed/* book/_site book/_freeze
```
5.4 What Each Target Does
5.4.1 env
Installs all required Python dependencies based on requirements.txt.
Ensures anyone cloning your repo can rebuild your environment in one command.
5.4.2 data
Generates synthetic raw data used throughout the book.
Running this target guarantees consistent input files.
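The actual logic lives in `scripts/make_synth_data.py` (not shown here), but the key property — a fixed random seed makes every run emit the same file — can be sketched in a few lines of shell:

```shell
# Hypothetical sketch: with a constant seed (srand(42)), this generator
# writes the same synthetic price series on every run, so downstream
# targets always see identical input files.
awk 'BEGIN {
  srand(42); price = 100
  print "day,price"
  for (d = 0; d < 5; d++) {
    price *= 1 + (rand() - 0.5) / 50   # small pseudo-random daily move
    printf "%d,%.4f\n", d, price
  }
}' > /tmp/synth_prices.csv
head -n 3 /tmp/synth_prices.csv
```

On a given machine, rerunning this command produces a byte-identical file — the reproducibility guarantee the `data` target relies on.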
5.4.3 db
Builds the SQLite database from processed CSVs.
Allows SQL queries in your pipeline to operate on a reproducible dataset.
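Beyond phony targets, Make's real leverage comes from file dependencies: a rule reruns only when its inputs are newer than its output. A minimal self-contained illustration (a hypothetical temp-dir Makefile, unrelated to the course files):

```shell
# Write a tiny Makefile in which out.txt depends on in.txt.
mkdir -p /tmp/mk_demo && cd /tmp/mk_demo
printf 'out.txt: in.txt\n\ttr a-z A-Z < in.txt > out.txt\n' > Makefile
echo "hello" > in.txt

make          # runs tr, producing out.txt
make          # reports out.txt is up to date; nothing reruns
cat out.txt   # HELLO
```

Applied to this project, the same idea would let `make db` skip rebuilding the database when the processed CSVs have not changed.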
5.4.4 features
Constructs engineered features such as log r