5  Makefile Essentials

This chapter introduces Make, a lightweight automation tool for defining and running repeatable tasks.
Makefiles streamline workflows by turning multi-step processes into short, named commands such as make data or make book. This approach improves reproducibility, reduces manual errors, and keeps your project organized.


5.1 A Simple QA (Quality Assurance) Target

This example Makefile target performs a quick data check: it reports the line count of a processed CSV and previews its first few lines.

.PHONY: qa
qa:
    @echo "Rows:" && wc -l data/processed/prices_with_vol.csv
    @echo "Sample:" && head -n 5 data/processed/prices_with_vol.csv

Run it from the terminal:

make qa

5.2 Explanation

This QA target demonstrates key Makefile concepts:

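  • .PHONY declares qa as a phony target: a name for a set of commands rather than a file to build, so Make runs it even if a file called qa happens to exist.
  • wc -l counts the lines in the processed CSV, a quick sanity check on its size. The count includes the header line, so the number of data rows is one less.
  • head -n 5 prints the first five lines (the header plus four data rows) to preview the columns and formatting.
  • A leading @ stops Make from printing each command before running it, so only the command output appears.
  • Each recipe line must start with a tab character; lines indented with spaces make Make stop with a "missing separator" error.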

Make allows you to bundle commonly repeated actions into simple targets to improve efficiency and consistency.
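The sketch below is one possible variant of the qa target, not the course's own version. It assumes a hypothetical CSV_FILE variable holding the same path, stops with an error if the file has not been generated yet, and skips the header with tail -n +2 so that only data rows are counted. As in any Makefile, the recipe lines are indented with a tab.

```makefile
# CSV_FILE is a hypothetical variable name; the path matches the qa target above.
CSV_FILE := data/processed/prices_with_vol.csv

# test -f fails the recipe (non-zero exit) when the CSV is missing.
# In the row count, $$ passes a literal $ to the shell for command substitution,
# and tail -n +2 drops the header line before wc -l counts the rest.
.PHONY: qa
qa:
	@test -f $(CSV_FILE) || { echo "Missing $(CSV_FILE)" >&2; exit 1; }
	@echo "Data rows (excluding header): $$(tail -n +2 $(CSV_FILE) | wc -l)"
	@echo "Sample:"
	@head -n 5 $(CSV_FILE)
```

Running make -n qa prints these commands, including the ones prefixed with @, without executing them, which is a quick way to check what a target will do.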


5.3 A Full Project Makefile

Below is an example Makefile that reflects a typical data-science workflow used in this course:

.PHONY: env data db features book test clean

env:
    pip install -r requirements.txt

data:
    python scripts/make_synth_data.py

db:
    python scripts/make_sqlite.py

features:
    python scripts/build_features.py

book:
    quarto render book

test:
    pytest -q

clean:
    rm -rf db/*.db data/processed/* book/_site book/_freeze
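The targets above are independent, so the steps must be run in the right order by hand. Make can record that order through prerequisites: anything listed after a target's colon is brought up to date first. The sketch below assumes the pipeline runs in the order the targets are listed (data, db, features, book); that ordering and the all target are assumptions, so adjust the prerequisites to match what each script actually reads and writes.

```makefile
# Assumed order (hypothetical): data -> db -> features -> book.
.PHONY: all data db features book

# Plain `make` builds the first target in the file, here `all`.
all: book

data:
	python scripts/make_synth_data.py

db: data
	python scripts/make_sqlite.py

features: db
	python scripts/build_features.py

book: features
	quarto render book
```

Because these targets are phony, make book reruns every upstream step each time. File targets, named after the files a script produces, let Make skip steps whose outputs are already newer than their inputs.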

5.4 What Each Target Does

5.4.1 env

Installs the Python dependencies listed in requirements.txt into the currently active Python environment.
Ensures anyone cloning your repo can set up the environment with one command.
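A common refinement, sketched below, calls pip through the interpreter (python -m pip) so that packages are installed for the same Python that the later targets run. The PYTHON variable is a hypothetical addition; ?= assigns it only if it is not already set, for example by an environment variable.

```makefile
# Hypothetical variable: the Python interpreter the targets should use.
PYTHON ?= python

.PHONY: env
env:
	$(PYTHON) -m pip install -r requirements.txt
```

A variable given on the command line overrides the Makefile's value, so make env PYTHON=python3 installs the dependencies for python3 instead.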

5.4.2 data

Generates synthetic raw data used throughout the book.
Running this target regenerates the same input files each time, provided the script is deterministic (for example, it sets a fixed random seed).

5.4.3 db

Builds the SQLite database from processed CSVs.
Allows SQL queries in your pipeline to operate on a reproducible dataset.
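Because the database is built from files, it can also be expressed as a file target rather than a phony one. In the sketch below, db/prices.db is a hypothetical output name (the real script's output file is not shown here) and the CSV path is borrowed from the qa target. Make then reruns the script only when the database is missing or older than the CSV or the script itself.

```makefile
# db/prices.db is hypothetical; it must match the file make_sqlite.py writes.
DB_FILE := db/prices.db
CSV_FILE := data/processed/prices_with_vol.csv

# Rebuild the database only when it is missing or older than a prerequisite.
$(DB_FILE): $(CSV_FILE) scripts/make_sqlite.py
	python scripts/make_sqlite.py

# Keep the short `make db` command as a phony alias for the file target.
.PHONY: db
db: $(DB_FILE)
```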

5.4.4 features

Constructs engineered features such as log r