10  PyTorch Mini Training Loop

This chapter demonstrates a minimal PyTorch workflow: creating a synthetic dataset, wrapping it in a DataLoader, defining a small GRU-based model, and running a short training loop.
This example captures the core ideas introduced in Session 19: tensors, datasets, batching, models, optimizers, and loss computation.


10.1 Synthetic Data + GRU Model

import torch
import torch.nn as nn
import numpy as np

# Reproducibility: seed both NumPy (data generation) and PyTorch
# (weight initialization, DataLoader shuffling)
rng = np.random.default_rng(0)
torch.manual_seed(0)

# Tiny synthetic dataset dimensions
T, F, N = 16, 3, 512    # time steps, features, samples

# Generate random sequences (N, T, F)
X = rng.normal(0, 0.5, size=(N, T, F)).astype("float32")

# Simple linear rule for next-step "return"
w = rng.normal(0, 0.2, size=(F,)).astype("float32")
y = (X[:, -1, :] @ w + rng.normal(0, 0.1, size=(N,))).astype("float32")

# Wrap into PyTorch Dataset + DataLoader
ds = torch.utils.data.TensorDataset(
    torch.from_numpy(X),
    torch.from_numpy(y)
)

dl = torch.utils.data.DataLoader(ds, batch_size=64, shuffle=True)


# GRU-based regressor
class SmallGRU(nn.Module):
    def __init__(self, F, H=32):
        super().__init__()
        self.gru = nn.GRU(F, H, batch_first=True)
        self.head = nn.Linear(H, 1)

    def forward(self, x):
        _, h = self.gru(x)                    # h: (num_layers, batch, H)
        return self.head(h[-1]).squeeze(-1)   # last layer's final hidden state


# Device setup
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

net = SmallGRU(F).to(device)
opt = torch.optim.AdamW(net.parameters(), lr=3e-3)
loss_fn = nn.L1Loss()   # mean absolute error

# Training loop
loss_history = []

for ep in range(1, 11):
    net.train()
    tot = 0.0   # running sum of per-sample loss

    for xb, yb in dl:
        xb, yb = xb.to(device), yb.to(device)
        opt.zero_grad(set_to_none=True)

        yhat = net(xb)
        loss = loss_fn(yhat, yb)

        loss.backward()
        nn.utils.clip_grad_norm_(net.parameters(), 1.0)
        opt.step()

        tot += loss.item() * xb.size(0)

    epoch_mae = tot / len(ds)
    loss_history.append(epoch_mae)
    print(f"epoch {ep:02d} | train_mae={epoch_mae:.4f}")

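Once the loop finishes, the trained network can be used for inference. A minimal sketch, assuming a model with the same SmallGRU architecture (the input x_new here is illustrative random data, not part of the chapter's dataset):

```python
import torch
import torch.nn as nn

# Stand-in for the trained model above (same architecture)
class SmallGRU(nn.Module):
    def __init__(self, F, H=32):
        super().__init__()
        self.gru = nn.GRU(F, H, batch_first=True)
        self.head = nn.Linear(H, 1)

    def forward(self, x):
        _, h = self.gru(x)
        return self.head(h[-1]).squeeze(-1)

net = SmallGRU(3)
net.eval()                      # switch layers to inference behavior

x_new = torch.randn(5, 16, 3)   # (batch, time steps, features)
with torch.no_grad():           # no gradient tracking needed at inference
    preds = net(x_new)

print(preds.shape)              # torch.Size([5])
```

The `net.eval()` / `torch.no_grad()` pair is the standard counterpart to `net.train()` inside the loop: it disables training-only behavior and skips building the autograd graph.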

10.2 Explanation

This example captures several key concepts from the PyTorch training workflow:

  • Synthetic dataset creation shows how PyTorch works even without real data.
  • TensorDataset & DataLoader handle batching, shuffling, and iteration.
  • GRU model processes sequence data and returns an encoded representation.
  • Optimizer (AdamW) updates weights based on gradients.
  • L1Loss (mean absolute error) measures prediction error, matching the train_mae printout.
  • Training loop performs forward pass → loss → backward pass → gradient step.
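The forward pass → loss → backward pass → gradient step cycle in the last bullet can be written out by hand, without autograd. Here is a sketch for a plain linear model (all names and numbers are illustrative, not part of the chapter's code):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(128, 3)).astype("float32")
true_w = np.array([0.5, -0.2, 0.1], dtype="float32")
y = X @ true_w                       # noiseless linear targets

w = np.zeros(3, dtype="float32")     # model parameters
lr = 0.1
for step in range(200):
    yhat = X @ w                     # forward pass
    err = yhat - y
    loss = float(np.mean(err ** 2))  # MSE loss
    grad = 2.0 * X.T @ err / len(X)  # backward pass: d(loss)/dw by hand
    w -= lr * grad                   # gradient step

print(np.round(w, 2))                # recovers true_w
```

PyTorch replaces the hand-derived `grad` line with `loss.backward()` and the update line with `opt.step()`, but the structure of the loop is the same.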

This minimal model is not meant for high performance; it simply illustrates how modern deep learning frameworks structure data pipelines and training logic.
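As a closing illustration of the data-pipeline side, the batching-and-shuffling behavior of DataLoader can be approximated in a few lines of plain Python. A simplified sketch (no worker processes, no collation; the function name is illustrative):

```python
import numpy as np

def simple_loader(X, y, batch_size, shuffle=True, seed=0):
    """Yield (X_batch, y_batch) pairs, roughly what DataLoader does."""
    idx = np.arange(len(X))
    if shuffle:
        np.random.default_rng(seed).shuffle(idx)  # one shuffle per epoch
    for start in range(0, len(idx), batch_size):
        sel = idx[start:start + batch_size]
        yield X[sel], y[sel]

X = np.arange(20).reshape(10, 2)
y = np.arange(10)
sizes = [xb.shape[0] for xb, _ in simple_loader(X, y, batch_size=4)]
print(sizes)  # [4, 4, 2]; the last batch is smaller, like drop_last=False
```

The real DataLoader adds multiprocessing, pinned memory, and custom collation on top, but the core idea is exactly this: shuffle indices once per epoch, then slice the dataset into fixed-size chunks.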