7  Pandas Joins

This chapter demonstrates how to merge two DataFrames in pandas without requiring PyArrow or Parquet.
This avoids dependency issues and keeps the example lightweight and reproducible.

7.1 7.1 Creating Two Tables in Pandas

We construct two small DataFrames:

  • prices: daily stock prices
  • meta: simple metadata for each ticker
import pandas as pd

# Price table
prices = pd.DataFrame({
    "ticker": ["AAPL", "AAPL", "MSFT", "MSFT"],
    "date": ["2023-01-01", "2023-01-02", "2023-01-01", "2023-01-02"],
    "adj_close": [125.0, 126.5, 240.0, 242.3]
})

# Metadata table
meta = pd.DataFrame({
    "ticker": ["AAPL", "MSFT"],
    "sector": ["InfoTech", "InfoTech"]
})

prices, meta

7.2 7.2 Merging the Tables

We perform a left join so every row in prices receives the matching metadata.

df = prices.merge(meta, on="ticker", how="left")
df

7.3 Explanation

  • merge() works like SQL joins but inside pandas
  • on="ticker" tells pandas which column to match
  • how="left" keeps all price rows, adding sector info where available

This pattern mirrors the SQL join you built in the previous chapter but stays purely within pandas.