7 Pandas Joins
This chapter demonstrates how to merge two DataFrames in pandas without requiring PyArrow or Parquet.
This avoids dependency issues and keeps the example lightweight and reproducible.
7.1 7.1 Creating Two Tables in Pandas
We construct two small DataFrames:
prices: daily stock prices
meta: simple metadata for each ticker
import pandas as pd
# Price table
prices = pd.DataFrame({
"ticker": ["AAPL", "AAPL", "MSFT", "MSFT"],
"date": ["2023-01-01", "2023-01-02", "2023-01-01", "2023-01-02"],
"adj_close": [125.0, 126.5, 240.0, 242.3]
})
# Metadata table
meta = pd.DataFrame({
"ticker": ["AAPL", "MSFT"],
"sector": ["InfoTech", "InfoTech"]
})
prices, meta7.2 7.2 Merging the Tables
We perform a left join so every row in prices receives the matching metadata.
df = prices.merge(meta, on="ticker", how="left")
df7.3 Explanation
merge()works like SQL joins but inside pandas
on="ticker"tells pandas which column to match
how="left"keeps all price rows, adding sector info where available
This pattern mirrors the SQL join you built in the previous chapter but stays purely within pandas.