Disaggregation
How to expand typical-period results back to the original time series length.
Use case: You aggregate a year of hourly data into 8 typical days, run an optimization on those 8 days, and then need the results mapped back to all 365 days.
disaggregate() does exactly this — it takes any DataFrame with the same (cluster, timestep) structure as cluster_representatives and expands it using the stored cluster assignments.
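Conceptually, the expansion is a lookup: every original period is assigned to one cluster, and disaggregation copies that cluster's values into the period's position. The block below is a minimal pure-Python sketch of that idea under toy assumptions, not tsam's implementation; `assignments`, `typical_values`, and `expand` are hypothetical names used only for illustration.

```python
# Conceptual sketch only; this is not tsam's implementation.
# Hypothetical toy setup: 5 original periods of 3 timesteps each, 2 clusters.
assignments = [0, 1, 1, 0, 1]  # cluster assigned to each original period

# One list of values per cluster, e.g. results of an optimization
# that was run on the typical periods only.
typical_values = {
    0: [1.0, 2.0, 3.0],
    1: [10.0, 20.0, 30.0],
}


def expand(assignments, typical_values):
    """Copy each period's cluster values into its position in the full series."""
    full = []
    for cluster in assignments:
        full.extend(typical_values[cluster])
    return full


full_series = expand(assignments, typical_values)
print(len(full_series))  # 5 periods * 3 timesteps = 15
print(full_series[:6])   # period 0 (cluster 0) followed by period 1 (cluster 1)
```

The sketch shows why only the cluster assignments and the per-cluster values are needed to rebuild a full-length series.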
import pandas as pd
import plotly.express as px
import plotly.io as pio
pio.renderers.default = "notebook_connected"
import tsam
from tsam import ClusteringResult, SegmentConfig
raw = pd.read_csv("testdata.csv", index_col=0, parse_dates=True)
import warnings
# Added to every example notebook: silence the v3 column-order
# FutureWarning in the rendered docs (tsam v4 returns result columns in
# input order; see migration guide).
warnings.filterwarnings(
    "ignore", category=FutureWarning, message=".*sorted alphabetically.*"
)
Basic Disaggregation
Aggregate, then disaggregate the typical periods back to the full year. The result matches .reconstructed exactly.
result = tsam.aggregate(raw, n_clusters=8)
print(f"Original: {result.original.shape}")
print(f"Cluster representatives: {result.cluster_representatives.shape}")
expanded = result.disaggregate(result.cluster_representatives)
print(f"Disaggregated: {expanded.shape}")
print(f"Matches .reconstructed: {expanded.equals(result.reconstructed)}")
Original: (8760, 4)
Cluster representatives: (192, 4)
Disaggregated: (8760, 4)
Matches .reconstructed: True
Disaggregating Arbitrary Data
The real value: disaggregate data that tsam didn't produce. Here we simulate optimization results — a "dispatch" column computed from the typical periods — and expand it back to the full year.
# Simulate optimization: compute "dispatch" as a function of the typical periods
reps = result.cluster_representatives
dispatch = pd.DataFrame(
    {"Dispatch": reps["Load"] - 0.5 * reps["GHI"] - 0.3 * reps["Wind"]},
    index=reps.index,
)
print(f"Dispatch (typical periods): {dispatch.shape}")
dispatch.head()
Dispatch (typical periods): (192, 1)
| | timestep | Dispatch |
|---|---|---|
| 0 | 0 | 403.699541 |
| 0 | 1 | 394.070072 |
| 0 | 2 | 390.064483 |
| 0 | 3 | 391.681888 |
| 0 | 4 | 397.607784 |
# Disaggregate back to full year
full_year_dispatch = result.disaggregate(dispatch)
print(f"Full year dispatch: {full_year_dispatch.shape}")
fig = px.line(full_year_dispatch, labels={"index": "Time", "value": "Dispatch"})
fig.update_layout(title="Disaggregated Dispatch Over Full Year", showlegend=False)
fig.show()
Full year dispatch: (8760, 1)
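Because disaggregate() expects the same (cluster, timestep) structure as cluster_representatives, it can help to compare indexes before expanding externally produced data. The following sketch uses a small hypothetical stand-in for `result.cluster_representatives` and plain pandas calls; how tsam itself validates its input is not shown here and is not assumed.

```python
import pandas as pd

# Hypothetical stand-in for result.cluster_representatives:
# 2 clusters x 3 timesteps with a (cluster, timestep) MultiIndex.
index = pd.MultiIndex.from_product([range(2), range(3)], names=[None, "timestep"])
reps = pd.DataFrame({"Load": [4.0, 5.0, 6.0, 7.0, 8.0, 9.0]}, index=index)

# Data derived from reps keeps the same index, so the structure matches.
dispatch = pd.DataFrame({"Dispatch": reps["Load"] * 0.5}, index=reps.index)
print(dispatch.index.equals(reps.index))  # True

# Dropping a row (here: the very last one) breaks the structure.
truncated = dispatch.iloc[:-1]
print(truncated.index.equals(reps.index))  # False
```

A check like this is cheap and catches rows lost during an export/import round trip through an external optimizer.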
Survives IO
Save the clustering to JSON, load it later, and disaggregate without the original AggregationResult.
# Save clustering
result.clustering.to_json("clustering.json")
# Later: load and disaggregate
clustering = ClusteringResult.from_json("clustering.json")
full_year_from_disk = clustering.disaggregate(dispatch)
print(f"Shape: {full_year_from_disk.shape}")
matches = (
    full_year_dispatch.values[:8760].tolist() == full_year_from_disk.values.tolist()
)
print(f"Matches original disaggregation: {matches}")
Shape: (8760, 1)
Matches original disaggregation: True
Note: ClusteringResult.disaggregate() returns an integer-indexed DataFrame (it doesn't have access to the original datetime index). AggregationResult.disaggregate() restores the datetime index automatically.
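If a datetime index is needed after loading from disk, it can be attached manually. The sketch below uses a hypothetical integer-indexed frame in place of the output of `ClusteringResult.disaggregate()` and assumes an hourly series starting at 2010-01-01 00:00; both the start time and the resolution are illustrative assumptions and must be replaced with those of your own data.

```python
import pandas as pd

# Hypothetical stand-in for the integer-indexed output of
# ClusteringResult.disaggregate(): one row per original timestep.
full_year_from_disk = pd.DataFrame({"Dispatch": [float(i) for i in range(48)]})

# Assumption: hourly data starting at 2010-01-01 00:00.
# If the original DataFrame is still available, reuse its index instead.
datetime_index = pd.date_range(
    start="2010-01-01",
    periods=len(full_year_from_disk),
    freq=pd.Timedelta(hours=1),
)

# Work on a copy so the integer-indexed frame stays untouched.
with_time = full_year_from_disk.copy()
with_time.index = datetime_index
print(with_time.head(3))
```

Assigning the index positionally only works because the disaggregated frame has exactly one row per original timestep, in the original order.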
Segmented Data
With segmentation, cluster_representatives has a (cluster, segment, duration) index. Disaggregation expands segments to full timesteps, placing values at the start of each segment and NaN elsewhere. Use .ffill() for a step function.
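To see what the forward-fill does in isolation, here is a small sketch on a hypothetical six-timestep period with three segments of durations 2, 3, and 1. The numbers are made up for illustration and do not come from tsam.

```python
import pandas as pd

nan = float("nan")

# Hypothetical period with 3 segments of durations 2, 3 and 1 timesteps.
# Values sit at each segment start; the remaining timesteps are NaN.
sparse = pd.Series([5.0, nan, 7.0, nan, nan, 9.0])

# Forward-fill repeats each segment value until the next segment starts.
step = sparse.ffill()
print(step.tolist())  # [5.0, 5.0, 7.0, 7.0, 7.0, 9.0]
```

Forward-filling treats each value as constant over its segment. Whether that is appropriate depends on what the values represent; interpolating between segment starts would be a different modeling choice.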
result_seg = tsam.aggregate(raw, n_clusters=8, segments=SegmentConfig(n_segments=4))
print(f"Cluster representatives: {result_seg.cluster_representatives.shape}")
print(f"Index levels: {result_seg.cluster_representatives.index.names}")
result_seg.cluster_representatives.head(8)
Cluster representatives: (32, 4)
Index levels: [None, 'Segment Step', 'Segment Duration']
| | Segment Step | Segment Duration | GHI | Load | T | Wind |
|---|---|---|---|---|---|---|
| 0 | 0 | 7 | 1.398388 | 407.375463 | -1.054687 | 2.666249 |
| 0 | 1 | 4 | 212.904572 | 498.638951 | 0.907119 | 1.166484 |
| 0 | 2 | 4 | 321.668082 | 518.550554 | 3.898873 | 3.207830 |
| 0 | 3 | 9 | 19.214887 | 482.436595 | 0.754534 | 2.682913 |
| 1 | 0 | 6 | 7.432173 | 364.225427 | 10.389180 | 2.332967 |
| 1 | 1 | 9 | 205.079641 | 490.185234 | 10.171202 | 2.592186 |
| 1 | 2 | 5 | 86.575754 | 481.248942 | 9.911808 | 2.566264 |
| 1 | 3 | 4 | 0.000000 | 427.301773 | 10.323787 | 2.332967 |
expanded_seg = result_seg.disaggregate(result_seg.cluster_representatives)
print(f"Disaggregated shape: {expanded_seg.shape}")
print(f"NaN count: {expanded_seg.isna().sum().sum()} (segment gaps)")
print(f"Non-NaN count: {expanded_seg.notna().sum().sum()} (segment starts)")
# Show a single day: values at segment starts, NaN in between
expanded_seg["GHI"].iloc[:24]
Disaggregated shape: (8760, 4)
NaN count: 29200 (segment gaps)
Non-NaN count: 5840 (segment starts)
2009-12-31 23:30:00      0.000000
2010-01-01 00:30:00           NaN
2010-01-01 01:30:00           NaN
2010-01-01 02:30:00           NaN
2010-01-01 03:30:00           NaN
2010-01-01 04:30:00           NaN
2010-01-01 05:30:00     63.517890
2010-01-01 06:30:00           NaN
2010-01-01 07:30:00           NaN
2010-01-01 08:30:00           NaN
2010-01-01 09:30:00           NaN
2010-01-01 10:30:00    279.522222
2010-01-01 11:30:00           NaN
2010-01-01 12:30:00           NaN
2010-01-01 13:30:00     22.405283
2010-01-01 14:30:00           NaN
2010-01-01 15:30:00           NaN
2010-01-01 16:30:00           NaN
2010-01-01 17:30:00           NaN
2010-01-01 18:30:00           NaN
2010-01-01 19:30:00           NaN
2010-01-01 20:30:00           NaN
2010-01-01 21:30:00           NaN
2010-01-01 22:30:00           NaN
Name: GHI, dtype: float64
# Forward-fill for a step function
filled = expanded_seg.ffill()
fig = px.line(
    pd.DataFrame(
        {"Original": result_seg.original["GHI"], "Disaggregated (ffill)": filled["GHI"]}
    ),
    labels={"index": "Time", "value": "GHI"},
).update_layout(
    title="Segmented Disaggregation: Original vs Reconstructed (first 2 weeks)"
)
fig.update_xaxes(range=[raw.index[0], raw.index[24 * 14]])
fig.show()
Summary
# Aggregate
result = tsam.aggregate(df, n_clusters=8)
# Run optimization on typical periods
optimized = my_optimizer(result.cluster_representatives)
# Expand back to full year (with datetime index)
full_year = result.disaggregate(optimized)
# Or via saved clustering (integer index)
clustering = ClusteringResult.from_json("clustering.json")
full_year = clustering.disaggregate(optimized)
# For segmented data: .ffill() gives a step function
full_year = result.disaggregate(segmented_data).ffill()