Disaggregation¶

How to expand typical-period results back to the original time series length.

Use case: You aggregate a year of hourly data into 8 typical days, run an optimization on those 8 days, and then need the results mapped back to all 365 days.

disaggregate() does exactly this: it takes any DataFrame with the same (cluster, timestep) index structure as cluster_representatives and expands it back to the original length using the stored cluster assignments.
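To make the mechanism concrete, here is a minimal pure-pandas sketch of the idea. It assumes the clustering boils down to one cluster label per original period, in chronological order. It is an illustrative re-implementation, not tsam's code; `expand_typical_periods` and the toy data are hypothetical.

```python
import pandas as pd


def expand_typical_periods(reps: pd.DataFrame, assignments: list) -> pd.DataFrame:
    """Hypothetical helper (not tsam's implementation).

    reps: frame indexed by (cluster, timestep), like cluster_representatives.
    assignments: cluster label of each original period, in chronological order.
    """
    # Copy the block of the assigned typical period once per original period
    blocks = [reps.xs(label, level=0) for label in assignments]
    # Stitch the blocks into one long, integer-indexed frame
    return pd.concat(blocks, ignore_index=True)


# Toy data: 2 typical "days" of 3 timesteps each
index = pd.MultiIndex.from_product([[0, 1], range(3)], names=[None, "timestep"])
reps = pd.DataFrame({"Load": [1.0, 2.0, 3.0, 10.0, 20.0, 30.0]}, index=index)

# 4 original days: day 0 -> cluster 0, days 1-2 -> cluster 1, day 3 -> cluster 0
assignments = [0, 1, 1, 0]

full = expand_typical_periods(reps, assignments)
print(full.shape)             # (12, 1)
print(full["Load"].tolist())  # [1.0, 2.0, 3.0, 10.0, 20.0, 30.0, 10.0, 20.0, 30.0, 1.0, 2.0, 3.0]
```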

In [1]:

import warnings

import pandas as pd
import plotly.express as px
import plotly.io as pio

import tsam
from tsam import ClusteringResult, SegmentConfig

# Added to every example notebook: silence the v3 column-order
# FutureWarning in the rendered docs (tsam v4 returns result columns in
# input order; see migration guide).
warnings.filterwarnings(
    "ignore", category=FutureWarning, message=".*sorted alphabetically.*"
)

pio.renderers.default = "notebook_connected"

raw = pd.read_csv("testdata.csv", index_col=0, parse_dates=True)

Basic Disaggregation¶

Aggregate, then disaggregate the typical periods back to the full year. The result matches .reconstructed exactly.

In [2]:

result = tsam.aggregate(raw, n_clusters=8)

print(f"Original:               {result.original.shape}")
print(f"Cluster representatives: {result.cluster_representatives.shape}")

expanded = result.disaggregate(result.cluster_representatives)
print(f"Disaggregated:          {expanded.shape}")
print(f"Matches .reconstructed: {expanded.equals(result.reconstructed)}")

Original:               (8760, 4)
Cluster representatives: (192, 4)
Disaggregated:          (8760, 4)
Matches .reconstructed: True

Disaggregating Arbitrary Data¶

The real value lies in disaggregating data that tsam did not produce. Here we simulate optimization results (a "Dispatch" column computed from the typical periods) and expand them back to the full year.
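Because disaggregate() relies on the (cluster, timestep) structure of cluster_representatives, it can be worth checking that structure before expanding optimizer output. The sketch below uses a hypothetical helper, `check_disaggregatable`, that demands an exact index match; this is an assumption and may be stricter than what tsam itself enforces. The toy frame stands in for cluster_representatives.

```python
import pandas as pd


def check_disaggregatable(data: pd.DataFrame, reps: pd.DataFrame) -> None:
    """Hypothetical helper, not part of tsam: raise unless `data` is indexed
    exactly like `reps` (the cluster_representatives frame)."""
    if not data.index.equals(reps.index):
        raise ValueError("data must share the (cluster, timestep) index of the representatives")


# Toy stand-in for cluster_representatives: 2 clusters x 3 timesteps
index = pd.MultiIndex.from_product([range(2), range(3)], names=[None, "timestep"])
reps = pd.DataFrame({"Load": [5.0] * 6, "GHI": [2.0] * 6, "Wind": [1.0] * 6}, index=index)

# A column-wise function of the representatives keeps their index
dispatch = pd.DataFrame(
    {"Dispatch": reps["Load"] - 0.5 * reps["GHI"] - 0.3 * reps["Wind"]},
    index=reps.index,
)
check_disaggregatable(dispatch, reps)  # passes silently

# Output that lost its index (e.g. a plain array from a solver) is rejected ...
flat = dispatch.reset_index(drop=True)
try:
    check_disaggregatable(flat, reps)
except ValueError as err:
    print(f"rejected: {err}")

# ... until the representatives' index is re-attached (row order must match)
fixed = flat.set_axis(reps.index)
check_disaggregatable(fixed, reps)
```

Re-attaching the index with set_axis is one way to recover when a solver hands back bare arrays, provided its rows are still in the same order as cluster_representatives.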

In [3]:

# Simulate optimization: compute "dispatch" as a function of the typical periods
reps = result.cluster_representatives
dispatch = pd.DataFrame(
    {"Dispatch": reps["Load"] - 0.5 * reps["GHI"] - 0.3 * reps["Wind"]},
    index=reps.index,
)

print(f"Dispatch (typical periods): {dispatch.shape}")
dispatch.head()

Dispatch (typical periods): (192, 1)

Out[3]:

              Dispatch
  timestep
0 0         403.699541
  1         394.070072
  2         390.064483
  3         391.681888
  4         397.607784

In [4]:

# Disaggregate back to full year
full_year_dispatch = result.disaggregate(dispatch)

print(f"Full year dispatch: {full_year_dispatch.shape}")

fig = px.line(full_year_dispatch, labels={"index": "Time", "value": "Dispatch"})
fig.update_layout(title="Disaggregated Dispatch Over Full Year", showlegend=False)
fig.show()

Full year dispatch: (8760, 1)

Survives IO¶

Save the clustering to JSON, load it later, and disaggregate without the original AggregationResult.
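Conceptually, period-wise expansion needs very little state: which typical period each original period maps to, and how many timesteps a period has. The sketch below persists just that with the standard json module and expands a frame after reloading it. The payload layout and file name are hypothetical and are not tsam's JSON schema. It also assumes cluster labels 0..k-1 and representatives sorted cluster by cluster with exactly T rows each.

```python
import json

import numpy as np
import pandas as pd

# Hypothetical minimal payload, NOT tsam's actual JSON schema
payload = {"timesteps_per_period": 3, "assignments": [0, 1, 1, 0]}
with open("toy_clustering.json", "w") as f:
    json.dump(payload, f)

# Later: reload and expand without the original time series
with open("toy_clustering.json") as f:
    loaded = json.load(f)

T = loaded["timesteps_per_period"]
labels = np.asarray(loaded["assignments"])
# Row position in the cluster-major representative frame for every original timestep
positions = (labels[:, None] * T + np.arange(T)).ravel()

reps = pd.DataFrame({"Dispatch": [1.0, 2.0, 3.0, 10.0, 20.0, 30.0]})  # rows: cluster 0, then cluster 1
full = reps.iloc[positions].reset_index(drop=True)  # integer index, no timestamps
print(full["Dispatch"].tolist())
```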

In [5]:

# Save clustering
result.clustering.to_json("clustering.json")

# Later: load and disaggregate
clustering = ClusteringResult.from_json("clustering.json")
full_year_from_disk = clustering.disaggregate(dispatch)

print(f"Shape: {full_year_from_disk.shape}")
# Compare values only: the indexes differ (datetime vs integer)
same_values = full_year_dispatch.values.tolist() == full_year_from_disk.values.tolist()
print(f"Matches original disaggregation: {same_values}")

Shape: (8760, 1)
Matches original disaggregation: True

Note: ClusteringResult.disaggregate() returns an integer-indexed DataFrame (it doesn't have access to the original datetime index). AggregationResult.disaggregate() restores the datetime index automatically.
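If you need timestamps after loading from disk, one option is to attach the original index yourself, assuming it is still available (for example raw.index in this notebook) and has the same length as the result. A minimal sketch using pandas set_axis; the toy frame and dates are stand-ins.

```python
import pandas as pd

# Stand-ins: an integer-indexed result, as ClusteringResult.disaggregate() returns,
# and the original datetime index (raw.index in this notebook)
from_disk = pd.DataFrame({"Dispatch": [1.0, 2.0, 3.0, 4.0]})
original_index = pd.date_range("2010-01-01", periods=4, freq="D")

# Guard against silently misaligning rows
if len(from_disk) != len(original_index):
    raise ValueError("result and original index differ in length")

restored = from_disk.set_axis(original_index)  # returns a new frame
print(restored.index[0])  # 2010-01-01 00:00:00
```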

Segmented Data¶

With segmentation, cluster_representatives has a three-level index: cluster, Segment Step, and Segment Duration. Disaggregation expands each segment back to its full number of timesteps, placing the segment value at the segment's first timestep and NaN at the rest. Apply .ffill() to turn the result into a step function.
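For a single typical period, this expansion can be sketched in plain pandas: compute each segment's start from the cumulative durations, write the segment values there, leave NaN elsewhere, and forward-fill. The toy data is made up, and this illustrates the described behavior rather than tsam's implementation.

```python
import numpy as np
import pandas as pd

# Toy representative of one typical period: 3 segments covering 6 timesteps
index = pd.MultiIndex.from_tuples(
    [(0, 2), (1, 3), (2, 1)], names=["Segment Step", "Segment Duration"]
)
seg = pd.Series([10.0, 20.0, 30.0], index=index)

durations = seg.index.get_level_values("Segment Duration").to_numpy()
# Segment start = sum of the durations of all earlier segments
starts = np.concatenate(([0], np.cumsum(durations)[:-1]))

# Value at each segment start, NaN in between
expanded = pd.Series(np.nan, index=range(int(durations.sum())))
expanded.iloc[starts] = seg.to_numpy()
print(expanded.tolist())  # [10.0, nan, 20.0, nan, nan, 30.0]

# Forward-fill turns it into a step function
stepped = expanded.ffill()
print(stepped.tolist())  # [10.0, 10.0, 20.0, 20.0, 20.0, 30.0]
```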

In [6]:

result_seg = tsam.aggregate(raw, n_clusters=8, segments=SegmentConfig(n_segments=4))

print(f"Cluster representatives: {result_seg.cluster_representatives.shape}")
print(f"Index levels: {result_seg.cluster_representatives.index.names}")
result_seg.cluster_representatives.head(8)

Cluster representatives: (32, 4)
Index levels: [None, 'Segment Step', 'Segment Duration']

Out[6]:

                                       GHI        Load          T      Wind
  Segment Step Segment Duration
0 0            7                  1.398388  407.375463  -1.054687  2.666249
  1            4                212.904572  498.638951   0.907119  1.166484
  2            4                321.668082  518.550554   3.898873  3.207830
  3            9                 19.214887  482.436595   0.754534  2.682913
1 0            6                  7.432173  364.225427  10.389180  2.332967
  1            9                205.079641  490.185234  10.171202  2.592186
  2            5                 86.575754  481.248942   9.911808  2.566264
  3            4                  0.000000  427.301773  10.323787  2.332967

In [7]:

expanded_seg = result_seg.disaggregate(result_seg.cluster_representatives)

print(f"Disaggregated shape: {expanded_seg.shape}")
print(f"NaN count: {expanded_seg.isna().sum().sum()} (segment gaps)")
print(f"Non-NaN count: {expanded_seg.notna().sum().sum()} (segment starts)")

# Show a single day: values at segment starts, NaN in between
expanded_seg["GHI"].iloc[:24]

Disaggregated shape: (8760, 4)
NaN count: 29200 (segment gaps)
Non-NaN count: 5840 (segment starts)

Out[7]:

2009-12-31 23:30:00      0.000000
2010-01-01 00:30:00           NaN
2010-01-01 01:30:00           NaN
2010-01-01 02:30:00           NaN
2010-01-01 03:30:00           NaN
2010-01-01 04:30:00           NaN
2010-01-01 05:30:00     63.517890
2010-01-01 06:30:00           NaN
2010-01-01 07:30:00           NaN
2010-01-01 08:30:00           NaN
2010-01-01 09:30:00           NaN
2010-01-01 10:30:00    279.522222
2010-01-01 11:30:00           NaN
2010-01-01 12:30:00           NaN
2010-01-01 13:30:00     22.405283
2010-01-01 14:30:00           NaN
2010-01-01 15:30:00           NaN
2010-01-01 16:30:00           NaN
2010-01-01 17:30:00           NaN
2010-01-01 18:30:00           NaN
2010-01-01 19:30:00           NaN
2010-01-01 20:30:00           NaN
2010-01-01 21:30:00           NaN
2010-01-01 22:30:00           NaN
Name: GHI, dtype: float64

In [8]:

# Forward-fill for a step function
filled = expanded_seg.ffill()

fig = px.line(
    pd.DataFrame(
        {"Original": result_seg.original["GHI"], "Disaggregated (ffill)": filled["GHI"]}
    ),
    labels={"index": "Time", "value": "GHI"},
).update_layout(
    title="Segmented Disaggregation: Original vs Reconstructed (first 2 weeks)"
)
fig.update_xaxes(range=[raw.index[0], raw.index[24 * 14]])
fig.show()

Summary¶

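import pandas as pd

import tsam
from tsam import ClusteringResult, SegmentConfig

df = pd.read_csv("testdata.csv", index_col=0, parse_dates=True)


def my_optimizer(reps: pd.DataFrame) -> pd.DataFrame:
    # Placeholder for your own model: it must return a DataFrame
    # with the same (cluster, timestep) index as its input
    return reps


# Aggregate
result = tsam.aggregate(df, n_clusters=8)

# Run optimization on typical periods
optimized = my_optimizer(result.cluster_representatives)

# Expand back to full year (with datetime index)
full_year = result.disaggregate(optimized)

# Or via saved clustering (integer index)
result.clustering.to_json("clustering.json")
clustering = ClusteringResult.from_json("clustering.json")
full_year = clustering.disaggregate(optimized)

# For segmented data: disaggregate with the segmented result, then .ffill() for a step function
result_seg = tsam.aggregate(df, n_clusters=8, segments=SegmentConfig(n_segments=4))
full_year = result_seg.disaggregate(result_seg.cluster_representatives).ffill()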