Visualization & Quality Analysis¶

This notebook demonstrates how to analyze the quality of time series aggregation using tsam's built-in plotting tools.

Quick Reference¶

For heatmaps — use tsam.unstack_to_periods() with plotly:

unstacked = tsam.unstack_to_periods(df, period_duration=24)
px.imshow(unstacked["Load"].values.T, labels={"x": "Day", "y": "Hour", "color": "Load"})

Accessor methods (result.plot.*) — for validation after aggregation:

compare(columns, mode) — Original vs reconstructed ("overlay", "side_by_side", "duration_curve")
residuals(columns, mode) — Error analysis ("time_series", "histogram", "by_period", "by_timestep")
cluster_weights() — Bar chart of cluster sizes
cluster_representatives(columns) — Line plots of typical periods
cluster_members(columns, clusters, slider) — All original periods per cluster with representative highlighted
accuracy() — Bar chart of RMSE / MAE / RMSE (Duration) metrics
segment_durations() — Bar chart of segment lengths (requires segmentation)

Data properties (result.*) — for direct access:

result.original — Original DataFrame
result.reconstructed — Reconstructed DataFrame (cached)
result.residuals — Difference: original − reconstructed
result.cluster_assignments — Array of cluster indices per period

Table of Contents¶

Load and aggregate data
Visual comparison (heatmaps, duration curves, line plots)
Cluster analysis (weights, representatives)
Error analysis (accuracy metrics, residuals)
Comparing aggregation configurations
Extreme period preservation
Segmentation analysis

In [1]:

import warnings

import pandas as pd
import plotly.express as px
import plotly.io as pio

import tsam
from tsam import ClusterConfig, ExtremeConfig, SegmentConfig

pio.renderers.default = "notebook_connected"

# Added to every example notebook: silence the v3 column-order
# FutureWarning in the rendered docs (tsam v4 returns result columns in
# input order; see migration guide).
warnings.filterwarnings(
    "ignore", category=FutureWarning, message=".*sorted alphabetically.*"
)

1. Load Data and Run Aggregation¶

In [2]:

# Load test data (8760 hours = 1 year of hourly data)
# parse_dates=True gives a DatetimeIndex, so date-string slicing later on works
raw = pd.read_csv("testdata.csv", index_col=0, parse_dates=True)
print(f"Data shape: {raw.shape}")
print(f"Columns: {list(raw.columns)}")
raw.head()

Data shape: (8760, 4)
Columns: ['GHI', 'T', 'Wind', 'Load']

Out[2]:

	T	Wind	Load
2009-12-31 23:30:00	-2.1	7.1	375.478394
2010-01-01 00:30:00	-2.8	8.6	364.541326
2010-01-01 01:30:00	-3.3	9.7	357.416844
2010-01-01 02:30:00	-3.2	9.8	350.191306
2010-01-01 03:30:00	-3.2	9.4	345.161449

In [3]:

# Run aggregation with 12 typical days
result = tsam.aggregate(
    raw,
    n_clusters=12,
    period_duration=24,
    cluster=ClusterConfig(method="hierarchical"),
)

print(f"Number of clusters: {result.n_clusters}")
print(f"Timesteps per period: {result.n_timesteps_per_period}")
print(f"Total original periods: {len(raw) // result.n_timesteps_per_period}")

Number of clusters: 12
Timesteps per period: 24
Total original periods: 365

2. Visual Comparison: Original vs Reconstructed¶

Both the original and the reconstructed data can be visualised as heatmaps or line plots, either of the full time series or of the duration curves.

2.1 Heatmaps¶

These plots show the full year, with periods (days) on the x-axis and timesteps (hours) on the y-axis. To create them, the data must first be reshaped with the tsam.unstack_to_periods() function.
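The reshaping step can be pictured with a small plain-Python sketch. It is a conceptual illustration only, not tsam's implementation: the helper names `to_period_matrix` and `transpose` are hypothetical, and the toy series uses three "days" of four timesteps instead of 365 days of 24 hours.

```python
# Conceptual sketch only; tsam.unstack_to_periods() works on DataFrames.
def to_period_matrix(values, timesteps_per_period):
    """Split a flat series into rows of equal-length periods."""
    if len(values) % timesteps_per_period != 0:
        raise ValueError("series length must be a multiple of the period length")
    return [
        values[start : start + timesteps_per_period]
        for start in range(0, len(values), timesteps_per_period)
    ]


def transpose(matrix):
    """Swap rows and columns so timesteps run along the y-axis."""
    return [list(row) for row in zip(*matrix)]


# Three toy "days" with four timesteps each
series = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
periods = to_period_matrix(series, timesteps_per_period=4)
heatmap_rows = transpose(periods)

print(periods)       # one row per day
print(heatmap_rows)  # one row per timestep, one column per day
```

The transpose plays the same role as `.values.T` in the cells of this section: it puts days on the x-axis and timesteps on the y-axis.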

2.1.1 Plot the Time Series From the Original Data Frame¶

In [4]:

# Reshape raw data for heatmap visualization
unstacked = tsam.unstack_to_periods(raw, period_duration=24)

# Create heatmap with plotly express
px.imshow(
    unstacked["T"].values.T,
    labels={"x": "Day", "y": "Hour", "color": "Temperature"},
    title="Original Temperature",
    aspect="auto",
)

2.1.2 Plot the Original Time Series From the Aggregation Result Object¶

The original, unaltered time series can also be accessed from the results.

In [5]:

# Original data heatmap using result.original
unstacked_orig = tsam.unstack_to_periods(result.original, period_duration=24)
px.imshow(
    unstacked_orig["T"].values.T,
    labels={"x": "Day", "y": "Hour", "color": "Temperature"},
    title="Original Temperature (from result)",
    aspect="auto",
)

2.1.3 Plot the Reconstructed Time Series From the Aggregation Result Object¶

The results contain the reconstructed time series from the aggregated typical periods. As can be seen, the reconstructed time series deviates slightly from the original time series.
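As a rough mental model, reconstruction replaces every original period with the representative profile of the cluster it was assigned to. The sketch below illustrates that idea in plain Python. The toy representatives, the assignments, and the helper `reconstruct` are hypothetical, and tsam's actual reconstruction may include further steps such as rescaling.

```python
# Hypothetical toy data: two typical days with three timesteps each
representatives = {
    0: [1.0, 2.0, 1.5],
    1: [4.0, 6.0, 5.0],
}
# Cluster index of each original day (four days in total)
cluster_assignments = [0, 1, 1, 0]


def reconstruct(representatives, assignments):
    """Concatenate the representative profile of each period's cluster."""
    series = []
    for cluster in assignments:
        series.extend(representatives[cluster])
    return series


reconstructed = reconstruct(representatives, cluster_assignments)
print(reconstructed)
```

Days that share a cluster become identical in the reconstructed series, which is why the reconstructed heatmap shows repeated vertical patterns.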

In [6]:

# Reconstructed data heatmap using result.reconstructed
unstacked_recon = tsam.unstack_to_periods(result.reconstructed, period_duration=24)
px.imshow(
    unstacked_recon["T"].values.T,
    labels={"x": "Day", "y": "Hour", "color": "Temperature"},
    title="Reconstructed Temperature",
    aspect="auto",
)

2.1.4 Create a Multi-Column Plot¶

Several columns can also be plotted one after another to compare them closely. This is demonstrated here with a separate heatmap for each of several reconstructed time series.

In [7]:

# Multi-column heatmaps of reconstructed data
for col in ["GHI", "T", "Load"]:
    px.imshow(
        unstacked_recon[col].values.T,
        labels={"x": "Day", "y": "Hour", "color": col},
        title=f"Reconstructed {col}",
        aspect="auto",
    ).show()

2.1.5 Compare Original and Reconstructed Time Series in a Multi-Column Plot¶

In [8]:

# Compare original vs reconstructed for specific columns
for col in ["T", "Load"]:
    fig_orig = px.imshow(
        unstacked_orig[col].values.T,
        labels={"x": "Day", "y": "Hour", "color": col},
        title=f"Original {col}",
        aspect="auto",
    )
    fig_orig.show()
    fig_recon = px.imshow(
        unstacked_recon[col].values.T,
        labels={"x": "Day", "y": "Hour", "color": col},
        title=f"Reconstructed {col}",
        aspect="auto",
    )
    fig_recon.show()

2.2 Duration Curves¶

Duration curves are created by sorting a time series in descending order. These curves are an important tool for determining how well the aggregation preserves the value distribution.
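A minimal sketch of the idea, using plain Python and made-up toy values (the helper `duration_curve` is a hypothetical name, not part of tsam):

```python
def duration_curve(values):
    """Sort values in descending order; the position becomes the x-axis."""
    return sorted(values, reverse=True)


# Toy series standing in for an original and a reconstructed column
original = [3.0, 9.0, 1.0, 7.0, 5.0, 2.0]
reconstructed = [4.0, 8.0, 2.0, 8.0, 4.0, 2.0]

curve_orig = duration_curve(original)
curve_recon = duration_curve(reconstructed)

for rank, (o, r) in enumerate(zip(curve_orig, curve_recon)):
    print(f"rank {rank}: original={o}, reconstructed={r}")
```

Because sorting discards the time order, two series with very different shapes over time can still have nearly identical duration curves.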

The most convenient way to compare the duration curves of the original and reconstructed time series is the result.plot.compare() method. The subsections after that build the original and reconstructed duration curves manually with plotly express.

2.2.1 Compare Original and Reconstructed Duration Curve¶

In [9]:

# Accessor: Compare original vs reconstructed duration curves
result.plot.compare(mode="duration_curve")

2.2.2 Plot Duration Curves of the Original Data¶

In [10]:

# Duration curve with plotly express (raw data)
frames = []
for col in ["Load", "GHI"]:
    sorted_vals = raw[col].sort_values(ascending=False).reset_index(drop=True)
    frames.append(
        pd.DataFrame(
            {"Hour": range(len(sorted_vals)), "Value": sorted_vals, "Column": col}
        )
    )
long_df = pd.concat(frames, ignore_index=True)

px.line(long_df, x="Hour", y="Value", color="Column", title="Original Duration Curves")

2.2.3 Plot Reconstructed Duration Curves¶

In [11]:

# Duration curves for reconstructed data with plotly express
frames = []
for col in result.reconstructed.columns:
    sorted_vals = (
        result.reconstructed[col].sort_values(ascending=False).reset_index(drop=True)
    )
    frames.append(
        pd.DataFrame(
            {"Hour": range(len(sorted_vals)), "Value": sorted_vals, "Column": col}
        )
    )
long_df = pd.concat(frames, ignore_index=True)

px.line(
    long_df, x="Hour", y="Value", color="Column", title="Reconstructed Duration Curves"
)

2.3 Line Plots¶

Compare the original and reconstructed data as line plots. Use plotly's interactive zoom and pan to explore specific time periods.

2.3.1 Compare by Overlay¶

In [12]:

# Accessor: Compare overlay mode (same color per column, dash differentiates Original/Reconstructed)
# Use plotly's interactive zoom to explore specific time ranges
result.plot.compare(
    columns=["T", "Load"],
    mode="overlay",
    title="Temperature and Load Comparison (use zoom to explore)",
)

2.3.2 Compare Side by Side¶

In [13]:

# Accessor: Compare side-by-side mode
result.plot.compare(
    columns=["GHI"],
    mode="side_by_side",
    title="Solar Irradiance Comparison (side_by_side)",
)

3. Cluster Analysis¶

Understanding the cluster structure is key to assessing whether the aggregation captures meaningful patterns. Looking at the cluster weights and representatives can be helpful. Cluster weights show how many original periods are represented by a cluster. A cluster representative is the time series that represents all the periods in a cluster.

3.1 Cluster Weights¶

The plot shows how many original days are represented by each typical day. For example, cluster 0 represents 29 original days, and another cluster represents 63.

In [14]:

result.plot.cluster_weights()

In [15]:

# Cluster assignments - which cluster each original day belongs to
print("Cluster assignments (first 30 days):")
print(result.cluster_assignments[:30])
print(f"\nTotal periods: {len(result.cluster_assignments)}")

Cluster assignments (first 30 days):
[ 5 10  3  7  7  5  2  3  3  2  2  5  5  2  2  2  2  2  8  8  2  2  2  2
  2  5  8  2  2  2]

Total periods: 365
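Cluster weights and cluster assignments are two views of the same information: counting how often each cluster index occurs in the assignments gives the weight of that cluster. The sketch below does this with the standard library for the first 30 assignments printed above. It covers only those 30 days, so the counts are not the full-year weights shown in the bar chart.

```python
from collections import Counter

# First 30 cluster assignments, copied from the output above
assignments_first_30 = [
    5, 10, 3, 7, 7, 5, 2, 3, 3, 2, 2, 5, 5, 2, 2, 2, 2, 2, 8, 8,
    2, 2, 2, 2, 2, 5, 8, 2, 2, 2,
]

# Count how many of these days fall into each cluster
weights = Counter(assignments_first_30)

for cluster, n_days in sorted(weights.items()):
    print(f"cluster {cluster}: {n_days} days")
```

Within these 30 days, cluster 2 dominates, which suggests that it captures a typical winter day in this data set.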

3.2 Cluster Representatives¶

Each cluster representative is the single period (day) chosen to represent all periods assigned to that cluster. The plot below shows these representative profiles — one line per cluster, with the legend indicating how many original periods each cluster contains (e.g. "Period 2 (n=45)" means cluster 2 represents 45 original days).

Each column of the input data is plotted separately so you can see the representative profile for each variable independently.

In [16]:

# Representative profiles for temperature
result.plot.cluster_representatives(columns=["T"])

In [17]:

# Representative profiles for solar irradiance
result.plot.cluster_representatives(columns=["GHI"])

4. Error Analysis¶

To investigate the quality of the aggregation, it is useful to look at different accuracy measures, the residuals, and the mean absolute error per timestep.

4.1 Accuracy Metrics¶

Three accuracy metrics are calculated automatically for each column (on normalized 0–1 data):

RMSE — Root Mean Square Error comparing the original and reconstructed time series point-by-point over time. Measures how well the aggregation reproduces the temporal pattern.
MAE — Mean Absolute Error, same point-by-point comparison but using absolute differences instead of squared.
RMSE (Duration) — RMSE computed on the duration curves (values sorted in descending order). This ignores temporal ordering and measures how well the aggregation preserves the overall value distribution. Typically lower than RMSE because clustering preserves distributions better than exact temporal sequences.
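The three metrics can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not tsam's implementation: it assumes min-max normalization using the minimum and maximum of the original series, and the helper names and toy values are hypothetical.

```python
import math


def min_max_normalize(values, lo, hi):
    """Scale values so that lo maps to 0 and hi maps to 1 (assumed scheme)."""
    return [(v - lo) / (hi - lo) for v in values]


def rmse(a, b):
    """Root mean square error between two equally long sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def mae(a, b):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# Toy series standing in for one column of original and reconstructed data
original = [10.0, 30.0, 20.0, 50.0]
reconstructed = [20.0, 20.0, 30.0, 40.0]

lo, hi = min(original), max(original)
orig_n = min_max_normalize(original, lo, hi)
recon_n = min_max_normalize(reconstructed, lo, hi)

metrics = {
    "rmse": rmse(orig_n, recon_n),
    "mae": mae(orig_n, recon_n),
    # Sort both series in descending order first, then compare rank by rank
    "rmse_duration": rmse(
        sorted(orig_n, reverse=True), sorted(recon_n, reverse=True)
    ),
}
print(metrics)
```

In this toy example the duration-curve RMSE is smaller than the point-by-point RMSE, matching the pattern in the per-column results of this section.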

In [18]:

# Overall accuracy metrics
print("Accuracy Summary:")
print(result.accuracy)
print("\nRMSE per column:")
print(result.accuracy.rmse)
print("\nMAE per column:")
print(result.accuracy.mae)
print("\nRMSE (Duration) per column:")
print(result.accuracy.rmse_duration)

Accuracy Summary:
AccuracyMetrics(
  rmse=0.1013 (weighted),
  mae=0.0692 (weighted),
  rmse_duration=0.0304 (weighted)
)

RMSE per column:
GHI     0.086993
Load    0.085108
T       0.085319
Wind    0.137713
Name: RMSE, dtype: float64

MAE per column:
GHI     0.045101
Load    0.059208
T       0.066456
Wind    0.106038
Name: MAE, dtype: float64

RMSE (Duration) per column:
GHI     0.017392
Load    0.017183
T       0.028772
Wind    0.047606
Name: RMSE_duration, dtype: float64

In [19]:

# Visual comparison of accuracy metrics
result.plot.accuracy()

4.2 Residual Analysis¶

Residuals (the difference between the original time series and the reconstructed one) reveal where the aggregation performs well or poorly.
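The sketch below shows, on made-up toy data, how residuals can be summarized per period and per timestep. It is a conceptual illustration in plain Python. How result.plot.residuals() computes its aggregates internally is not shown here and may differ in detail, for example in scaling.

```python
# Toy data: 2 periods (rows) with 3 timesteps each (columns)
original = [
    [1.0, 2.0, 3.0],
    [2.0, 4.0, 6.0],
]
reconstructed = [
    [1.5, 2.0, 2.0],
    [1.5, 4.0, 7.0],
]

# Residual = original - reconstructed, element by element
residuals = [
    [o - r for o, r in zip(period_o, period_r)]
    for period_o, period_r in zip(original, reconstructed)
]

# Mean absolute error of each period (average over its timesteps)
mae_by_period = [sum(abs(e) for e in row) / len(row) for row in residuals]

# Mean absolute error of each timestep (average over all periods)
mae_by_timestep = [sum(abs(e) for e in col) / len(col) for col in zip(*residuals)]

print(residuals)
print(mae_by_period)
print(mae_by_timestep)
```

The per-period view points to days that are poorly represented, while the per-timestep view points to times of day at which the typical periods systematically miss the original profile.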

4.2.1 Residual Line Plot¶

This plot illustrates the residual time series in the form of a line graph.

In [20]:

# Residuals over time (mode="time_series")
result.plot.residuals(columns=["Load"], mode="time_series")

4.2.2 Residual Histogram¶

The histogram shows how often a specific residual value occurs.

In [21]:

# Residual distribution (mode="histogram")
result.plot.residuals(columns=["T", "Load"], mode="histogram")

4.2.3 Residual Bar Plot by Period¶

This plot shows the mean absolute error (MAE) between the original and reconstructed data for each period.

In [22]:

# Error by period (mode="by_period")
result.plot.residuals(columns=["Load"], mode="by_period")

4.3 Mean Absolute Error by Timestep¶

This plot shows the mean absolute error between the original and reconstructed time series for each timestep within a period, averaged over all periods.

In [23]:

# Error by timestep within period (mode="by_timestep")
result.plot.residuals(columns=["Load", "GHI"], mode="by_timestep")

5. Comparing Aggregation Configurations¶

This section shows how to visualise the differences between multiple aggregations of the same time series with different numbers of clusters.

5.1 Aggregate Time Series With Multiple Cluster Configurations¶

Compare different numbers of clusters to see the accuracy-complexity trade-off.

In [24]:

# Run aggregations with different cluster counts
results = {}
for n in [4, 8, 12, 24]:
    results[f"{n} clusters"] = tsam.aggregate(
        raw,
        n_clusters=n,
        period_duration=24,
        cluster=ClusterConfig(method="hierarchical"),
    )

# Print accuracy comparison
print("RMSE comparison (Load):")
for name, res in results.items():
    print(f"  {name}: {res.accuracy.rmse['Load']:.2f}")

# Build comparison data for plotting
comparison_data = {"Original": raw}
for name, res in results.items():
    comparison_data[name] = res.reconstructed

RMSE comparison (Load):
  4 clusters: 0.14
  8 clusters: 0.10
  12 clusters: 0.09
  24 clusters: 0.07

5.2 Plot Duration Curves for Comparison¶

In [25]:

# Compare duration curves across configurations with plotly express
frames = []
for name, df in comparison_data.items():
    sorted_vals = df["Load"].sort_values(ascending=False).reset_index(drop=True)
    frames.append(
        pd.DataFrame(
            {"Hour": range(len(sorted_vals)), "Load": sorted_vals, "Method": name}
        )
    )
long_df = pd.concat(frames, ignore_index=True)

px.line(
    long_df,
    x="Hour",
    y="Load",
    color="Method",
    title="Duration Curve: Cluster Count Comparison",
)

5.3 Compare Slices of the Original and Reconstructed Time Series¶

The original time series and its aggregations are compared here for a specific time slice.

In [26]:

# Time slice comparison with plotly express
frames = []
for name, df in comparison_data.items():
    sliced = df.loc["20100601":"20100608", ["Load"]].copy()
    sliced["Method"] = name
    frames.append(sliced)
long_df = pd.concat(frames).reset_index(names="Time")

px.line(
    long_df,
    x="Time",
    y="Load",
    color="Method",
    title="June Week: Cluster Count Comparison",
)

6. Effect of Extreme Period Preservation¶

Compare aggregation with and without preserving extreme values.
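Why extreme periods matter can be seen in a toy example. When a representative averages several days, the highest peak is smoothed away. Giving the peak day its own cluster keeps it. The sketch below is a conceptual illustration in plain Python and not tsam's algorithm: it assumes mean profiles as representatives, and the data and the helper `mean_profile` are hypothetical.

```python
# Three toy days with three timesteps each; the last day holds the peak
days = [
    [1.0, 2.0, 1.0],
    [1.0, 3.0, 1.0],
    [2.0, 9.0, 2.0],
]


def mean_profile(periods):
    """Average the periods timestep by timestep (assumed representation)."""
    return [sum(col) / len(col) for col in zip(*periods)]


# Without extreme preservation: one representative for all three days
rep_all = mean_profile(days)
peak_without = max(rep_all)

# With the peak day split off into a cluster of its own
peak_day = max(days, key=max)
others = [d for d in days if d is not peak_day]
reps = [mean_profile(others), peak_day]
peak_with = max(max(rep) for rep in reps)

print(f"original peak: {max(max(d) for d in days)}")
print(f"peak without extreme period: {peak_without:.2f}")
print(f"peak with extreme period: {peak_with:.2f}")
```

The price is one additional typical period, which is the trade-off to keep in mind when comparing the two aggregations in this section.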

6.1 Aggregate Time Series with and Without Extreme Value Preservation¶

In [27]:

# Without extreme preservation
result_no_extremes = tsam.aggregate(
    raw,
    n_clusters=8,
    period_duration=24,
    cluster=ClusterConfig(method="hierarchical"),
)

# With extreme preservation
result_with_extremes = tsam.aggregate(
    raw,
    n_clusters=8,
    period_duration=24,
    cluster=ClusterConfig(method="hierarchical"),
    extremes=ExtremeConfig(
        method="new_cluster",
        min_value=["T"],
        max_value=["Load", "GHI"],
    ),
)

print("Without extremes - Load RMSE:", result_no_extremes.accuracy.rmse["Load"])
print("With extremes - Load RMSE:", result_with_extremes.accuracy.rmse["Load"])

Without extremes - Load RMSE: 0.10117201266675802
With extremes - Load RMSE: 0.09728116855411595

6.2 Plot Duration Curve for Load¶

In [28]:

# Compare peak preservation in duration curves with plotly express
comparison_extremes = {
    "Original": raw,
    "No extremes": result_no_extremes.reconstructed,
    "With extremes": result_with_extremes.reconstructed,
}

frames = []
for name, df in comparison_extremes.items():
    sorted_vals = df["Load"].sort_values(ascending=False).reset_index(drop=True)
    frames.append(
        pd.DataFrame(
            {"Hour": range(len(sorted_vals)), "Load": sorted_vals, "Method": name}
        )
    )
long_df = pd.concat(frames, ignore_index=True)

px.line(
    long_df,
    x="Hour",
    y="Load",
    color="Method",
    title="Effect of Extreme Period Preservation on Load",
)

6.3 Plot Duration Curve for Temperature¶

In [29]:

# Compare temperature extremes with plotly express
frames = []
for name, df in comparison_extremes.items():
    sorted_vals = df["T"].sort_values(ascending=False).reset_index(drop=True)
    frames.append(
        pd.DataFrame(
            {
                "Hour": range(len(sorted_vals)),
                "Temperature": sorted_vals,
                "Method": name,
            }
        )
    )
long_df = pd.concat(frames, ignore_index=True)

px.line(
    long_df,
    x="Hour",
    y="Temperature",
    color="Method",
    title="Effect of Extreme Period Preservation on Temperature",
)

7. Segmentation Analysis¶

When using segmentation, you can visualize the segment durations.

7.1 Run Aggregation with Segmentation¶

In [30]:

# Run aggregation with segmentation
result_segmented = tsam.aggregate(
    raw,
    n_clusters=12,
    period_duration=24,
    cluster=ClusterConfig(method="hierarchical"),
    segments=SegmentConfig(n_segments=6),
)

print(f"Segments per period: {len(result_segmented.segment_durations[0])}")
print(f"Segment durations (first cluster): {result_segmented.segment_durations[0]}")

Segments per period: 6
Segment durations (first cluster): (7, 2, 5, 3, 5, 2)
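The printed durations can be read as follows: each segment covers a run of consecutive timesteps, and the durations add up to the period length. The sketch below checks this for the durations printed above and expands one value per segment back into a step profile. The segment values and the helper `expand_segments` are hypothetical and only illustrate the idea.

```python
# Segment durations of the first cluster, copied from the output above
segment_durations = (7, 2, 5, 3, 5, 2)
period_length = sum(segment_durations)

# Hypothetical values, one per segment (not taken from the real result)
segment_values = [0.2, 0.5, 0.9, 0.7, 0.4, 0.3]


def expand_segments(values, durations):
    """Repeat each segment value for as many timesteps as the segment lasts."""
    profile = []
    for value, duration in zip(values, durations):
        profile.extend([value] * duration)
    return profile


profile = expand_segments(segment_values, segment_durations)
print(f"period length: {period_length}")
print(profile)
```

Six segments thus describe a 24-hour period with six values instead of 24, at the cost of a piecewise-constant profile.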

7.2 Show the Duration of Each Segment¶

This plot shows how many original timesteps each segment spans.

In [31]:

# Plot segment durations
result_segmented.plot.segment_durations()

7.3 Show Duration Curve of Original and Aggregated Data¶

In [32]:

# Compare segmented vs non-segmented with plotly express
comparison_seg = {
    "Original": raw,
    "No segmentation": result.reconstructed,
    "With segmentation": result_segmented.reconstructed,
}

frames = []
for name, df in comparison_seg.items():
    sorted_vals = df["Load"].sort_values(ascending=False).reset_index(drop=True)
    frames.append(
        pd.DataFrame(
            {"Hour": range(len(sorted_vals)), "Load": sorted_vals, "Method": name}
        )
    )
long_df = pd.concat(frames, ignore_index=True)

px.line(
    long_df,
    x="Hour",
    y="Load",
    color="Method",
    title="Effect of Segmentation on Load Duration Curve",
)