Basic Example

Example usage of the time series aggregation module (tsam).

This notebook demonstrates:

Basic k-means aggregation
Hierarchical aggregation with extreme periods
Advanced aggregation with segmentation

Import pandas, plotly, and the tsam module together with its configuration classes

In [1]:

%load_ext autoreload
%autoreload 2

import warnings
from pathlib import Path

import pandas as pd
import plotly.express as px
import plotly.io as pio

import tsam
from tsam import ClusterConfig, ExtremeConfig, SegmentConfig

pio.renderers.default = "notebook_connected"

# Ensure results directory exists
RESULTS_DIR = Path("results")
RESULTS_DIR.mkdir(exist_ok=True)

# Added to every example notebook: silence the v3 column-order
# FutureWarning in the rendered docs (tsam v4 returns result columns in
# input order; see migration guide).
warnings.filterwarnings(
    "ignore", category=FutureWarning, message=".*sorted alphabetically.*"
)

Input data

Read the time series from testdata.csv with pandas

In [2]:

raw = pd.read_csv("testdata.csv", index_col=0, parse_dates=True)

Show the first rows of the dataset

In [3]:

raw.head()

Out[3]:

	T	Wind	Load
2009-12-31 23:30:00	-2.1	7.1	375.478394
2010-01-01 00:30:00	-2.8	8.6	364.541326
2010-01-01 01:30:00	-3.3	9.7	357.416844
2010-01-01 02:30:00	-3.2	9.8	350.191306
2010-01-01 03:30:00	-3.2	9.4	345.161449

Show the shape of the raw input data: four time series (GHI, temperature, wind and load) for every hour of a year

In [4]:

raw.shape

Out[4]:

(8760, 4)

Plot the original temperature data as a heatmap
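For the heatmap, the flat hourly series has to be rearranged into a grid with one row per day and one column per hour. A minimal plain-Python sketch of that reshaping, assuming the series length is an exact multiple of the period length; `unstack_periods` is a hypothetical helper for illustration, not tsam's `unstack_to_periods`:

```python
def unstack_periods(values, period_length):
    """Split a flat sequence into consecutive periods of equal length.

    Hypothetical helper for illustration; assumes len(values) is an
    exact multiple of period_length.
    """
    if len(values) % period_length != 0:
        raise ValueError("series length must be a multiple of period_length")
    return [
        values[start : start + period_length]
        for start in range(0, len(values), period_length)
    ]


# A year of hourly values (8760 = 365 days x 24 hours) becomes 365 rows of 24
hourly = list(range(8760))
days = unstack_periods(hourly, 24)
print(len(days), len(days[0]))  # 365 24
```

Transposing this days × hours grid puts the hours on the y-axis, which is what `.values.T` does in the plotting cell.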

In [5]:

# Use tsam.unstack_to_periods() with plotly for heatmaps
unstacked = tsam.unstack_to_periods(raw, period_duration=24)
px.imshow(
    unstacked["T"].values.T,
    labels={"x": "Day", "y": "Hour", "color": "Temperature"},
    title="Original Temperature",
    aspect="auto",
)

Use the aggregate() function with k-means clustering for eight typical days.
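tsam performs the clustering internally. To illustrate the idea behind k-means aggregation, here is a minimal plain-Python sketch, assuming each day is a single profile vector and that the representative of a cluster is its centroid (the mean profile of its members). tsam's own preprocessing, such as normalizing and weighting the columns, is omitted, so this is a simplification rather than its implementation:

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(profiles, centroids):
    """Assignment step: index of the nearest centroid for every profile."""
    return [
        min(range(len(centroids)), key=lambda c: squared_distance(p, centroids[c]))
        for p in profiles
    ]


def kmeans(profiles, k, n_iter=100, seed=0):
    """Plain k-means on period profiles; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(profiles, k)]
    for _ in range(n_iter):
        labels = assign(profiles, centroids)
        # Update step: each centroid becomes the mean profile of its members
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(profiles, labels) if label == c]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, assign(profiles, centroids)


# Four toy "days" of three hourly values each, forming two obvious groups
days = [[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [10.0, 10.0, 10.0], [10.0, 9.9, 10.0]]
centroids, labels = kmeans(days, k=2)
print(labels)
```

The cluster numbering depends on the random initialization, but the first two days always end up together, as do the last two.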

In [6]:

result_kmeans = tsam.aggregate(
    raw,
    n_clusters=8,
    period_duration=24,
    cluster=ClusterConfig(method="kmeans"),
)

Access the typical periods from the result object

In [7]:

cluster_representatives = result_kmeans.cluster_representatives
cluster_representatives.head()

Out[7]:

		GHI	Load	T	Wind
	timestep
0	0	0.0	442.303633	-2.285714	3.453571
	1	0.0	428.829819	-2.442857	3.453571
	2	0.0	424.248154	-2.625000	3.307143
	3	0.0	426.376082	-2.700000	3.260714
	4	0.0	432.959846	-2.885714	3.364286

Show the shape of the typical periods: four time series for 8 × 24 = 192 hours
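The 192 rows come from one row per combination of typical period and timestep, with one column per time series. A sketch of an equivalently shaped pandas frame, assuming the two-level (period, timestep) index seen in the output above; the level name `period` and the zero values are illustrative placeholders:

```python
import pandas as pd

n_clusters = 8
n_timesteps_per_period = 24

# One row per (period, timestep) combination; "period" is an illustrative name
index = pd.MultiIndex.from_product(
    [range(n_clusters), range(n_timesteps_per_period)],
    names=["period", "timestep"],
)
# Placeholder values; only the layout matters here
frame = pd.DataFrame(0.0, index=index, columns=["GHI", "Load", "T", "Wind"])
print(frame.shape)  # (192, 4)
```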

In [8]:

print(f"Shape: {cluster_representatives.shape}")
print(
    f"Periods: {result_kmeans.n_clusters}, Timesteps per period: {result_kmeans.n_timesteps_per_period}"
)

Shape: (192, 4)
Periods: 8, Timesteps per period: 24

Save the typical periods to a .csv file
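To reuse saved typical periods later, a file like this can be read back with pandas. A minimal round-trip sketch on a small stand-in frame, assuming the file keeps the two-level index shown above; the file name and level names are illustrative:

```python
import tempfile
from pathlib import Path

import pandas as pd

# Small stand-in for cluster_representatives: 2 periods x 3 timesteps
index = pd.MultiIndex.from_product(
    [range(2), range(3)], names=["period", "timestep"]
)
typical = pd.DataFrame({"Load": [float(i) for i in range(6)]}, index=index)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "typical_periods.csv"
    typical.to_csv(path)
    # index_col=[0, 1] turns the first two columns back into the two-level index
    restored = pd.read_csv(path, index_col=[0, 1])

print(restored.head())
```

Without `index_col`, the period and timestep columns would come back as ordinary data columns.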

In [9]:

cluster_representatives.to_csv(RESULTS_DIR / "testperiods_kmeans.csv")

Reconstruct the original time series based on the typical periods
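Conceptually, the reconstruction replaces every original day with the typical day of the cluster it was assigned to, in chronological order. A plain-Python sketch of that idea, assuming one cluster label per original period; `reconstruct` is a hypothetical helper, and tsam's own implementation may apply further corrections that this omits:

```python
def reconstruct(representatives, labels):
    """Rebuild a flat series from typical periods.

    representatives: one profile (list of values) per cluster
    labels: cluster index of each original period, in chronological order
    """
    series = []
    for label in labels:
        # Each original period is replaced by its cluster's representative
        series.extend(representatives[label])
    return series


# Two typical 3-hour "days" and a four-day sequence of cluster assignments
representatives = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
labels = [0, 1, 1, 0]
print(reconstruct(representatives, labels))
# [1.0, 2.0, 3.0, 10.0, 20.0, 30.0, 10.0, 20.0, 30.0, 1.0, 2.0, 3.0]
```

This is also why a day that does not resemble its cluster's representative, such as an unusually cold day, disappears from the reconstructed series.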

In [10]:

reconstructed = result_kmeans.reconstructed

Plot the reconstructed data

In [11]:

# K-means reconstructed temperature heatmap
unstacked_kmeans = tsam.unstack_to_periods(reconstructed, period_duration=24)
px.imshow(
    unstacked_kmeans["T"].values.T,
    labels={"x": "Day", "y": "Hour", "color": "Temperature"},
    title="K-means Reconstructed Temperature",
    aspect="auto",
)

As the heatmap shows, the days with the lowest temperatures are not reproduced by the typical days. If they are required, they can be added to the aggregation as follows.

Hierarchical aggregation including extreme periods

Use hierarchical clustering with extreme period preservation. This ensures that the day with the minimum temperature and the day with the maximum load are included.
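One way to read `method="new_cluster"` is that each extreme period gets a cluster of its own, so it survives unchanged in the reconstruction. A plain-Python sketch of that idea, assuming the extreme period is the one containing the single lowest (or highest) value and that the other clusters' representatives are left as they are; the helper name is hypothetical and this is not tsam's implementation:

```python
def preserve_extreme_period(profiles, labels, representatives, pick="min"):
    """Give the period holding the overall min (or max) value its own cluster.

    profiles: one list of values per original period (chronological)
    labels: cluster index of each period
    representatives: one representative profile per cluster
    Returns updated copies of labels and representatives.
    """
    chooser = min if pick == "min" else max
    # Period whose lowest (or highest) value is the most extreme overall
    extreme = chooser(range(len(profiles)), key=lambda i: chooser(profiles[i]))
    new_representatives = [list(r) for r in representatives]
    # The extreme period becomes an additional cluster, kept unchanged
    new_representatives.append(list(profiles[extreme]))
    new_labels = list(labels)
    new_labels[extreme] = len(new_representatives) - 1
    return new_labels, new_representatives


# Four toy days of temperatures; day 1 contains the coldest hour
temperature = [[2.0, 3.0, 4.0], [-5.0, -8.0, -6.0], [3.0, 4.0, 5.0], [1.0, 2.0, 3.0]]
labels = [0, 0, 1, 1]
representatives = [[-1.5, -2.5, -1.0], [2.0, 3.0, 4.0]]
new_labels, new_representatives = preserve_extreme_period(
    temperature, labels, representatives, pick="min"
)
print(new_labels)  # [0, 2, 1, 1]
print(new_representatives[2])  # [-5.0, -8.0, -6.0]
```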

In [12]:

result_hier = tsam.aggregate(
    raw,
    n_clusters=8,
    period_duration=24,
    cluster=ClusterConfig(method="hierarchical"),
    extremes=ExtremeConfig(
        method="new_cluster",
        min_value=["T"],  # Preserve day with minimum temperature
        max_value=["Load"],  # Preserve day with maximum load
    ),
)

Access the typical periods from the result object

In [13]:

cluster_representatives = result_hier.cluster_representatives
cluster_representatives.head()

Out[13]:

		GHI	Load	T	Wind
	timestep
0	0	0.0	403.253822	-0.654502	3.541068
	1	0.0	394.008077	-0.949049	4.485353
	2	0.0	389.631672	-1.047231	3.068926
	3	0.0	391.161914	-1.243596	2.832854
	4	0.0	396.952828	-1.439960	2.596783

The aggregation can also be evaluated with accuracy indicators
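To make the indicators concrete: RMSE and MAE compare the series hour by hour, while a duration-curve error compares them after sorting both in descending order, so it only measures how well the distribution of values is preserved. A plain-Python sketch for a single column, assuming min-max normalization based on the original series and assuming `rmse_duration` means the RMSE between the sorted series; the exact normalization and the column weighting behind the "(weighted)" values printed by tsam are not reproduced here:

```python
import math


def accuracy(original, reconstructed):
    """RMSE, MAE and duration-curve RMSE on min-max normalized data."""
    # Normalize both series with the original's range (assumed not constant)
    lo, hi = min(original), max(original)
    orig = [(v - lo) / (hi - lo) for v in original]
    rec = [(v - lo) / (hi - lo) for v in reconstructed]
    n = len(orig)
    rmse = math.sqrt(sum((a - b) ** 2 for a, b in zip(orig, rec)) / n)
    mae = sum(abs(a - b) for a, b in zip(orig, rec)) / n
    # Duration curves: both series sorted in descending order
    dur_orig = sorted(orig, reverse=True)
    dur_rec = sorted(rec, reverse=True)
    rmse_duration = math.sqrt(
        sum((a - b) ** 2 for a, b in zip(dur_orig, dur_rec)) / n
    )
    return {"rmse": rmse, "mae": mae, "rmse_duration": rmse_duration}


# Same values in a different order: wrong timing, perfect distribution
print(accuracy([0, 10, 5, 5], [5, 5, 0, 10]))
# {'rmse': 0.5, 'mae': 0.5, 'rmse_duration': 0.0}
```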

In [14]:

# View accuracy metrics
print(result_hier.accuracy)
print("\nRMSE per column:")
print(result_hier.accuracy.rmse)

AccuracyMetrics(
  rmse=0.1083 (weighted),
  mae=0.0751 (weighted),
  rmse_duration=0.0362 (weighted)
)

RMSE per column:
GHI     0.093828
Load    0.099253
T       0.086290
Wind    0.144376
Name: RMSE, dtype: float64

Save the typical periods to a .csv file

In [15]:

cluster_representatives.to_csv(RESULTS_DIR / "testperiods_hierarchical.csv")

Reconstruct the original time series based on the typical periods

In [16]:

reconstructed_extremes = result_hier.reconstructed

Plot the reconstructed data

In [17]:

# Hierarchical with extremes reconstructed temperature heatmap
unstacked_hier = tsam.unstack_to_periods(reconstructed_extremes, period_duration=24)
px.imshow(
    unstacked_hier["T"].values.T,
    labels={"x": "Day", "y": "Hour", "color": "Temperature"},
    title="Hierarchical + Extremes Reconstructed Temperature",
    aspect="auto",
)

Now the day with the minimum temperature is also included in the typical periods.

Advanced aggregation method

This method combines hierarchical clustering with segmentation (reduced temporal resolution within each period) and a distribution-preserving representation.
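Segmentation reduces the temporal resolution inside each typical period by merging adjacent timesteps into segments of variable length, here 24 hours into 8 segments. A simplified plain-Python sketch of the idea for a single series, assuming a greedy rule that repeatedly merges the adjacent pair with the closest means; tsam's actual merging criterion is not reproduced, and `segment_period` is a hypothetical helper:

```python
def segment_period(values, n_segments):
    """Greedily merge adjacent timesteps until n_segments segments remain.

    Returns a list of (start, length, mean) tuples, one per segment.
    """
    segments = [[v] for v in values]  # start with one segment per timestep
    while len(segments) > n_segments:
        means = [sum(s) / len(s) for s in segments]
        # Merge the adjacent pair whose means differ the least
        i = min(range(len(segments) - 1), key=lambda j: abs(means[j] - means[j + 1]))
        segments[i : i + 2] = [segments[i] + segments[i + 1]]
    result = []
    start = 0
    for s in segments:
        result.append((start, len(s), sum(s) / len(s)))
        start += len(s)
    return result


# Eight timesteps with three plateaus collapse into three segments
print(segment_period([0, 0, 0, 5, 5, 5, 9, 9], 3))
# [(0, 3, 0.0), (3, 3, 5.0), (6, 2, 9.0)]
```

Because segments keep their length, each segment's mean can be weighted by its duration, so quiet stretches of a day cost fewer timesteps than volatile ones.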

In [18]:

result_advanced = tsam.aggregate(
    raw,
    n_clusters=24,
    period_duration=24,
    cluster=ClusterConfig(
        method="hierarchical",
        representation="distribution_minmax",
    ),
    segments=SegmentConfig(n_segments=8),
)

In [19]:

reconstructed_advanced = result_advanced.reconstructed

In [20]:

# Advanced method reconstructed temperature heatmap
unstacked_adv = tsam.unstack_to_periods(reconstructed_advanced, period_duration=24)
px.imshow(
    unstacked_adv["T"].values.T,
    labels={"x": "Day", "y": "Hour", "color": "Temperature"},
    title="Advanced Method Reconstructed Temperature",
    aspect="auto",
)

Comparison of the aggregations

The heatmaps above only show the temperature, but every aggregation covered all four time series. Therefore, we also compare the duration curves of the electrical load for the original time series, the k-means aggregation, the hierarchical aggregation including peak periods, and the advanced aggregation with segmentation.

In [21]:

# Duration curve comparison using plotly express
comparison_data = {
    "Original": raw,
    "8 typ days": reconstructed,
    "8 typ days + peak": reconstructed_extremes,
    "24 typ days + 8 seg": reconstructed_advanced,
}

# Build long-form DataFrame for px.line
frames = []
for name, df in comparison_data.items():
    sorted_vals = df["Load"].sort_values(ascending=False).reset_index(drop=True)
    frames.append(
        pd.DataFrame(
            {"Hour": range(len(sorted_vals)), "Load": sorted_vals, "Method": name}
        )
    )
long_df = pd.concat(frames, ignore_index=True)

px.line(
    long_df,
    x="Hour",
    y="Load",
    color="Method",
    title="Duration Curve Comparison - Load",
)

Alternatively, compare the unsorted time series over roughly one example week (10 to 18 February 2010)

In [22]:

# Time slice comparison - Load
frames = []
for name, df in comparison_data.items():
    sliced = df.loc["20100210":"20100218", ["Load"]].copy()
    sliced["Method"] = name
    frames.append(sliced)
long_df = pd.concat(frames).reset_index(names="Time")

px.line(
    long_df,
    x="Time",
    y="Load",
    color="Method",
    title="Time Slice Comparison - Load (Feb 10-18)",
)

In [23]:

# Time slice comparison - GHI
frames = []
for name, df in comparison_data.items():
    sliced = df.loc["20100210":"20100218", ["GHI"]].copy()
    sliced["Method"] = name
    frames.append(sliced)
long_df = pd.concat(frames).reset_index(names="Time")

px.line(
    long_df,
    x="Time",
    y="GHI",
    color="Method",
    title="Time Slice Comparison - Solar Irradiance (Feb 10-18)",
)