
Migrating from ETHOS.TSAM v2 to v3

ETHOS.TSAM v3 replaces the class-based API with a functional API. The old TimeSeriesAggregation class still works but is deprecated and will be removed in a future release.

This guide covers every change you need to make.

Heads-up: column order changes in v4

aggregate() result column order will change in v4

In v3, aggregate() returns cluster_representatives, reconstructed, and original with their columns sorted alphabetically; in v4 they will follow the input DataFrame's column order. This can silently break code that reads results by position (.values, .iloc[:, 0]): column names and shape stay the same, but each column's data lands in a different slot. Indexing by name is unaffected. aggregate() emits a FutureWarning only when the input columns are not already in alphabetical order, which is the only case where the order changes.

To keep the v3 order, sort either the input, aggregate(data.sort_index(axis=1), ...), or the output, result.cluster_representatives.sort_index(axis=1). To be ready for v4 now, index results by column name. The legacy TimeSeriesAggregation class is unaffected: it sorts alphabetically in both v3 and v4.
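You only need the column names to check whether a DataFrame is affected before upgrading. The sketch below uses a hypothetical helper, `column_order_report`, which is not part of tsam. It compares the v3 order with the v4 order and reports which columns would move to a different position. It assumes "alphabetical" means Python's default string ordering, which is also what pandas `sort_index` applies to string labels.

```python
def column_order_report(columns):
    """Compare v3 (alphabetical) and v4 (input-order) result columns.

    Hypothetical helper, not part of tsam. Assumes "alphabetical" means
    Python's default string ordering, in which uppercase sorts first.
    """
    v4_order = list(columns)     # v4: results follow the input order
    v3_order = sorted(v4_order)  # v3: results are sorted alphabetically
    # Map each column whose position differs to (v3 position, v4 position);
    # positional reads such as .values or .iloc would change for these.
    moved = {
        name: (v3_order.index(name), v4_order.index(name))
        for name in v4_order
        if v3_order.index(name) != v4_order.index(name)
    }
    return {
        "affected": bool(moved),
        "v3_order": v3_order,
        "v4_order": v4_order,
        "moved": moved,
    }


report = column_order_report(["demand", "GHI", "T"])
print(report["affected"])  # True
print(report["moved"])     # {'demand': (2, 0), 'GHI': (0, 1), 'T': (1, 2)}
```

An empty "moved" dict means the input is already alphabetical. That is exactly the case in which aggregate() does not warn.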

To silence the warning (for example, once you have already migrated):

import warnings

warnings.filterwarnings(
    "ignore", category=FutureWarning, message=".*sorted alphabetically.*"
)
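A global `filterwarnings` call stays active for the rest of the process. To silence the warning only around specific calls, you can use the standard library's `warnings.catch_warnings()` context manager, which restores the previous filters on exit.

The sketch below wraps any callable that way. `quiet_column_order` is a hypothetical helper name, not a tsam API. The warning text in the demo is invented; it only needs to contain "sorted alphabetically" to match the filter shown above.

```python
import warnings


def quiet_column_order(func, *args, **kwargs):
    """Call func(*args, **kwargs) with the column-order FutureWarning ignored.

    The filter is installed inside catch_warnings(), so the previous
    warning filters are restored as soon as the call returns.
    """
    with warnings.catch_warnings():
        warnings.filterwarnings(
            "ignore", category=FutureWarning, message=".*sorted alphabetically.*"
        )
        return func(*args, **kwargs)


# Stand-in for tsam.aggregate; this warning message is made up for the demo.
def fake_aggregate():
    warnings.warn("Result columns are sorted alphabetically in v3.", FutureWarning)
    return "result"


print(quiet_column_order(fake_aggregate))  # prints "result", no warning shown
```

With tsam installed, the call would look like `quiet_column_order(tsam.aggregate, df, n_clusters=8)`. Note that `catch_warnings()` modifies global interpreter state and is not thread-safe.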

Quick before-and-after

New (v3):

import tsam
from tsam import ClusterConfig, SegmentConfig, ExtremeConfig

result = tsam.aggregate(
    df,
    n_clusters=8,
    period_duration=24,
    cluster=ClusterConfig(
        method='hierarchical',
        representation='distribution_minmax',
    ),
    segments=SegmentConfig(n_segments=12),
    preserve_column_means=True,
    extremes=ExtremeConfig(max_value=['demand']),
)
representatives = result.cluster_representatives
reconstructed = result.reconstructed
accuracy = result.accuracy.summary

Old (v2):

import tsam.timeseriesaggregation as tsam

agg = tsam.TimeSeriesAggregation(
    df,
    noTypicalPeriods=8,
    hoursPerPeriod=24,
    clusterMethod='hierarchical',
    representationMethod='distributionAndMinMaxRepresentation',
    segmentation=True,
    noSegments=12,
    rescaleClusterPeriods=True,
    addPeakMax=['demand'],
)
representatives = agg.createTypicalPeriods()
reconstructed = agg.predictOriginalData()
accuracy = agg.accuracyIndicators()

Parameter mapping

The table below maps every old parameter to its v3 equivalent.

| Old (v2) | New (v3) | Notes |
|---|---|---|
| timeSeries | data | Renamed. |
| noTypicalPeriods | n_clusters | |
| hoursPerPeriod | period_duration | Also accepts strings ('24h', '1d'). |
| resolution | temporal_resolution | Also accepts strings ('1h', '15min'). |
| clusterMethod | ClusterConfig(method=...) | See cluster method values. |
| representationMethod | ClusterConfig(representation=...) | See representation values. |
| weightDict | weights | Top-level kwarg of aggregate(). |
| sameMean | ClusterConfig(normalize_column_means=...) | |
| sortValues | ClusterConfig(use_duration_curves=...) | |
| evalSumPeriods | ClusterConfig(include_period_sums=...) | |
| solver | ClusterConfig(solver=...) | |
| segmentation | Pass a SegmentConfig or omit it. | No boolean flag needed. |
| noSegments | SegmentConfig(n_segments=...) | |
| segmentRepresentationMethod | SegmentConfig(representation=...) | Uses short names (see below). |
| rescaleClusterPeriods | preserve_column_means | Top-level kwarg of aggregate(). |
| rescaleExcludeColumns | rescale_exclude_columns | |
| roundOutput | round_decimals | |
| numericalTolerance | numerical_tolerance | |
| extremePeriodMethod | ExtremeConfig(method=...) | See extreme method values. |
| addPeakMax | ExtremeConfig(max_value=...) | |
| addPeakMin | ExtremeConfig(min_value=...) | |
| addMeanMax | ExtremeConfig(max_period=...) | |
| addMeanMin | ExtremeConfig(min_period=...) | |
| distributionPeriodWise | Distribution(scope="cluster"\|"global") | See representation objects. |
| representationDict | MinMaxMean(max_columns=[...], min_columns=[...]) | See representation objects. |
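When porting many call sites, it can help to translate the v2 keyword arguments mechanically before editing by hand. The sketch below encodes the table above as plain dictionaries. It groups each renamed argument under the v3 top-level call or the config object it belongs to.

`translate_v2_kwargs` is a hypothetical helper, not part of tsam. It renames keys only and passes values through unchanged, so method names such as 'k_means' still need the value tables. It sets aside segmentation, distributionPeriodWise, and representationDict because they have no one-to-one keyword rename.

```python
# Hypothetical migration helper built from the parameter mapping table.
# It renames keys only; values (e.g. method names) are passed through.
TOP_LEVEL = {
    "timeSeries": "data",
    "noTypicalPeriods": "n_clusters",
    "hoursPerPeriod": "period_duration",
    "resolution": "temporal_resolution",
    "weightDict": "weights",
    "rescaleClusterPeriods": "preserve_column_means",
    "rescaleExcludeColumns": "rescale_exclude_columns",
    "roundOutput": "round_decimals",
    "numericalTolerance": "numerical_tolerance",
}
CLUSTER = {
    "clusterMethod": "method",
    "representationMethod": "representation",
    "sameMean": "normalize_column_means",
    "sortValues": "use_duration_curves",
    "evalSumPeriods": "include_period_sums",
    "solver": "solver",
}
SEGMENTS = {"noSegments": "n_segments", "segmentRepresentationMethod": "representation"}
EXTREMES = {
    "extremePeriodMethod": "method",
    "addPeakMax": "max_value",
    "addPeakMin": "min_value",
    "addMeanMax": "max_period",
    "addMeanMin": "min_period",
}
# No one-to-one keyword rename: these need a manual decision.
MANUAL = {"segmentation", "distributionPeriodWise", "representationDict"}


def translate_v2_kwargs(v2_kwargs):
    """Split v2 keyword arguments into v3 top-level and config-object kwargs."""
    out = {"top_level": {}, "cluster": {}, "segments": {}, "extremes": {},
           "manual": {}, "unknown": {}}
    for key, value in v2_kwargs.items():
        for group, table in (("top_level", TOP_LEVEL), ("cluster", CLUSTER),
                             ("segments", SEGMENTS), ("extremes", EXTREMES)):
            if key in table:
                out[group][table[key]] = value
                break
        else:  # no table matched this key
            out["manual" if key in MANUAL else "unknown"][key] = value
    return out


parts = translate_v2_kwargs({"noTypicalPeriods": 8, "hoursPerPeriod": 24,
                             "clusterMethod": "hierarchical", "noSegments": 12,
                             "segmentation": True})
print(parts["top_level"])  # {'n_clusters': 8, 'period_duration': 24}
print(parts["manual"])     # {'segmentation': True}
```

With tsam installed, the groups would then feed `ClusterConfig(**parts["cluster"])`, `SegmentConfig(**parts["segments"])` and so on, once the values have been converted.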

Cluster method values

| Old (v2) | New (v3) |
|---|---|
| 'averaging' | 'averaging' |
| 'k_means' | 'kmeans' |
| 'k_medoids' | 'kmedoids' |
| 'k_maxoids' | 'kmaxoids' |
| 'hierarchical' | 'hierarchical' |
| 'adjacent_periods' | 'contiguous' |

Representation method values

| Old (v2) | New (v3) |
|---|---|
| 'meanRepresentation' | 'mean' |
| 'medoidRepresentation' | 'medoid' |
| 'maxoidRepresentation' | 'maxoid' |
| 'distributionRepresentation' | 'distribution' |
| 'durationRepresentation' | 'distribution' (the two v2 names were equivalent) |
| 'distributionAndMinMaxRepresentation' | 'distribution_minmax' |
| 'minmaxmeanRepresentation' | 'minmax_mean' |
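The two value tables translate directly into lookup dictionaries. The sketch below converts v2 method strings to their v3 short names and fails loudly on anything it does not recognize, so typos cannot slip through. The names `to_v3`, `CLUSTER_METHODS`, and `REPRESENTATIONS` are hypothetical and not a tsam API.

```python
# Lookup tables copied from the cluster-method and representation tables above.
CLUSTER_METHODS = {
    "averaging": "averaging",
    "k_means": "kmeans",
    "k_medoids": "kmedoids",
    "k_maxoids": "kmaxoids",
    "hierarchical": "hierarchical",
    "adjacent_periods": "contiguous",
}
REPRESENTATIONS = {
    "meanRepresentation": "mean",
    "medoidRepresentation": "medoid",
    "maxoidRepresentation": "maxoid",
    "distributionRepresentation": "distribution",
    "durationRepresentation": "distribution",  # both v2 names were equivalent
    "distributionAndMinMaxRepresentation": "distribution_minmax",
    "minmaxmeanRepresentation": "minmax_mean",
}


def to_v3(value, table, kind):
    """Return the v3 name for a v2 value, raising ValueError on unknown input."""
    try:
        return table[value]
    except KeyError:
        raise ValueError(f"unknown v2 {kind}: {value!r}") from None


print(to_v3("k_means", CLUSTER_METHODS, "cluster method"))                # kmeans
print(to_v3("durationRepresentation", REPRESENTATIONS, "representation"))  # distribution
```

SegmentConfig also takes the short names, so the same REPRESENTATIONS table applies to segmentRepresentationMethod values.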

Typed representation objects

For distribution, distribution_minmax, and minmax_mean representations, v3 offers typed objects that expose options previously controlled by separate parameters (distributionPeriodWise, representationDict). Plain string shortcuts still work for the common cases.

Distribution with global scope (distributionPeriodWise=False):

New (v3):

import tsam
from tsam import ClusterConfig, Distribution

result = tsam.aggregate(
    df,
    n_clusters=8,
    cluster=ClusterConfig(
        representation=Distribution(scope="global"),
    ),
)

Old (v2):

import tsam.timeseriesaggregation as tsam

agg = tsam.TimeSeriesAggregation(
    df,
    noTypicalPeriods=8,
    representationMethod='distributionRepresentation',
    distributionPeriodWise=False,
)

Distribution with min/max preservation and global scope:

New (v3):

import tsam
from tsam import ClusterConfig, Distribution

result = tsam.aggregate(
    df,
    n_clusters=8,
    cluster=ClusterConfig(
        representation=Distribution(scope="global", preserve_minmax=True),
    ),
)

Old (v2):

import tsam.timeseriesaggregation as tsam

agg = tsam.TimeSeriesAggregation(
    df,
    noTypicalPeriods=8,
    representationMethod='distributionAndMinMaxRepresentation',
    distributionPeriodWise=False,
)

Per-column min/max/mean (representationDict):

New (v3):

import tsam
from tsam import ClusterConfig, MinMaxMean

result = tsam.aggregate(
    df,
    n_clusters=8,
    cluster=ClusterConfig(
        representation=MinMaxMean(
            max_columns=['GHI'],
            min_columns=['T', 'Load'],
        ),
    ),
)

Old (v2):

import tsam.timeseriesaggregation as tsam

agg = tsam.TimeSeriesAggregation(
    df,
    noTypicalPeriods=8,
    representationMethod='minmaxmeanRepresentation',
    representationDict={'GHI': 'max', 'T': 'min', 'Wind': 'mean', 'Load': 'min'},
)

Columns not listed in max_columns or min_columns default to mean.

Note

The string shortcuts "distribution", "distribution_minmax", and "minmax_mean" remain valid and are equivalent to:

  • "distribution" -> Distribution()
  • "distribution_minmax" -> Distribution(preserve_minmax=True)
  • "minmax_mean" -> MinMaxMean() (all columns default to mean)
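Converting an existing representationDict comes down to splitting its keys by value. The sketch below uses `minmaxmean_kwargs`, a hypothetical helper that is not part of tsam. It returns keyword arguments for MinMaxMean and drops the "mean" entries, since unlisted columns default to mean. With tsam installed you would pass the result as `MinMaxMean(**kwargs)`.

```python
def minmaxmean_kwargs(representation_dict):
    """Turn a v2 representationDict into MinMaxMean(...) keyword arguments.

    Columns mapped to "mean" are omitted because MinMaxMean treats every
    unlisted column as mean. Any other value is rejected.
    """
    allowed = {"max", "min", "mean"}
    unexpected = {c: v for c, v in representation_dict.items() if v not in allowed}
    if unexpected:
        raise ValueError(f"unsupported representationDict entries: {unexpected}")
    return {
        "max_columns": [c for c, v in representation_dict.items() if v == "max"],
        "min_columns": [c for c, v in representation_dict.items() if v == "min"],
    }


kwargs = minmaxmean_kwargs({"GHI": "max", "T": "min", "Wind": "mean", "Load": "min"})
print(kwargs)  # {'max_columns': ['GHI'], 'min_columns': ['T', 'Load']}
```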

Extreme method values

| Old (v2) | New (v3) |
|---|---|
| 'None' | Omit the extremes parameter entirely. |
| 'append' | 'append' |
| 'replace_cluster_center' | 'replace' |
| 'new_cluster_center' | 'new_cluster' |
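The sketch below combines the extreme-period rows of the parameter table with the method values above. `extreme_config_kwargs` is a hypothetical helper, not a tsam API. It returns keyword arguments for ExtremeConfig, or None when the v2 call used 'None', in which case you omit the extremes parameter. It assumes the v2 default for extremePeriodMethod was the string 'None', and it raises KeyError for method names it does not know.

```python
EXTREME_METHODS = {
    "append": "append",
    "replace_cluster_center": "replace",
    "new_cluster_center": "new_cluster",
}
EXTREME_FIELDS = {
    "addPeakMax": "max_value",
    "addPeakMin": "min_value",
    "addMeanMax": "max_period",
    "addMeanMin": "min_period",
}


def extreme_config_kwargs(extremePeriodMethod="None", **v2_kwargs):
    """Return ExtremeConfig(...) kwargs, or None if extremes should be omitted.

    Hypothetical helper; assumes the v2 default extremePeriodMethod is 'None'.
    """
    if extremePeriodMethod == "None":
        return None  # v3: leave out the extremes parameter entirely
    kwargs = {"method": EXTREME_METHODS[extremePeriodMethod]}
    for old, new in EXTREME_FIELDS.items():
        if v2_kwargs.get(old):  # skip missing or empty column lists
            kwargs[new] = list(v2_kwargs[old])
    return kwargs


print(extreme_config_kwargs("replace_cluster_center", addPeakMax=["demand"]))
# {'method': 'replace', 'max_value': ['demand']}
```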

Default changes

| Parameter | Old default | New default | Impact |
|---|---|---|---|
| n_clusters | 10 | required | Code that relied on the default must now pass a value explicitly. |
| SegmentConfig(representation=...) | Inherited from representationMethod | "mean" | In v2, omitting segmentRepresentationMethod caused segments to inherit the cluster representation (e.g. distribution). In v3, SegmentConfig always defaults to "mean". If you relied on the implicit inheritance, pass the representation explicitly: SegmentConfig(n_segments=12, representation=Distribution(scope="global")) |
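Both default changes can be caught by scanning the v2 keyword arguments. The sketch below uses `default_change_warnings`, a hypothetical helper that is not part of tsam. It flags calls that relied on the old noTypicalPeriods default of 10. It also flags segmented calls that set a non-mean representationMethod but no segmentRepresentationMethod.

It deliberately skips calls that left representationMethod at its default, because this guide does not state what that default resolved to.

```python
def default_change_warnings(v2_kwargs):
    """Return messages for v2 call patterns affected by v3 default changes.

    Hypothetical helper. Only flags segment inheritance when
    representationMethod was set explicitly to something other than mean.
    """
    messages = []
    if "noTypicalPeriods" not in v2_kwargs:
        messages.append(
            "n_clusters is now required; v2 defaulted to 10, so pass n_clusters=10"
        )
    inherited = v2_kwargs.get("representationMethod")
    if (
        v2_kwargs.get("segmentation")
        and "segmentRepresentationMethod" not in v2_kwargs
        and inherited not in (None, "meanRepresentation")
    ):
        messages.append(
            f"segments inherited representationMethod={inherited!r} in v2 but "
            "default to 'mean' in v3; pass SegmentConfig(representation=...)"
        )
    return messages


for message in default_change_warnings(
    {"segmentation": True, "representationMethod": "distributionRepresentation"}
):
    print(message)
```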

Accessing results

The old API returned raw DataFrames and arrays from methods you had to call in sequence. The new API returns a single AggregationResult object with everything attached.

| Old (v2) | New (v3) |
|---|---|
| agg.createTypicalPeriods() | result.cluster_representatives |
| agg.predictOriginalData() | result.reconstructed |
| agg.accuracyIndicators() | result.accuracy.summary |
| agg.totalAccuracyIndicators() | result.accuracy.weighted_rmse / result.accuracy.weighted_mae |
| agg.clusterOrder | result.cluster_assignments |
| agg.clusterPeriodNoOccur | result.cluster_weights |
| agg.clusterCenterIndices | result.clustering.cluster_centers |
| agg.indexMatching() | result.assignments |
| agg.timeSeries | result.original |
| (no equivalent) | result.residuals |
| (no equivalent) | result.plot.compare() |

The cluster_representatives DataFrame now uses a MultiIndex(cluster, timestep) instead of MultiIndex(PeriodNum, TimeStep).

Accuracy metrics

agg.accuracyIndicators() returned a DataFrame indexed by column with RMSE/MAE/RMSE_duration columns. result.accuracy is an AccuracyMetrics object: .summary is the equivalent DataFrame, while the individual metrics are per-column pd.Series.

| Old (v2) | New (v3) |
|---|---|
| agg.accuracyIndicators().loc[col, "RMSE"] | result.accuracy.rmse[col] |
| agg.accuracyIndicators().loc[col, "MAE"] | result.accuracy.mae[col] |
| agg.accuracyIndicators().loc[col, "RMSE_duration"] | result.accuracy.rmse_duration[col] |
| agg.totalAccuracyIndicators()["RMSE"] | result.accuracy.weighted_rmse |
| agg.totalAccuracyIndicators()["MAE"] | result.accuracy.weighted_mae |

With uniform weights, the weighted_* totals match the old totalAccuracyIndicators() values.
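For reference, the sketch below shows the textbook definitions behind the three per-column metrics. It reads RMSE_duration as the RMSE between the two duration curves, with each series sorted from highest to lowest. This is an assumption for illustration. tsam may compute the metrics on normalized data, so these functions are not expected to reproduce result.accuracy exactly.

```python
import math


def _check_lengths(original, reconstructed):
    if not original or len(original) != len(reconstructed):
        raise ValueError("series must be non-empty and of equal length")


def rmse(original, reconstructed):
    """Root-mean-square error, comparing values time step by time step."""
    _check_lengths(original, reconstructed)
    squared = sum((o - r) ** 2 for o, r in zip(original, reconstructed))
    return math.sqrt(squared / len(original))


def mae(original, reconstructed):
    """Mean absolute error, comparing values time step by time step."""
    _check_lengths(original, reconstructed)
    return sum(abs(o - r) for o, r in zip(original, reconstructed)) / len(original)


def rmse_duration(original, reconstructed):
    """RMSE between duration curves (both series sorted high to low).

    Sorting discards timing, so only the distribution of values is compared.
    """
    return rmse(sorted(original, reverse=True), sorted(reconstructed, reverse=True))


original = [1.0, 3.0, 2.0, 4.0]
reconstructed = [3.0, 1.0, 4.0, 2.0]  # same values, placed at the wrong time steps
print(rmse(original, reconstructed))           # 2.0
print(mae(original, reconstructed))            # 2.0
print(rmse_duration(original, reconstructed))  # 0.0
```

The example shows why RMSE_duration is reported separately: a reconstruction can match the value distribution perfectly while still being wrong step by step.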

Index/assignment metadata

agg.indexMatching() returned a DataFrame with PeriodNum, TimeStep, and SegmentIndex columns. result.assignments is its equivalent. It is indexed by the original datetime index and has the columns period_idx, timestep_idx, cluster_idx (the old PeriodNum), and segment_idx (present only when segmentation is used).
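Without segmentation, these columns are enough to rebuild a series from the cluster representatives. Each original time step takes the representative value at (cluster_idx, timestep_idx).

The pure-Python sketch below illustrates that lookup using plain lists and a dict keyed like the (cluster, timestep) MultiIndex of cluster_representatives. It shows the data layout only and is not how tsam computes result.reconstructed. The helper name `reconstruct` is hypothetical.

```python
def reconstruct(assignments, representatives):
    """Rebuild one column from (cluster_idx, timestep_idx) assignment rows.

    assignments: list of dicts, one per original time step, with the keys
        "cluster_idx" and "timestep_idx" (mirroring result.assignments).
    representatives: dict mapping (cluster, timestep) -> value, mirroring
        one column of result.cluster_representatives.
    """
    return [representatives[(row["cluster_idx"], row["timestep_idx"])]
            for row in assignments]


# Two clusters of 2-step periods; the original data spans 3 periods.
reps = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 5.0, (1, 1): 6.0}
cluster_of_period = [1, 0, 1]  # plays the role of result.cluster_assignments
rows = [
    {"period_idx": p, "timestep_idx": t, "cluster_idx": c}
    for p, c in enumerate(cluster_of_period)
    for t in range(2)
]
print(reconstruct(rows, reps))  # [5.0, 6.0, 1.0, 2.0, 5.0, 6.0]
```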

Clustering transfer

Reusing a clustering on new data used to require manually passing predefClusterOrder, predefClusterCenterIndices, etc. In v3 this is a single method call:

New (v3):

import tsam
from tsam import ClusteringResult

# Cluster on one dataset
result = tsam.aggregate(df_wind, n_clusters=8)

# Apply the same clustering to another dataset
result_all = result.clustering.apply(df_all)

# Save and load clusterings
result.clustering.to_json("clustering.json")
clustering = ClusteringResult.from_json("clustering.json")
result = clustering.apply(df)

Old (v2):

import tsam.timeseriesaggregation as tsam

# Required manually passing multiple parameters
agg2 = tsam.TimeSeriesAggregation(
    df_all,
    predefClusterOrder=agg.clusterOrder,
    predefClusterCenterIndices=agg.clusterCenterIndices,
    ...
)

Plotting

Plotting has moved from matplotlib to plotly. Instead of calling separate functions, use the result.plot accessor:

result.plot.compare()               # Duration curves: original vs reconstructed
result.plot.residuals()             # Reconstruction errors
result.plot.heatmap()               # Heatmap of cluster representatives
result.plot.cluster_assignments()   # Period-to-cluster mapping
result.plot.cluster_weights()       # Cluster occurrence counts
result.plot.accuracy()              # Accuracy metrics bar chart

Hyperparameter tuning

The HyperTunedAggregations class is replaced by two functions in tsam.tuning.

identifyOptimalSegmentPeriodCombination -> find_optimal_combination

New (v3):

import tsam
from tsam import ClusterConfig

result = tsam.tuning.find_optimal_combination(
    df,
    data_reduction=0.01,
    period_duration=24,
    cluster=ClusterConfig(method="hierarchical"),
    segment_representation="mean",
)
segments = result.n_segments
periods = result.n_clusters
rmse = result.rmse
best = result.best_result          # AggregationResult

Old (v2):

from tsam.hyperparametertuning import HyperTunedAggregations
import tsam.timeseriesaggregation as tsam_legacy

agg = HyperTunedAggregations(
    tsam_legacy.TimeSeriesAggregation(
        df,
        hoursPerPeriod=24,
        clusterMethod="hierarchical",
        representationMethod="meanRepresentation",
        segmentation=True,
    )
)
segments, periods, rmse = agg.identifyOptimalSegmentPeriodCombination(
    dataReduction=0.01,
)

identifyParetoOptimalAggregation -> find_pareto_front

New (v3):

pareto = tsam.tuning.find_pareto_front(
    df,
    period_duration=24,
    max_timesteps=500,
    cluster=ClusterConfig(method="hierarchical"),
    segment_representation="mean",
)
print(pareto.summary)              # DataFrame of all tested configs
pareto.plot()                      # Interactive Plotly visualization

Old (v2):

agg.identifyParetoOptimalAggregation(untilTotalTimeSteps=500)
for a in agg.aggregationHistory:
    print(a.totalAccuracyIndicators()["RMSE"])

The TuningResult returned by both functions also supports find_by_timesteps(target) and find_by_rmse(threshold) for querying specific configurations, and iteration via for r in result.

Helper functions

| Old (v2) | New (v3) |
|---|---|
| getNoPeriodsForDataReduction(n, segs, red) | tsam.tuning.find_clusters_for_reduction(n, segs, red) |
| getNoSegmentsForDataReduction(n, periods, red) | tsam.tuning.find_segments_for_reduction(n, periods, red) |
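These helpers compute how many clusters (or segments) fit a target data reduction. The sketch below assumes that data reduction is the ratio of aggregated time steps (n_clusters * n_segments) to raw time steps, rounded down to a whole number. That definition is an assumption here, so check the tsam.tuning docstrings for the exact rounding. The function names are hypothetical stand-ins, not the tsam functions.

```python
import math


def clusters_for_reduction(n_timesteps, n_segments, reduction):
    """Largest n_clusters with n_clusters * n_segments <= reduction * n_timesteps."""
    return math.floor(reduction * n_timesteps / n_segments)


def segments_for_reduction(n_timesteps, n_clusters, reduction):
    """Largest n_segments with n_clusters * n_segments <= reduction * n_timesteps."""
    return math.floor(reduction * n_timesteps / n_clusters)


# One year of hourly data reduced to 1 %, i.e. a budget of 87.6 time steps.
print(clusters_for_reduction(8760, 12, 0.01))  # 7  (7 * 12 = 84 <= 87.6)
print(segments_for_reduction(8760, 8, 0.01))   # 10 (8 * 10 = 80 <= 87.6)
```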

New capabilities

  • Parallel execution: Pass n_jobs=-1 to use all CPU cores.
  • Targeted exploration: find_pareto_front accepts a timesteps sequence (e.g., range(10, 500, 10)) for faster targeted search instead of full steepest descent.
  • Built-in visualization: result.plot() shows an interactive RMSE-vs-timesteps chart.

Performance

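tsam v3 is faster than v2.3.9 in every benchmarked configuration, by roughly 1.3x to 77x in the table below. The gains come primarily from replacing pandas loops with vectorized numpy operations.

Speedup of v3 over v2.3.9 (columns are the benchmark inputs):

| Configuration | constant | testdata | wide | with_zero_col |
|---|---|---|---|---|
| hierarchical (default) | 2x | 44x | 25x | 42x |
| hierarchical (distribution) | 5x | 55x | 35x | 51x |
| averaging | 5x | 77x | 66x | 74x |
| contiguous | 5x | 54x | 50x | 53x |
| distribution (global) | 2x | 16x | 7x | 13x |
| kmeans | 1.4x | 4x | 6x | 6x |
| kmaxoids | 1.3x | 1.4x | 1.4x | 1.4x |

Key optimizations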
  • predictOriginalData(): Vectorized indexing replaces per-period .unstack() loop (~290x function speedup).
  • durationRepresentation(): numpy 3D operations replace nested pandas loops (~8x function speedup, contributing to the distribution config gains above).
  • _rescaleClusterPeriods(): numpy 3D arrays replace pandas MultiIndex operations (~11x function speedup).

Iterative methods (kmeans, kmedoids, kmaxoids) show modest gains because the solver itself dominates runtime.

Use benchmarks/bench.py to run your own comparisons:

pytest benchmarks/bench.py --benchmark-save=my_run

Result consistency and reproducibility

Cross-platform reproducibility

v2.3.9 used numpy's default unstable sort (introsort) in durationRepresentation(), which does not guarantee a specific order for tied values. In practice, this caused different results on different platforms (macOS vs Linux vs Windows) for distribution representations.

v3 fixes this by using kind="stable" (mergesort) for all sorting operations and rounding floating-point means to 10 decimal places before tie-breaking. This guarantees identical results across macOS, Linux, and Windows for all configurations.
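Both fixes can be reproduced with the standard library alone, because Python's `sorted()` is guaranteed to be stable. In the sketch below, three period means are mathematically equal, but one carries floating-point noise from `0.1 + 0.2`.

Sorting the raw values orders them by that noise. Rounding to 10 decimal places first leaves a true tie, which the stable sort then breaks by position. This mirrors the idea described above, not tsam's actual numpy code; `rank_periods` is a hypothetical name.

```python
def rank_periods(means, decimals=None):
    """Return period indices ordered by mean, with ties kept in input order.

    sorted() is stable, so equal keys keep their original positions; passing
    decimals rounds the keys first so ~1e-16 noise cannot decide the order.
    """
    def key(i):
        value = means[i]
        return value if decimals is None else round(value, decimals)

    return sorted(range(len(means)), key=key)


means = [0.3, 0.1 + 0.2, 0.3]            # 0.1 + 0.2 == 0.30000000000000004
print(rank_periods(means))               # [0, 2, 1]: noise decides the order
print(rank_periods(means, decimals=10))  # [0, 1, 2]: tie broken by position
```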

Consistency with v2.3.9

As a consequence of the stable sort fix, 4 distribution-related configurations produce slightly different results compared to v2.3.9:

  • hierarchical_distribution
  • hierarchical_distribution_minmax
  • distribution_global
  • distribution_minmax_global

The stable sort breaks ties by position rather than arbitrarily, and rounding absorbs ~1e-16 floating-point noise that previously created artificial ordering among effectively-equal means. This changes the assignment of representative values to time steps, but preserves all statistical properties (same distribution, same min/max, same weighted mean).

All other 23 configurations (hierarchical with medoid/mean/maxoid, averaging, contiguous, kmeans, kmedoids, kmaxoids, minmaxmean, segmentation, extremes) are bit-for-bit identical to v2.3.9.

Result stability is enforced by golden regression tests (test/test_golden_regression.py): 111 tests compare tsam.aggregate() against stored CSV baselines (originally produced by the pre-v3.0.0 API). Any code change that alters output values will fail these tests.

If a future release intentionally changes results (e.g., improved algorithm), the golden files will be regenerated and the change documented in the changelog.

Suppressing warnings

During migration you can silence the deprecation warnings:

import warnings
from tsam import LegacyAPIWarning

warnings.filterwarnings("ignore", category=LegacyAPIWarning)

Removed parameters

prepareEnersysInput()
Removed. Access result properties directly instead.