Migrating from ETHOS.TSAM v2 to v3¶
ETHOS.TSAM v3 replaces the class-based API with a functional API.
The old TimeSeriesAggregation class still works but is deprecated
and will be removed in a future release.
This guide covers every change you need to make.
Heads-up: column order changes in v4¶
aggregate() result column order will change in v4
In v3, aggregate() returns cluster_representatives, reconstructed,
and original with columns sorted alphabetically; in v4 they follow
the input DataFrame's order. This can silently break code that reads results
by position (.values, .iloc[:, 0]): names and shape are unchanged, but each
column's data lands in a different slot. Indexing by name is unaffected.
aggregate() emits a FutureWarning when input columns are not already
alphabetical (the only case that changes).
To keep the v3 order, sort either the input, aggregate(data.sort_index(axis=1), ...),
or the output, result.cluster_representatives.sort_index(axis=1). To adopt
the v4 behavior now, index results by column name. The legacy
TimeSeriesAggregation class is unaffected (it sorts alphabetically in both
v3 and v4).
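The difference between positional and name-based access can be reproduced with plain pandas. The sketch below uses a small stand-in DataFrame in place of a real aggregate() result; the column names are made up for illustration.

```python
import pandas as pd

# Stand-in for an aggregate() output; column names are illustrative only.
data = pd.DataFrame({"wind": [1.0, 2.0], "demand": [10.0, 20.0], "pv": [0.1, 0.2]})

v3_style = data.sort_index(axis=1)  # alphabetical: demand, pv, wind
v4_style = data                     # input order: wind, demand, pv

# Positional access picks a different column depending on the order ...
first_v3 = v3_style.iloc[:, 0].tolist()
first_v4 = v4_style.iloc[:, 0].tolist()

# ... while access by name returns the same data either way.
same_by_name = v3_style["demand"].equals(v4_style["demand"])

print(first_v3)      # [10.0, 20.0]
print(first_v4)      # [1.0, 2.0]
print(same_by_name)  # True
```

Code that already selects columns by name needs no change for v4.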
To silence the warning (e.g. you have already migrated):
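A minimal sketch using only the standard warnings module, assuming the notice is raised with the built-in FutureWarning category as stated above. run_quietly is a hypothetical helper, not part of tsam; it limits the filter to a single call so that FutureWarnings raised elsewhere stay visible.

```python
import warnings


def run_quietly(func, *args, **kwargs):
    """Call func with FutureWarning ignored for the duration of the call only."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=FutureWarning)
        return func(*args, **kwargs)


# Stand-in for a call that warns; in real code this would be tsam.aggregate.
def noisy():
    warnings.warn("column order will change", FutureWarning)
    return "done"


print(run_quietly(noisy))  # done
```

In a migration this would be called as run_quietly(tsam.aggregate, df, n_clusters=8). A process-wide warnings.filterwarnings("ignore", category=FutureWarning) also works, but it hides FutureWarnings from every library.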
Quick before-and-after¶
New (v3):

import tsam
from tsam import ClusterConfig, SegmentConfig, ExtremeConfig

result = tsam.aggregate(
    df,
    n_clusters=8,
    period_duration=24,
    cluster=ClusterConfig(
        method='hierarchical',
        representation='distribution_minmax',
    ),
    segments=SegmentConfig(n_segments=12),
    preserve_column_means=True,
    extremes=ExtremeConfig(max_value=['demand']),
)
representatives = result.cluster_representatives
reconstructed = result.reconstructed
accuracy = result.accuracy.summary

Old (v2):

import tsam.timeseriesaggregation as tsam

agg = tsam.TimeSeriesAggregation(
    df,
    noTypicalPeriods=8,
    hoursPerPeriod=24,
    clusterMethod='hierarchical',
    representationMethod='distributionAndMinMaxRepresentation',
    segmentation=True,
    noSegments=12,
    rescaleClusterPeriods=True,
    addPeakMax=['demand'],
)
representatives = agg.createTypicalPeriods()
reconstructed = agg.predictOriginalData()
accuracy = agg.accuracyIndicators()
Parameter mapping¶
The table below maps every old parameter to its v3 equivalent.
| Old (v2) | New (v3) | Notes |
|---|---|---|
| timeSeries | data | Renamed. |
| noTypicalPeriods | n_clusters | |
| hoursPerPeriod | period_duration | Also accepts strings ('24h', '1d'). |
| resolution | temporal_resolution | Also accepts strings ('1h', '15min'). |
| clusterMethod | ClusterConfig(method=...) | See cluster method values. |
| representationMethod | ClusterConfig(representation=...) | See representation values. |
| weightDict | weights | Top-level kwarg of aggregate(). |
| sameMean | ClusterConfig(normalize_column_means=...) | |
| sortValues | ClusterConfig(use_duration_curves=...) | |
| evalSumPeriods | ClusterConfig(include_period_sums=...) | |
| solver | ClusterConfig(solver=...) | |
| segmentation | Pass a SegmentConfig or omit it. | No boolean flag needed. |
| noSegments | SegmentConfig(n_segments=...) | |
| segmentRepresentationMethod | SegmentConfig(representation=...) | Uses short names (see below). |
| rescaleClusterPeriods | preserve_column_means | Top-level kwarg of aggregate(). |
| rescaleExcludeColumns | rescale_exclude_columns | |
| roundOutput | round_decimals | |
| numericalTolerance | numerical_tolerance | |
| extremePeriodMethod | ExtremeConfig(method=...) | See extreme method values. |
| addPeakMax | ExtremeConfig(max_value=...) | |
| addPeakMin | ExtremeConfig(min_value=...) | |
| addMeanMax | ExtremeConfig(max_period=...) | |
| addMeanMin | ExtremeConfig(min_period=...) | |
| distributionPeriodWise | Distribution(scope="cluster"\|"global") | See representation objects. |
| representationDict | MinMaxMean(max_columns=[...], min_columns=[...]) | See representation objects. |
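The one-to-one renames in the parameter mapping can be applied mechanically. The sketch below is a hypothetical helper (rename_v2_kwargs is not part of tsam) that rewrites only those flat renames and returns everything else, such as clusterMethod or addPeakMax, for manual migration into the config objects. The mapping states only for weights and preserve_column_means that they are top-level keywords of aggregate(), so treat the placement of the other renamed keys as an assumption to verify.

```python
# Hypothetical helper, not part of tsam: applies only the one-to-one renames
# listed in the parameter mapping.
FLAT_RENAMES = {
    "timeSeries": "data",
    "noTypicalPeriods": "n_clusters",
    "hoursPerPeriod": "period_duration",
    "resolution": "temporal_resolution",
    "weightDict": "weights",
    "rescaleClusterPeriods": "preserve_column_means",
    "rescaleExcludeColumns": "rescale_exclude_columns",
    "roundOutput": "round_decimals",
    "numericalTolerance": "numerical_tolerance",
}


def rename_v2_kwargs(old_kwargs):
    """Return (renamed, leftover): v3-style names, and v2 keys needing manual work."""
    renamed, leftover = {}, {}
    for key, value in old_kwargs.items():
        if key in FLAT_RENAMES:
            renamed[FLAT_RENAMES[key]] = value
        else:
            leftover[key] = value
    return renamed, leftover


renamed, leftover = rename_v2_kwargs(
    {"noTypicalPeriods": 8, "hoursPerPeriod": 24, "clusterMethod": "hierarchical"}
)
print(renamed)   # {'n_clusters': 8, 'period_duration': 24}
print(leftover)  # {'clusterMethod': 'hierarchical'}
```

Keeping the leftover keys separate makes it obvious which arguments still have to move into ClusterConfig, SegmentConfig, or ExtremeConfig.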
Cluster method values¶
| Old (v2) | New (v3) |
|---|---|
| 'averaging' | 'averaging' |
| 'k_means' | 'kmeans' |
| 'k_medoids' | 'kmedoids' |
| 'k_maxoids' | 'kmaxoids' |
| 'hierarchical' | 'hierarchical' |
| 'adjacent_periods' | 'contiguous' |
Representation method values¶
| Old (v2) | New (v3) |
|---|---|
| 'meanRepresentation' | 'mean' |
| 'medoidRepresentation' | 'medoid' |
| 'maxoidRepresentation' | 'maxoid' |
| 'distributionRepresentation' | 'distribution' |
| 'durationRepresentation' | 'distribution' (both old parameters meant the same) |
| 'distributionAndMinMaxRepresentation' | 'distribution_minmax' |
| 'minmaxmeanRepresentation' | 'minmax_mean' |
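When many call sites or configuration files still carry v2 strings, the cluster method and representation values can be translated with plain lookups. The dictionaries below are copied from the two value listings; translate_value is a hypothetical helper, not part of tsam, and it raises instead of guessing when a value is not listed.

```python
# Hypothetical lookup tables, not part of tsam.
CLUSTER_METHODS = {
    "averaging": "averaging",
    "k_means": "kmeans",
    "k_medoids": "kmedoids",
    "k_maxoids": "kmaxoids",
    "hierarchical": "hierarchical",
    "adjacent_periods": "contiguous",
}
REPRESENTATIONS = {
    "meanRepresentation": "mean",
    "medoidRepresentation": "medoid",
    "maxoidRepresentation": "maxoid",
    "distributionRepresentation": "distribution",
    "durationRepresentation": "distribution",
    "distributionAndMinMaxRepresentation": "distribution_minmax",
    "minmaxmeanRepresentation": "minmax_mean",
}


def translate_value(table, old_value):
    """Look up the v3 string for a v2 value; fail loudly on unknown input."""
    try:
        return table[old_value]
    except KeyError:
        raise ValueError(f"no v3 equivalent listed for {old_value!r}") from None


print(translate_value(CLUSTER_METHODS, "adjacent_periods"))        # contiguous
print(translate_value(REPRESENTATIONS, "durationRepresentation"))  # distribution
```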
Typed representation objects¶
For distribution, distribution_minmax, and minmax_mean
representations, v3 offers typed objects that expose options previously
controlled by separate parameters (distributionPeriodWise,
representationDict). Plain string shortcuts still work for the
common cases.
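- Distribution with global scope (distributionPeriodWise=False): Distribution(scope="global")
- Distribution with min/max preservation and global scope: Distribution(preserve_minmax=True, scope="global")
- Per-column min/max/mean (representationDict): MinMaxMean(max_columns=[...], min_columns=[...])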
Columns not listed in max_columns or min_columns default to mean.
Note
The string shortcuts "distribution", "distribution_minmax", and
"minmax_mean" remain valid and are equivalent to:
- "distribution" -> Distribution()
- "distribution_minmax" -> Distribution(preserve_minmax=True)
- "minmax_mean" -> MinMaxMean() (all columns default to mean)
Extreme method values¶
| Old (v2) | New (v3) |
|---|---|
| 'None' | Omit the extremes parameter entirely. |
| 'append' | 'append' |
| 'replace_cluster_center' | 'replace' |
| 'new_cluster_center' | 'new_cluster' |
Default changes¶
| Parameter | Old default | New default | Impact |
|---|---|---|---|
| n_clusters | 10 | required | Code that relied on the default must now pass a value explicitly. |
| SegmentConfig(representation=...) | Inherited from representationMethod | "mean" | In v2, omitting segmentRepresentationMethod caused segments to inherit the cluster representation (e.g. distribution). In v3, SegmentConfig always defaults to "mean". If you relied on the implicit inheritance, pass the representation explicitly: SegmentConfig(n_segments=12, representation=Distribution(scope="global")) |
Accessing results¶
The old API returned raw DataFrames and arrays from methods you had to
call in sequence. The new API returns a single AggregationResult
object with everything attached.
| Old (v2) | New (v3) |
|---|---|
| agg.createTypicalPeriods() | result.cluster_representatives |
| agg.predictOriginalData() | result.reconstructed |
| agg.accuracyIndicators() | result.accuracy.summary |
| agg.totalAccuracyIndicators() | result.accuracy.weighted_rmse / result.accuracy.weighted_mae |
| agg.clusterOrder | result.cluster_assignments |
| agg.clusterPeriodNoOccur | result.cluster_weights |
| agg.clusterCenterIndices | result.clustering.cluster_centers |
| agg.indexMatching() | result.assignments |
| agg.timeSeries | result.original |
| (no equivalent) | result.residuals |
| (no equivalent) | result.plot.compare() |
The cluster_representatives DataFrame now uses a
MultiIndex(cluster, timestep) instead of
MultiIndex(PeriodNum, TimeStep).
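Downstream code that selects index levels by their old names can either switch to the new names or restore the old ones. The pandas sketch below uses a small stand-in DataFrame rather than a real result.cluster_representatives; only the level names come from this guide.

```python
import pandas as pd

# Stand-in for result.cluster_representatives: 2 clusters x 3 timesteps.
index = pd.MultiIndex.from_product([[0, 1], [0, 1, 2]], names=["cluster", "timestep"])
representatives = pd.DataFrame({"demand": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]}, index=index)

# v3-style selection by level name.
cluster_0 = representatives.xs(0, level="cluster")

# Restore the v2 level names for code that still expects them.
legacy = representatives.rename_axis(index=["PeriodNum", "TimeStep"])

print(cluster_0["demand"].tolist())  # [1.0, 2.0, 3.0]
print(list(legacy.index.names))      # ['PeriodNum', 'TimeStep']
```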
Accuracy metrics¶
agg.accuracyIndicators() returned a DataFrame indexed by column with
RMSE/MAE/RMSE_duration columns. result.accuracy is an
AccuracyMetrics object: .summary is the equivalent DataFrame, while the
individual metrics are per-column pd.Series.
| Old (v2) | New (v3) |
|---|---|
| agg.accuracyIndicators().loc[col, "RMSE"] | result.accuracy.rmse[col] |
| agg.accuracyIndicators().loc[col, "MAE"] | result.accuracy.mae[col] |
| agg.accuracyIndicators().loc[col, "RMSE_duration"] | result.accuracy.rmse_duration[col] |
| agg.totalAccuracyIndicators()["RMSE"] | result.accuracy.weighted_rmse |
| agg.totalAccuracyIndicators()["MAE"] | result.accuracy.weighted_mae |
With uniform weights the weighted_* totals match the old
totalAccuracyIndicators() values.
Index/assignment metadata¶
agg.indexMatching() returned a DataFrame with PeriodNum/TimeStep/
SegmentIndex columns. result.assignments is the equivalent, indexed by the
original datetime index, with columns period_idx, timestep_idx,
cluster_idx (= old PeriodNum) and segment_idx (only when segmented).
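Because the assignments frame is indexed by the original timestamps, common lookups reduce to ordinary pandas operations. The sketch below builds a small stand-in with the column names listed above; the values are made up, and a real result.assignments may carry more rows and the optional segment_idx column.

```python
import pandas as pd

# Stand-in for result.assignments: two periods with two timesteps each.
index = pd.to_datetime(
    ["2020-01-01 00:00", "2020-01-01 12:00", "2020-01-02 00:00", "2020-01-02 12:00"]
)
assignments = pd.DataFrame(
    {
        "period_idx": [0, 0, 1, 1],
        "timestep_idx": [0, 1, 0, 1],
        "cluster_idx": [1, 1, 0, 0],
    },
    index=index,
)

# Which cluster represents each original period?
cluster_per_period = assignments.groupby("period_idx")["cluster_idx"].first()

# All original timestamps that were assigned to cluster 0.
timestamps_in_cluster_0 = assignments.index[assignments["cluster_idx"] == 0]

print(cluster_per_period.tolist())   # [1, 0]
print(len(timestamps_in_cluster_0))  # 2
```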
Clustering transfer¶
Reusing a clustering on new data used to require manually passing
predefClusterOrder, predefClusterCenterIndices, etc.
In v3 this is a single method call:
# Cluster on one dataset
result = tsam.aggregate(df_wind, n_clusters=8)
# Apply same clustering to another dataset
result_all = result.clustering.apply(df_all)
# Save and load clusterings
result.clustering.to_json("clustering.json")
from tsam import ClusteringResult
clustering = ClusteringResult.from_json("clustering.json")
result = clustering.apply(df)
Plotting¶
Plotting has moved from matplotlib to plotly.
Instead of calling separate functions, use the result.plot accessor:
result.plot.compare() # Duration curves: original vs reconstructed
result.plot.residuals() # Reconstruction errors
result.plot.heatmap() # Heatmap of cluster representatives
result.plot.cluster_assignments() # Period-to-cluster mapping
result.plot.cluster_weights() # Cluster occurrence counts
result.plot.accuracy() # Accuracy metrics bar chart
Hyperparameter tuning¶
The HyperTunedAggregations class is replaced by two functions in
tsam.tuning.
identifyOptimalSegmentPeriodCombination -> find_optimal_combination¶
New (v3):

import tsam
from tsam import ClusterConfig

result = tsam.tuning.find_optimal_combination(
    df,
    data_reduction=0.01,
    period_duration=24,
    cluster=ClusterConfig(method="hierarchical"),
    segment_representation="mean",
)
segments = result.n_segments
periods = result.n_clusters
rmse = result.rmse
best = result.best_result  # AggregationResult

Old (v2):

from tsam.hyperparametertuning import HyperTunedAggregations
import tsam.timeseriesaggregation as tsam_legacy

agg = HyperTunedAggregations(
    tsam_legacy.TimeSeriesAggregation(
        df,
        hoursPerPeriod=24,
        clusterMethod="hierarchical",
        representationMethod="meanRepresentation",
        segmentation=True,
    )
)
segments, periods, rmse = agg.identifyOptimalSegmentPeriodCombination(
    dataReduction=0.01,
)
identifyParetoOptimalAggregation -> find_pareto_front¶
The TuningResult returned by both functions also supports
find_by_timesteps(target) and find_by_rmse(threshold) for
querying specific configurations, and iteration via for r in result.
Helper functions¶
| Old (v2) | New (v3) |
|---|---|
| getNoPeriodsForDataReduction(n, segs, red) | tsam.tuning.find_clusters_for_reduction(n, segs, red) |
| getNoSegmentsForDataReduction(n, periods, red) | tsam.tuning.find_segments_for_reduction(n, periods, red) |
New capabilities¶
- Parallel execution: Pass n_jobs=-1 to use all CPU cores.
- Targeted exploration: find_pareto_front accepts a timesteps sequence (e.g., range(10, 500, 10)) for faster targeted search instead of full steepest descent.
- Built-in visualization: result.plot() shows an interactive RMSE-vs-timesteps chart.
Performance¶
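tsam v3 is faster than v2.3.9 in every benchmarked configuration, with speedups ranging from about 1.3x to 77x, primarily because pandas loops were replaced with vectorized numpy operations. The table lists the speedup factor of v3 over v2.3.9 for each configuration and benchmark column.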
| Configuration | constant | testdata | wide | with_zero_col |
|---|---|---|---|---|
| hierarchical (default) | 2x | 44x | 25x | 42x |
| hierarchical (distribution) | 5x | 55x | 35x | 51x |
| averaging | 5x | 77x | 66x | 74x |
| contiguous | 5x | 54x | 50x | 53x |
| distribution (global) | 2x | 16x | 7x | 13x |
| kmeans | 1.4x | 4x | 6x | 6x |
| kmaxoids | 1.3x | 1.4x | 1.4x | 1.4x |
Key optimizations
- predictOriginalData(): Vectorized indexing replaces the per-period .unstack() loop (~290x function speedup).
- durationRepresentation(): numpy 3D operations replace nested pandas loops (~8x function speedup, contributing to the distribution config gains above).
- _rescaleClusterPeriods(): numpy 3D arrays replace pandas MultiIndex operations (~11x function speedup).
Iterative methods (kmeans, kmedoids, kmaxoids) show modest gains because the solver itself dominates runtime.
Use benchmarks/bench.py to run your own comparisons.
Result consistency and reproducibility¶
Cross-platform reproducibility
v2.3.9 used numpy's default unstable sort (introsort) in
durationRepresentation(), which does not guarantee a specific order
for tied values. In practice, this caused different results on different
platforms (macOS vs Linux vs Windows) for distribution representations.
v3 fixes this by using kind="stable" (mergesort) for all sorting
operations and rounding floating-point means to 10 decimal places before
tie-breaking. This guarantees identical results across macOS, Linux,
and Windows for all configurations.
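A minimal NumPy sketch of the two ingredients named above, stable sorting and rounding before tie-breaking. It illustrates the technique only and is not the tsam implementation; the sample values are made up.

```python
import numpy as np

# 0.1 + 0.2 and 0.3 differ only by floating-point noise.
means = np.array([0.1 + 0.2, 0.1, 0.3])

# Without rounding, the noise decides the order of the two near-equal values.
raw_order = np.argsort(means, kind="stable")

# Rounding to 10 decimals turns them into an exact tie, and the stable sort
# then breaks the tie by original position.
rounded_order = np.argsort(np.round(means, 10), kind="stable")

print(raw_order.tolist())      # [1, 2, 0]
print(rounded_order.tolist())  # [1, 0, 2]
```

With an unstable sort, the order of the two tied entries in the second case would not be guaranteed.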
Consistency with v2.3.9
As a consequence of the stable sort fix, 4 distribution-related configurations produce slightly different results compared to v2.3.9:
- hierarchical_distribution
- hierarchical_distribution_minmax
- distribution_global
- distribution_minmax_global
The stable sort breaks ties by position rather than arbitrarily, and rounding absorbs ~1e-16 floating-point noise that previously created artificial ordering among effectively-equal means. This changes the assignment of representative values to time steps, but preserves all statistical properties (same distribution, same min/max, same weighted mean).
All other 23 configurations (hierarchical with medoid/mean/maxoid, averaging, contiguous, kmeans, kmedoids, kmaxoids, minmaxmean, segmentation, extremes) are bit-for-bit identical to v2.3.9.
Result stability is enforced by golden regression tests
(test/test_golden_regression.py): 111 tests compare tsam.aggregate()
against stored CSV baselines (originally produced by the pre-v3.0.0 API).
Any code change that alters output values will fail these tests.
If a future release intentionally changes results (e.g., improved algorithm), the golden files will be regenerated and the change documented in the changelog.
Suppressing warnings¶
During migration you can silence the deprecation warnings:
import warnings
from tsam import LegacyAPIWarning
warnings.filterwarnings("ignore", category=LegacyAPIWarning)
Removed parameters¶
- prepareEnersysInput(): Removed. Access result properties directly instead.