ETHOS.TSAM Change Log¶
All notable changes to this project will be documented in this file.
New entries are automatically added by release-please from conventional commit messages.
3.4.1 (2026-05-31)¶
Bug Fixes¶
3.4.0 (2026-05-15)¶
Features¶
Bug Fixes¶
- ci: gracefully skip GitHub release when already created by release-please (#260) (6fc668d)
- handle missing columns in weightDict in accuracyIndicators (#288) (a475570)
3.3.0 (2026-03-30)¶
Features¶
- AccuracyMetrics now exposes weighted_rmse, weighted_mae, and weighted_rmse_duration as pre-computed scalars (#238) (b70b819)
- add disaggregate() method (#245) (b24e32e)
Bug Fixes¶
3.2.1 (2026-03-25)¶
Bug Fixes¶
3.2.0 (2026-03-24)¶
This release moves the weights argument out of ClusterConfig and into aggregate() (and similar methods), and deprecates the old usage inside ClusterConfig. The parameter affects all aggregation steps and is now placed accordingly. We also added a new plotting method that lets you inspect cluster members and their representation.
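A common way to stage this kind of move is to accept the argument in both places for a while and warn on the old one. The sketch below illustrates that pattern with hypothetical stand-ins (OldStyleConfig, aggregate_sketch). It is not tsam's implementation; the precedence rule when both locations are set and the default weight of 1.0 are assumptions.

```python
import warnings
from dataclasses import dataclass
from typing import Optional


@dataclass
class OldStyleConfig:
    """Hypothetical stand-in for a config object that used to carry weights."""

    method: str = "hierarchical"
    weights: Optional[dict] = None  # deprecated location


def aggregate_sketch(columns, cluster=None, weights=None):
    """Resolve per-column weights, preferring the top-level argument.

    Assumptions: a top-level `weights` wins over the deprecated config field,
    and columns without an entry default to 1.0.
    """
    cluster = cluster or OldStyleConfig()
    if cluster.weights is not None:
        warnings.warn(
            "weights inside the config is deprecated; pass weights= at the top level",
            DeprecationWarning,
            stacklevel=2,
        )
        if weights is None:
            weights = cluster.weights
    weights = weights or {}
    return {col: float(weights.get(col, 1.0)) for col in columns}


# New style: weights passed at the top level, no warning.
print(aggregate_sketch(["wind", "load"], weights={"wind": 2}))
```

Calling aggregate_sketch with OldStyleConfig(weights=...) still works in this sketch but emits a DeprecationWarning, which mirrors the deprecate-then-remove path described above.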
Features¶
- Interactive cluster member visualization (#159) (61c6296)
- Move weights to top-level aggregate() parameter (#195) (4f177d0)
Documentation¶
- Add ETHOS.TSAM branding, FZJ theme, and documentation update (#194) (d24a0a3)
- extract glossary into standalone file (d24a0a3)
- improve code block in Getting Started (d24a0a3)
- remove integrated software section and update legal notice (#218) (4c9cc71)
- update images to README_assets v1.0.0 and add missing publication (#215) (e56a686)
3.1.1¶
ETHOS.TSAM v3.1.1 is the first stable v3 release (versions 3.0.0 and 3.1.0 were yanked from PyPI). It introduces a modern functional API alongside significant improvements to performance, plotting, hyperparameter tuning, and overall code quality.
See the migration guide for complete instructions on upgrading from v2.
Breaking Changes¶
- New functional API: The primary interface is now tsam.aggregate(), which returns an AggregationResult object
- Configuration objects: Clustering and segmentation options are now configured via ClusterConfig, SegmentConfig, and ExtremeConfig dataclasses
- Segment representation default: In v2, omitting segmentRepresentationMethod caused segments to silently inherit the cluster representationMethod (e.g. distribution). In v3, SegmentConfig(representation=...) defaults to "mean" independently. If you relied on the implicit inheritance, pass the representation explicitly: SegmentConfig(n_segments=12, representation=Distribution(scope="global"))
- Removed methods: The reconstruct() method has been removed; use the reconstructed property on AggregationResult instead
- Renamed parameters: Parameters have been renamed for consistency:
| Old (v2) | New (v3) |
|---|---|
| noTypicalPeriods | n_clusters |
| hoursPerPeriod | period_duration |
| resolution | temporal_resolution |
| clusterMethod | cluster=ClusterConfig(method=...) |
| representationMethod | cluster=ClusterConfig(representation=...) |
| segmentation + noSegments | segments=SegmentConfig(n_segments=...) |
| sameMean | cluster=ClusterConfig(normalize_column_means=...) |
| rescaleClusterPeriods | preserve_column_means |
| sortValues | cluster=ClusterConfig(use_duration_curves=...) |
| evalSumPeriods | cluster=ClusterConfig(include_period_sums=...) |
| weightDict | weights (top-level parameter) |
| addPeakMax/Min, etc. | extremes=ExtremeConfig(max_value=..., ...) |
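The table can be read as a keyword mapping. The helper below is a hypothetical migration aid, not part of tsam: it rewrites a dict of v2 keyword arguments into the v3 names from the table and collects the options that moved into ClusterConfig under a plain "cluster" dict instead of constructing the real dataclass. It covers only the one-to-one rows; segmentation + noSegments and addPeakMax/Min are reported as unhandled because the table does not spell out their full mapping.

```python
# Top-level renames taken from the table above.
TOP_LEVEL = {
    "noTypicalPeriods": "n_clusters",
    "hoursPerPeriod": "period_duration",
    "resolution": "temporal_resolution",
    "rescaleClusterPeriods": "preserve_column_means",
    "weightDict": "weights",
}

# v2 names that became ClusterConfig fields in v3.
CLUSTER_FIELDS = {
    "clusterMethod": "method",
    "representationMethod": "representation",
    "sameMean": "normalize_column_means",
    "sortValues": "use_duration_curves",
    "evalSumPeriods": "include_period_sums",
}


def translate_v2_kwargs(v2_kwargs):
    """Return (v3_kwargs, unhandled) for a dict of v2 keyword arguments.

    Hypothetical helper: values are copied unchanged, so any change in the
    accepted values between v2 and v3 still has to be handled by hand.
    """
    v3, cluster, unhandled = {}, {}, {}
    for name, value in v2_kwargs.items():
        if name in TOP_LEVEL:
            v3[TOP_LEVEL[name]] = value
        elif name in CLUSTER_FIELDS:
            cluster[CLUSTER_FIELDS[name]] = value
        else:
            unhandled[name] = value
    if cluster:
        # In real code these would be passed as ClusterConfig keyword arguments.
        v3["cluster"] = cluster
    return v3, unhandled


v3, rest = translate_v2_kwargs(
    {
        "noTypicalPeriods": 8,
        "hoursPerPeriod": 24,
        "clusterMethod": "hierarchical",
        "noSegments": 12,
    }
)
print(v3)    # renamed arguments
print(rest)  # arguments that need manual migration
```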
New Features¶
- Modern functional API: New tsam.aggregate() function returns an AggregationResult with properties:
    - cluster_representatives: DataFrame with aggregated typical periods
    - cluster_assignments: Which cluster each original period belongs to
    - cluster_weights: Occurrence count per cluster
    - accuracy: AccuracyMetrics object with RMSE, MAE, and duration curve RMSE
    - reconstructed: Reconstructed time series (cached property)
    - residuals: Difference between original and reconstructed
    - original: Access to original input data
    - clustering: ClusteringResult for serialization and transfer
- Clustering transfer and serialization: New ClusteringResult enables:
    - Save/load clustering via to_json()/from_json()
    - Apply same clustering to different data via apply()
    - Transfer clustering from one dataset to another (e.g., cluster on wind, apply to all columns)
- Integrated plotting via result.plot accessor with Plotly (replaces matplotlib):
    - result.plot.compare(): Compare original vs reconstructed (overlay, side-by-side, or duration curves)
    - result.plot.residuals(): Visualize reconstruction errors (time series, histogram, by period, or by timestep)
    - result.plot.cluster_representatives(): Plot typical periods with cluster weights
    - result.plot.cluster_members(): All original periods per cluster with representative highlighted, interactive slider
    - result.plot.cluster_weights(): Cluster weight distribution
    - result.plot.accuracy(): Accuracy metrics (RMSE, MAE, duration RMSE) per column
    - result.plot.segment_durations(): Average segment durations (when using segmentation)
- Hyperparameter tuning module tsam.tuning with:
    - find_optimal_combination(): Find best n_clusters/n_segments combination
    - find_pareto_front(): Compute Pareto front of accuracy vs. complexity
    - Support for parallel execution
    - New parameters: segment_representation, extremes, preserve_column_means, round_decimals, numerical_tolerance
- Accuracy metrics: AccuracyMetrics class with .summary property for convenient DataFrame output
- Utility functions: tsam.unstack_to_periods() for reshaping time series for heatmap visualization
- Distribution and MinMaxMean representation objects for ClusterConfig and SegmentConfig, providing a structured alternative to plain string representation names
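To make the Pareto-front idea from the tuning module concrete: a candidate (complexity, error) pair is kept only if no other candidate is at least as good on both axes and strictly better on one. The sketch below is a generic pure-Python illustration, not the code behind tsam.tuning.find_pareto_front(). Measuring complexity as n_clusters * n_segments is an assumption, and the error values are made up for the example.

```python
def pareto_front(candidates):
    """Return the non-dominated (complexity, error) pairs, sorted by complexity.

    Both objectives are minimised. After sorting by complexity (ties broken by
    error), a point is on the front exactly when its error is lower than the
    error of every point seen before it.
    """
    front = []
    best_error = float("inf")
    for complexity, error in sorted(candidates):
        if error < best_error:
            front.append((complexity, error))
            best_error = error
    return front


# Illustrative candidates: (n_clusters * n_segments, error). Values are invented.
candidates = [
    (8 * 24, 0.10),
    (8 * 12, 0.14),
    (4 * 24, 0.16),
    (12 * 12, 0.15),
    (4 * 12, 0.22),
]
print(pareto_front(candidates))
```

Here (4 * 24, 0.16) drops out because (8 * 12, 0.14) has the same complexity with a lower error, and (12 * 12, 0.15) drops out because the same point beats it on both axes.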
Improvements¶
- Segment center preservation for better accuracy when using medoid/maxoid segment representation
- Consistent semantic naming across the entire codebase
- Better handling of extreme periods with n_clusters edge cases
- Lazy loading of optional modules (plot, tuning) to reduce import time
Bug Fixes¶
These bugs existed in v2.3.9:
- Fixed rescaling with segmentation (was applying rescaling twice)
- Fixed predictOriginalData() denormalization when using sameMean=True with segmentation
- Fixed segment label ordering bug: AgglomerativeClustering produces arbitrary cluster labels, which caused durationRepresentation() with distributionPeriodWise=False to allocate the global distribution differently when transferring a clustering. Segment clusters are now relabelled to temporal order after fit_predict().
- Fixed non-deterministic sorting in durationRepresentation() across both code paths by using kind="stable" and np.round(mean, 10) before argsort, ensuring identical tie-breaking across platforms.
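Relabelling to temporal order can be pictured as renaming labels by first appearance along the time axis, so the earliest segment becomes 0, the next new one 1, and so on. The function below is an illustrative pure-Python sketch of that idea (relabel_by_first_appearance is a hypothetical name), not the code tsam runs after fit_predict().

```python
def relabel_by_first_appearance(labels):
    """Rename cluster labels so they increase in order of first occurrence."""
    mapping = {}
    relabelled = []
    for label in labels:
        if label not in mapping:
            mapping[label] = len(mapping)  # next unused id
        relabelled.append(mapping[label])
    return relabelled


# Two labelings of the same contiguous segments, differing only in label names.
run_a = [2, 2, 0, 0, 0, 1]
run_b = [1, 1, 2, 2, 2, 0]
print(relabel_by_first_appearance(run_a))
print(relabel_by_first_appearance(run_b))
```

Both inputs describe the same partition of the time steps, so after relabelling they become identical, which is what makes any label-dependent downstream step reproducible.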
Result consistency¶
The stable sort fix guarantees cross-platform reproducibility but changes tie-breaking
compared to v2.3.9. Four distribution-related configurations (hierarchical_distribution,
hierarchical_distribution_minmax, distribution_global, distribution_minmax_global)
produce slightly different results, but will be consistent across systems from now on. All statistical properties are preserved. The remaining
23 configurations are bit-for-bit identical to v2.3.9. See the
migration guide for details.
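Why rounding plus a stable sort pins down tie-breaking: values that differ only by floating-point noise can end up in a different order from one platform to another, whereas after rounding they compare equal and a stable sort keeps them in input order. The sketch uses Python's built-in sorted(), which is guaranteed to be stable, as a stand-in for the numpy call named in the fix. The sample values are made up.

```python
def stable_argsort_rounded(values, decimals=10):
    """Indices that sort `values` after rounding; ties keep input order."""
    return sorted(range(len(values)), key=lambda i: round(values[i], decimals))


def argsort_raw(values):
    """Indices that sort the unrounded values, for comparison."""
    return sorted(range(len(values)), key=lambda i: values[i])


# Entries 0 and 2 differ only by floating-point noise (0.1 + 0.2 != 0.3).
means = [0.1 + 0.2, 0.25, 0.3, 0.25]
print(argsort_raw(means))             # order of 0 and 2 depends on the noise
print(stable_argsort_rounded(means))  # 0 and 2 tie, input order is kept
```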
Known Limitations¶
- Clustering transfer with 'replace' extreme method: The 'replace' extreme method
creates a hybrid cluster representation where some columns use the medoid values
and others use the extreme period values. This hybrid representation cannot be
perfectly reproduced during transfer via
ClusteringResult.apply(). Warnings are issued when saving (to_json()) or applying such a clustering. For exact transfer with extreme periods, use 'append' or 'new_cluster' extreme methods instead.
Performance¶
Multiple vectorization optimizations replace pandas loops with numpy array operations, providing 35--77x end-to-end speedups over v2.3.9 for most configurations.
Benchmarked across 27 configurations x 4 datasets against v2.3.9:
- Hierarchical methods on real-world data: 35--60x faster
- Distribution representation (cluster-wise): 35--55x faster
- Averaging: up to 77x faster
- Contiguous clustering: 50--54x faster
- Distribution representation (global scope): 7--16x faster
- Iterative methods (kmeans, kmedoids, kmaxoids): 1--6x faster (core solver dominates)
Key function-level optimizations:
- predictOriginalData(): Vectorized indexing replaces per-period .unstack() loop (~290x function speedup).
- durationRepresentation(): Vectorized numpy 3D operations replace nested pandas loops (~8x function speedup).
- _rescaleClusterPeriods(): numpy 3D arrays replace pandas MultiIndex operations (~11x function speedup).
- _clusterSortedPeriods(): numpy 3D reshape + sort replaces per-column DataFrame sorting loop (~12x function speedup).
Testing¶
- Regression test suite: 296 old/new API equivalence tests + 148 golden-file tests comparing both APIs against baselines generated with tsam v2.3.9.
- Benchmark suite (benchmarks/bench.py) for performance comparison across versions using pytest-benchmark.
Deprecations¶
- TimeSeriesAggregation class: The legacy class-based API now emits a LegacyAPIWarning when instantiated. It will be removed in a future version. Users should migrate to the new tsam.aggregate() function.
- unstackToPeriods function: Deprecated in favor of tsam.unstack_to_periods().
- HyperTunedAggregations class: The legacy hyperparameter tuning class in tsam.hyperparametertuning is deprecated. Use tsam.tuning.find_optimal_combination() or tsam.tuning.find_pareto_front() instead.
- getNoPeriodsForDataReduction / getNoSegmentsForDataReduction: Helper functions deprecated along with HyperTunedAggregations.
- To suppress warnings during migration:
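A minimal sketch with Python's standard warnings module. LegacyAPIWarning is defined locally here as a stand-in so the snippet runs without tsam installed; in real code, import the class from tsam instead (the import path is not stated in this change log, and the DeprecationWarning base class is an assumption).

```python
import warnings


class LegacyAPIWarning(DeprecationWarning):
    """Stand-in for tsam's warning class (assumed to be a Warning subclass)."""


def legacy_call():
    """Placeholder for any call into the deprecated class-based API."""
    warnings.warn("legacy API is deprecated", LegacyAPIWarning, stacklevel=2)
    return "ok"


# Option 1: silence the category for the whole process.
# warnings.filterwarnings("ignore", category=LegacyAPIWarning)

# Option 2: silence it only around the legacy call.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=LegacyAPIWarning)
    result = legacy_call()
```

Scoping the filter with catch_warnings() keeps other deprecation warnings visible, which is usually preferable while a migration is still in progress.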
Legacy API¶
The class-based API remains available for backward compatibility but is deprecated:
```python
import tsam.timeseriesaggregation as tsam_legacy

# raw: the input time series as a pandas DataFrame
aggregation = tsam_legacy.TimeSeriesAggregation(
    raw,
    noTypicalPeriods=8,
    hoursPerPeriod=24,
    clusterMethod='hierarchical',
)
typical_periods = aggregation.createTypicalPeriods()
```
2.3.9¶
- Improved time series aggregation speed with segmentation (issue #96)
- Fixed issue #99
2.3.8¶
- Enhanced time series aggregation speed with segmentation (issue #96)
2.3.7¶
- Added Python 3.13 support
- Updated GitHub Actions workflow (ubuntu-20.04 to ubuntu-22.04)
- Resolved invalid escape sequence error (issue #90)
2.3.6¶
- Migrated from setup.py to pyproject.toml
- Changed project layout from flat to source structure
- Updated installation documentation
- Fixed deprecation and future warnings (issue #91)
2.3.5¶
- Re-release of v2.3.4 to fix GitHub/PyPI synchronization
2.3.4¶
- Extended reporting for time series tolerance exceedances
- Added option to silence tolerance warnings (default threshold: 1e-13)
2.3.3¶
- Dropped support for Python versions below 3.9
- Fixed deprecation warnings
2.3.2¶
- Limited pandas version to below 3.0
- Silenced deprecation warnings
2.3.1¶
- Accelerated rescale cluster periods functionality
- Updated documentation with autodeployment features
2.3.0¶
- Fixed deprecated pandas functions
- Corrected distribution representation sum calculations
- Added segment representation capability
- Extended default example
- Switched CI infrastructure from Travis to GitHub workflows
2.2.2¶
- Fixed Hypertuning class
- Adjusted the default MILP solver
- Reworked documentation
2.1.0¶
- Added hyperparameter tuning meta class for identifying optimal time series aggregation parameters
2.0.1¶
- Changed dependency of scikit-learn to make tsam conda-forge compatible
2.0.0¶
- A new comprehensive structure that allows for free cross-combination of clustering algorithms and cluster representations (e.g., centroids or medoids)
- A novel cluster representation method that precisely replicates the original time series value distribution based on Hoffmann, Kotzur and Stolten (2021)
- Maxoids as representation algorithm which represents time series by outliers only based on Sifa and Bauckhage (2017): "Online k-Maxoids clustering"
- K-medoids contiguity: An algorithm based on Oehrlein and Hauner (2017) that accounts for contiguity constraints
1.1.2¶
- Added first version of the k-medoid contiguity algorithm
1.1.1¶
- Significantly increased test coverage
- Separation between clustering and representation (e.g., for Ward's hierarchical clustering, the representation by medoids or centroids can now be freely chosen)
1.1.0¶
- Segmentation (clustering of adjacent time steps) according to Pineda et al. (2018)
- k-MILP: Extension of MILP-based k-medoids clustering for automatic identification of extreme periods according to Zatti et al. (2019)
- Option to dynamically choose whether clusters should be represented by their centroid or medoid