stacking_ensemble
Classes
StackingEnsemble: a pipeline that forecasts the future using the metamodel to combine the forecasts of the base models.
- class StackingEnsemble(pipelines: List[etna.pipeline.base.BasePipeline], final_model: Optional[sklearn.base.RegressorMixin] = None, n_folds: int = 3, features_to_use: Union[None, Literal['all'], List[str]] = None, n_jobs: int = 1, joblib_params: Optional[Dict[str, Any]] = None)
StackingEnsemble is a pipeline that forecasts the future using the metamodel to combine the forecasts of the base models.
Examples
>>> from etna.datasets import generate_ar_df
>>> from etna.datasets import TSDataset
>>> from etna.ensembles import StackingEnsemble
>>> from etna.models import NaiveModel
>>> from etna.models import MovingAverageModel
>>> from etna.pipeline import Pipeline
>>> import pandas as pd
>>> pd.options.display.float_format = '{:,.2f}'.format
>>> df = generate_ar_df(periods=100, start_time="2021-06-01", ar_coef=[0.8], n_segments=3)
>>> df_ts_format = TSDataset.to_dataset(df)
>>> ts = TSDataset(df_ts_format, "D")
>>> ma_pipeline = Pipeline(model=MovingAverageModel(window=5), transforms=[], horizon=7)
>>> naive_pipeline = Pipeline(model=NaiveModel(lag=10), transforms=[], horizon=7)
>>> ensemble = StackingEnsemble(pipelines=[ma_pipeline, naive_pipeline])
>>> _ = ensemble.fit(ts=ts)
>>> forecast = ensemble.forecast()
>>> forecast[:,:,"target"]
segment    segment_0 segment_1 segment_2
feature       target    target    target
timestamp
2021-09-09      0.70      1.47      0.20
2021-09-10      0.62      1.53      0.26
2021-09-11      0.50      1.78      0.36
2021-09-12      0.37      1.88      0.21
2021-09-13      0.46      1.87      0.25
2021-09-14      0.44      1.49      0.21
2021-09-15      0.36      1.56      0.30
Init StackingEnsemble.
- Parameters
pipelines (List[etna.pipeline.base.BasePipeline]) – List of pipelines that should be used in ensemble.
final_model (Optional[sklearn.base.RegressorMixin]) – Regression model with fit/predict interface which will be used to combine the base estimators.
n_folds (int) – Number of folds to use in the backtest. Backtest is not used for model evaluation but for prediction.
features_to_use (Union[None, Literal['all'], typing.List[str]]) – Features, other than the forecasts of the base models, to use in the final_model.
n_jobs (int) – Number of jobs to run in parallel.
joblib_params (Optional[Dict[str, Any]]) – Additional parameters for joblib.Parallel.
- Raises
ValueError: – If fewer than 2 pipelines are given or the pipelines have different horizons.
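The roles of pipelines, final_model and n_folds can be illustrated without etna. The following is a minimal sketch of the stacking idea, not the library's implementation: two hypothetical base forecasters (naive_forecast, mean_forecast) produce out-of-sample forecasts on n_folds trailing folds, and a small least-squares meta-model (standing in for final_model) learns weights that combine them. Every function name below is an assumption made for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Forecaster = Callable[[Sequence[float], int], List[float]]


def naive_forecast(history: Sequence[float], horizon: int) -> List[float]:
    """Hypothetical base model: repeat the last observed value."""
    return [history[-1]] * horizon


def mean_forecast(history: Sequence[float], horizon: int, window: int = 3) -> List[float]:
    """Hypothetical base model: repeat the mean of the last `window` values."""
    tail = history[-window:]
    return [sum(tail) / len(tail)] * horizon


def out_of_fold_forecasts(
    series: Sequence[float], models: List[Forecaster], horizon: int, n_folds: int
) -> Tuple[List[List[float]], List[float]]:
    """Collect base forecasts (features) and true values (targets) on trailing folds."""
    features, targets = [], []
    for fold in range(n_folds, 0, -1):
        test_end = len(series) - (fold - 1) * horizon  # exclusive end of the test window
        train_end = test_end - horizon                 # exclusive end of the train window
        history = series[:train_end]
        preds = [model(history, horizon) for model in models]
        for step in range(horizon):
            features.append([p[step] for p in preds])
            targets.append(series[train_end + step])
    return features, targets


def fit_two_weight_meta_model(
    features: List[List[float]], targets: List[float]
) -> Tuple[float, float]:
    """Least squares for y ~ w0*x0 + w1*x1 (no intercept) via the 2x2 normal equations."""
    a = sum(x[0] * x[0] for x in features)
    b = sum(x[0] * x[1] for x in features)
    d = sum(x[1] * x[1] for x in features)
    r0 = sum(x[0] * y for x, y in zip(features, targets))
    r1 = sum(x[1] * y for x, y in zip(features, targets))
    det = a * d - b * b
    if det == 0:
        raise ValueError("Base forecasts are collinear; weights are not identifiable.")
    return (d * r0 - b * r1) / det, (a * r1 - b * r0) / det


series = [float(v) for v in range(1, 13)]  # toy data: 1.0 .. 12.0
models = [naive_forecast, mean_forecast]
features, targets = out_of_fold_forecasts(series, models, horizon=2, n_folds=3)
weights = fit_two_weight_meta_model(features, targets)

# Combine the base forecasts made on the full series with the learned weights.
final_forecast = [
    weights[0] * n + weights[1] * m
    for n, m in zip(naive_forecast(series, 2), mean_forecast(series, 2))
]
```

The point of the sketch is that the folds are used to produce training data for the meta-model rather than to score the ensemble, which matches the n_folds description above.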
- backtest(ts: etna.datasets.tsdataset.TSDataset, metrics: List[etna.metrics.base.Metric], n_folds: Union[int, List[etna.pipeline.base.FoldMask]] = 5, mode: Optional[str] = None, aggregate_metrics: bool = False, n_jobs: int = 1, refit: Union[bool, int] = True, stride: Optional[int] = None, joblib_params: Optional[Dict[str, Any]] = None, forecast_params: Optional[Dict[str, Any]] = None) → Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]
Run backtest with the pipeline.
If refit != True and some component of the pipeline doesn’t support forecasting with gap, this component will raise an exception.
- Parameters
ts (etna.datasets.tsdataset.TSDataset) – Dataset to fit models in backtest
metrics (List[etna.metrics.base.Metric]) – List of metrics to compute for each fold
n_folds (Union[int, List[etna.pipeline.base.FoldMask]]) – Number of folds or the list of fold masks
mode (Optional[str]) – Train generation policy: ‘expand’ or ‘constant’. Works only if n_folds is an integer. Defaults to ‘expand’.
aggregate_metrics (bool) – If True, aggregate metrics over folds; otherwise return raw metrics.
n_jobs (int) – Number of jobs to run in parallel
refit (Union[bool, int]) – Determines how often the pipeline should be retrained during iteration over folds.
If True: the pipeline is retrained on each fold.
If False: the pipeline is trained only on the first fold.
If value: int: the pipeline is trained every value folds, starting from the first.
stride (Optional[int]) – Number of points between folds. Works only if n_folds is an integer. Defaults to horizon.
joblib_params (Optional[Dict[str, Any]]) – Additional parameters for joblib.Parallel
forecast_params (Optional[Dict[str, Any]]) – Additional parameters for forecast()
- Returns
metrics_df, forecast_df, fold_info_df – Metrics dataframe, forecast dataframe and dataframe with information about folds
- Return type
Tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame]
- Raises
ValueError: – If mode is set when n_folds is a List[FoldMask].
ValueError: – If stride is set when n_folds is a List[FoldMask].
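How mode, stride and refit interact is easier to see on plain indices. The sketch below is an illustration under stated assumptions, not a transcription of etna's internals: it assumes folds are laid out from the end of the series, that 'expand' keeps the train start fixed at 0 while 'constant' shifts it by stride so every train window has the same length, and that an integer refit retrains on every refit-th fold starting from the first. The helper names fold_borders and refit_flags are hypothetical.

```python
from typing import List, Optional, Tuple, Union


def fold_borders(
    n_points: int,
    horizon: int,
    n_folds: int,
    mode: str = "expand",
    stride: Optional[int] = None,
) -> List[Tuple[int, int, int, int]]:
    """Return (train_start, train_end, test_start, test_end) per fold, ends inclusive."""
    if mode not in ("expand", "constant"):
        raise ValueError("mode must be 'expand' or 'constant'")
    stride = horizon if stride is None else stride  # assumed default: stride == horizon
    folds = []
    for offset in range(n_folds, 0, -1):
        test_end = n_points - 1 - stride * (offset - 1)
        test_start = test_end - horizon + 1
        train_end = test_start - 1
        # 'expand' always trains from the first point; 'constant' slides the start.
        train_start = 0 if mode == "expand" else (n_folds - offset) * stride
        if train_end < train_start:
            raise ValueError("Not enough points for the requested folds")
        folds.append((train_start, train_end, test_start, test_end))
    return folds


def refit_flags(n_folds: int, refit: Union[bool, int]) -> List[bool]:
    """Mark the folds on which the pipeline would be retrained."""
    if refit is True:
        return [True] * n_folds
    if refit is False:
        return [i == 0 for i in range(n_folds)]
    if refit <= 0:
        raise ValueError("integer refit must be positive")
    return [i % refit == 0 for i in range(n_folds)]


expand_folds = fold_borders(n_points=20, horizon=3, n_folds=3, mode="expand")
constant_folds = fold_borders(n_points=20, horizon=3, n_folds=3, mode="constant")
retrain_on = refit_flags(n_folds=5, refit=2)
```

With refit set to False or to an integer greater than 1, some folds are forecast by a pipeline trained on an earlier fold, which is why components that cannot forecast with a gap may raise in that setting.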
- fit(ts: etna.datasets.tsdataset.TSDataset) → etna.ensembles.stacking_ensemble.StackingEnsemble
Fit the ensemble.
- Parameters
ts (etna.datasets.tsdataset.TSDataset) – TSDataset to fit ensemble.
- Returns
Fitted ensemble.
- Return type
self
- forecast(ts: Optional[etna.datasets.tsdataset.TSDataset] = None, prediction_interval: bool = False, quantiles: Sequence[float] = (0.025, 0.975), n_folds: int = 3, return_components: bool = False) → etna.datasets.tsdataset.TSDataset
Make a forecast of the next points of a dataset.
The forecast starts right after the last point of ts, not including it.
- Parameters
ts (Optional[etna.datasets.tsdataset.TSDataset]) – Dataset to forecast. If not given, the dataset passed to fit is used.
prediction_interval (bool) – If True, returns a prediction interval for the forecast.
quantiles (Sequence[float]) – Levels of the prediction distribution. By default, 2.5% and 97.5% are taken to form a 95% prediction interval.
n_folds (int) – Number of folds to use in the backtest for prediction interval estimation.
return_components (bool) – If True, additionally returns forecast components.
- Returns
Dataset with predictions
- Raises
NotImplementedError: – Adding target components is not currently implemented
- Return type
etna.datasets.tsdataset.TSDataset
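The relationship between quantiles and the interval borders can be sketched as follows. This is one common approach (a normal approximation around the point forecast using the spread of backtest residuals), shown under the assumption that residuals are roughly Gaussian; the page above does not state that etna computes its intervals exactly this way. The function interval_from_residuals and the toy numbers are hypothetical.

```python
from statistics import NormalDist, pstdev
from typing import Dict, List, Sequence


def interval_from_residuals(
    point_forecast: Sequence[float],
    residuals: Sequence[float],
    quantiles: Sequence[float] = (0.025, 0.975),
) -> Dict[float, List[float]]:
    """Shift the point forecast by z_q * sigma for every requested quantile q."""
    sigma = pstdev(residuals)  # spread of (actual - predicted) collected on backtest folds
    borders = {}
    for q in quantiles:
        z_q = NormalDist().inv_cdf(q)  # standard normal quantile, about +/-1.96 for the defaults
        borders[q] = [value + z_q * sigma for value in point_forecast]
    return borders


backtest_residuals = [0.5, -0.5, 0.5, -0.5]  # toy residuals, assumed for illustration
point_forecast = [10.0, 11.0]
borders = interval_from_residuals(point_forecast, backtest_residuals)
lower, upper = borders[0.025], borders[0.975]
```

More folds give more residuals and therefore a more stable estimate of the spread, at the cost of extra fitting time.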
- classmethod load(path: pathlib.Path, ts: Optional[etna.datasets.tsdataset.TSDataset] = None) → typing_extensions.Self
Load an object.
Warning
This method uses the dill module, which is not secure. It is possible to construct malicious data which will execute arbitrary code during loading. Never load data that could have come from an untrusted source, or that could have been tampered with.
- Parameters
path (pathlib.Path) – Path to load object from.
ts (Optional[etna.datasets.tsdataset.TSDataset]) – TSDataset to set into loaded pipeline.
- Returns
Loaded object.
- Return type
typing_extensions.Self
- params_to_tune() → Dict[str, etna.distributions.distributions.BaseDistribution]
Get hyperparameter grid to tune.
Not implemented for this class.
- Returns
Grid with hyperparameters.
- Return type
Dict[str, etna.distributions.distributions.BaseDistribution]
- predict(ts: etna.datasets.tsdataset.TSDataset, start_timestamp: Optional[pandas._libs.tslibs.timestamps.Timestamp] = None, end_timestamp: Optional[pandas._libs.tslibs.timestamps.Timestamp] = None, prediction_interval: bool = False, quantiles: Sequence[float] = (0.025, 0.975), return_components: bool = False) → etna.datasets.tsdataset.TSDataset
Make in-sample predictions on dataset in a given range.
Currently, when segments start at different timestamps, this is only guaranteed to work with start_timestamp >= the beginning of all segments.
- Parameters
ts (etna.datasets.tsdataset.TSDataset) – Dataset to make predictions on.
start_timestamp (Optional[pandas._libs.tslibs.timestamps.Timestamp]) – First timestamp of the prediction range to return. It should be >= the first timestamp in ts, and the beginning of each segment is expected to be <= start_timestamp. If not set, the first timestamp where each segment began is taken.
end_timestamp (Optional[pandas._libs.tslibs.timestamps.Timestamp]) – Last timestamp of the prediction range to return. If not set, the last timestamp of ts is taken. The value is expected to be less than or equal to the last timestamp in ts.
prediction_interval (bool) – If True, returns a prediction interval for the forecast.
quantiles (Sequence[float]) – Levels of the prediction distribution. By default, 2.5% and 97.5% are taken to form a 95% prediction interval.
return_components (bool) – If True, additionally returns forecast components.
- Returns
Dataset with predictions in the [start_timestamp, end_timestamp] range.
- Raises
ValueError: – Value of end_timestamp is less than start_timestamp.
ValueError: – Value of start_timestamp goes before the point where each segment started.
ValueError: – Value of end_timestamp goes after the last timestamp.
NotImplementedError: – Adding target components is not currently implemented
- Return type
etna.datasets.tsdataset.TSDataset
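The range rules for predict can be summarized in a few lines of validation logic. This is a hedged sketch of the documented behavior, not etna's code: it assumes the default start is the latest of the segment start dates (the first timestamp by which every segment has begun) and uses datetime.date in place of pandas timestamps. The function resolve_prediction_range and the sample dates are hypothetical.

```python
from datetime import date
from typing import Dict, Optional, Tuple


def resolve_prediction_range(
    segment_starts: Dict[str, date],
    last_timestamp: date,
    start_timestamp: Optional[date] = None,
    end_timestamp: Optional[date] = None,
) -> Tuple[date, date]:
    """Fill in default borders and apply the three documented ValueError checks."""
    latest_start = max(segment_starts.values())
    start = latest_start if start_timestamp is None else start_timestamp
    end = last_timestamp if end_timestamp is None else end_timestamp
    if start < latest_start:
        raise ValueError("start_timestamp goes before the point where each segment started")
    if end > last_timestamp:
        raise ValueError("end_timestamp goes after the last timestamp")
    if end < start:
        raise ValueError("end_timestamp is less than start_timestamp")
    return start, end


# Toy dataset description: two segments that begin on different days.
segment_starts = {"segment_0": date(2021, 6, 1), "segment_1": date(2021, 6, 3)}
last_timestamp = date(2021, 9, 8)
default_range = resolve_prediction_range(segment_starts, last_timestamp)
```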
- save(path: pathlib.Path)
Save the object.
- Parameters
path (pathlib.Path) – Path to save object to.
- set_params(**params: dict) → etna.core.mixins.TMixin
Return new object instance with modified parameters.
The method also allows changing parameters of nested objects within the current object. For example, it is possible to change parameters of a model in a Pipeline.
Nested parameters are expected to be in a <component_1>.<...>.<parameter> form, where components are separated by a dot.
- Parameters
**params – Estimator parameters
self (etna.core.mixins.TMixin) –
params (dict) –
- Returns
New instance with changed parameters
- Return type
etna.core.mixins.TMixin
Examples
>>> from etna.pipeline import Pipeline
>>> from etna.models import NaiveModel
>>> from etna.transforms import AddConstTransform
>>> model = NaiveModel(lag=1)
>>> transforms = [AddConstTransform(in_column="target", value=1)]
>>> pipeline = Pipeline(model, transforms=transforms, horizon=3)
>>> pipeline.set_params(**{"model.lag": 3, "transforms.0.value": 2})
Pipeline(model = NaiveModel(lag = 3, ), transforms = [AddConstTransform(in_column = 'target', value = 2, inplace = True, out_column = None, )], horizon = 3, )
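The dotted-path convention for nested parameters can be mimicked on plain dictionaries and lists. The sketch below is only an analogy for how a key such as "transforms.0.value" is resolved; it is not how etna implements set_params, and the helper set_nested_params and the dictionary layout are assumptions made for illustration. Like set_params, it returns a new object and leaves the original untouched.

```python
import copy
from typing import Any, Dict


def set_nested_params(config: Dict[str, Any], **params: Any) -> Dict[str, Any]:
    """Return a deep copy of `config` with dotted-path parameters replaced."""
    new_config = copy.deepcopy(config)
    for dotted_key, value in params.items():
        *parents, leaf = dotted_key.split(".")
        node: Any = new_config
        for part in parents:
            # Numeric components index into lists, other components are dict keys.
            node = node[int(part)] if isinstance(node, list) else node[part]
        if isinstance(node, list):
            node[int(leaf)] = value
        else:
            node[leaf] = value
    return new_config


# Dictionary stand-in for a pipeline with one model and one transform.
pipeline_config = {
    "model": {"lag": 1},
    "transforms": [{"in_column": "target", "value": 1}],
    "horizon": 3,
}
updated = set_nested_params(
    pipeline_config, **{"model.lag": 3, "transforms.0.value": 2}
)
```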
- to_dict()
Collect all information about etna object in dict.