base
Classes
AbstractPipeline | Interface for pipeline.
BasePipeline | Base class for pipeline.
Enum for different cross-validation modes.
FoldMask | Container to hold the description of the fold mask.
FoldParallelGroup | Group for parallel fold processing.
Dummy metric that is created only for implementation of BasePipeline._forecast_prediction_interval.
- class AbstractPipeline
Interface for pipeline.
- abstract backtest(ts: etna.datasets.tsdataset.TSDataset, metrics: List[etna.metrics.base.Metric], n_folds: Union[int, List[etna.pipeline.base.FoldMask]] = 5, mode: Optional[str] = None, aggregate_metrics: bool = False, n_jobs: int = 1, refit: Union[bool, int] = True, stride: Optional[int] = None, joblib_params: Optional[Dict[str, Any]] = None, forecast_params: Optional[Dict[str, Any]] = None) → Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]
Run backtest with the pipeline.
If refit != True and some component of the pipeline does not support forecasting with a gap, this component will raise an exception.
- Parameters
ts (etna.datasets.tsdataset.TSDataset) – Dataset to fit models in backtest
metrics (List[etna.metrics.base.Metric]) – List of metrics to compute for each fold
n_folds (Union[int, List[etna.pipeline.base.FoldMask]]) – Number of folds or the list of fold masks
mode (Optional[str]) – Train generation policy: ‘expand’ or ‘constant’. Works only if n_folds is an integer. Defaults to ‘expand’.
aggregate_metrics (bool) – If True, aggregate metrics over folds; otherwise, return raw metrics
n_jobs (int) – Number of jobs to run in parallel
refit (Union[bool, int]) –
Determines how often the pipeline should be retrained during iteration over folds.
If True: the pipeline is retrained on each fold.
If False: the pipeline is trained only on the first fold.
If value (int): the pipeline is trained every value folds, starting from the first.
stride (Optional[int]) – Number of points between folds. Works only if n_folds is an integer. Defaults to horizon.
joblib_params (Optional[Dict[str, Any]]) – Additional parameters for joblib.Parallel
forecast_params (Optional[Dict[str, Any]]) – Additional parameters for forecast()
- Returns
metrics_df, forecast_df, fold_info_df – Metrics dataframe, forecast dataframe and dataframe with information about folds
- Return type
Tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame]
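The refit rule above can be sketched as a helper that reports, for each fold, whether the pipeline would be (re)trained on it. This is a minimal illustration. It assumes folds are numbered from 0 and that an integer refit value k means training on folds 0, k, 2k, and so on. refit_folds is a hypothetical name and is not part of etna. Rejecting non-positive integers is also an assumption.

```python
from typing import List, Union


def refit_folds(n_folds: int, refit: Union[bool, int]) -> List[bool]:
    """Hypothetical helper: for each fold, is the pipeline (re)trained on it?"""
    # bool is a subclass of int in Python, so it must be handled first.
    if isinstance(refit, bool):
        if refit:
            # refit=True: retrain on every fold.
            return [True] * n_folds
        # refit=False: train only on the first fold.
        return [i == 0 for i in range(n_folds)]
    if refit < 1:
        # Assumption: a non-positive retraining period is rejected.
        raise ValueError("refit must be a positive integer")
    # refit=k: train every k folds, starting from the first one.
    return [i % refit == 0 for i in range(n_folds)]


print(refit_folds(5, True))   # [True, True, True, True, True]
print(refit_folds(5, False))  # [True, False, False, False, False]
print(refit_folds(5, 2))      # [True, False, True, False, True]
```

The bool branch has to come before the integer branch because isinstance(True, int) is True in Python.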
- abstract fit(ts: etna.datasets.tsdataset.TSDataset) → etna.pipeline.base.AbstractPipeline
Fit the Pipeline.
- Parameters
ts (etna.datasets.tsdataset.TSDataset) – Dataset with timeseries data
- Returns
Fitted Pipeline instance
- Return type
etna.pipeline.base.AbstractPipeline
- abstract forecast(ts: Optional[etna.datasets.tsdataset.TSDataset] = None, prediction_interval: bool = False, quantiles: Sequence[float] = (0.025, 0.975), n_folds: int = 3, return_components: bool = False) → etna.datasets.tsdataset.TSDataset
Make a forecast of the next points of a dataset.
The result of forecasting starts from the last point of ts, not including it.
- Parameters
ts (Optional[etna.datasets.tsdataset.TSDataset]) – Dataset to forecast. If not given, the dataset given during fit is used.
prediction_interval (bool) – If True, returns prediction interval for forecast
quantiles (Sequence[float]) – Levels of prediction distribution. By default, the 2.5% and 97.5% quantiles are taken to form a 95% prediction interval
n_folds (int) – Number of folds to use in the backtest for prediction interval estimation
return_components (bool) – If True additionally returns forecast components
- Returns
Dataset with predictions
- Return type
etna.datasets.tsdataset.TSDataset
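The forecast window begins one step after the last point of ts and excludes that point. Below is a minimal sketch. It assumes a regular calendar step, whereas a TSDataset actually carries its own frequency. forecast_timestamps is a hypothetical helper name.

```python
from datetime import datetime, timedelta
from typing import List


def forecast_timestamps(
    last: datetime, horizon: int, step: timedelta = timedelta(days=1)
) -> List[datetime]:
    """Hypothetical: the `horizon` timestamps that follow `last` on a regular grid."""
    # Start one step after `last`: the last observed point itself is not forecast.
    return [last + step * (i + 1) for i in range(horizon)]


print(forecast_timestamps(datetime(2021, 1, 31), 3)[0])  # 2021-02-01 00:00:00
```

The sketch uses datetime rather than date because adding a sub-day timedelta to a date silently drops the hours.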
- abstract classmethod load(path: pathlib.Path) → typing_extensions.Self
Load an object.
- Parameters
path (pathlib.Path) – Path to load object from.
- Return type
typing_extensions.Self
- abstract params_to_tune() → Dict[str, etna.distributions.distributions.BaseDistribution]
Get hyperparameter grid to tune.
- Returns
Grid with hyperparameters.
- Return type
Dict[str, etna.distributions.distributions.BaseDistribution]
- abstract predict(ts: etna.datasets.tsdataset.TSDataset, start_timestamp: Optional[pandas._libs.tslibs.timestamps.Timestamp] = None, end_timestamp: Optional[pandas._libs.tslibs.timestamps.Timestamp] = None, prediction_interval: bool = False, quantiles: Sequence[float] = (0.025, 0.975), return_components: bool = False) → etna.datasets.tsdataset.TSDataset
Make in-sample predictions on dataset in a given range.
Currently, when segments start at different timestamps, correct behavior is only guaranteed for start_timestamp >= the beginning of all segments.
- Parameters
ts (etna.datasets.tsdataset.TSDataset) – Dataset to make predictions on.
start_timestamp (Optional[pandas._libs.tslibs.timestamps.Timestamp]) – First timestamp of the prediction range to return; should be >= the first timestamp in ts. The beginning of each segment is expected to be <= start_timestamp. If not set, the first timestamp at which every segment has started is taken.
end_timestamp (Optional[pandas._libs.tslibs.timestamps.Timestamp]) – Last timestamp of the prediction range to return; if not set, the last timestamp of ts is taken. The value is expected to be less than or equal to the last timestamp in ts.
prediction_interval (bool) – If True, returns prediction interval for forecast.
quantiles (Sequence[float]) – Levels of prediction distribution. By default, the 2.5% and 97.5% quantiles are taken to form a 95% prediction interval.
return_components (bool) – If True, additionally returns forecast components
- Returns
Dataset with predictions in the [start_timestamp, end_timestamp] range.
- Raises
ValueError: – Value of end_timestamp is less than start_timestamp.
ValueError: – Value of start_timestamp goes before the point where each segment started.
ValueError: – Value of end_timestamp goes after the last timestamp.
- Return type
etna.datasets.tsdataset.TSDataset
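The documented defaults and the three ValueError cases of predict can be mirrored in a small helper. This sketch rests on several assumptions:

- resolve_predict_range is a hypothetical name.
- Segments are described only by their start dates.
- The order of the checks is a guess, not etna's code.

```python
from datetime import date
from typing import Dict, Optional, Tuple


def resolve_predict_range(
    segment_starts: Dict[str, date],
    last_timestamp: date,
    start_timestamp: Optional[date] = None,
    end_timestamp: Optional[date] = None,
) -> Tuple[date, date]:
    """Hypothetical helper mirroring the documented defaults and checks of predict()."""
    # The point where *every* segment has started is the latest segment start.
    all_started = max(segment_starts.values())
    start = all_started if start_timestamp is None else start_timestamp
    end = last_timestamp if end_timestamp is None else end_timestamp
    # Assumed check order; each message follows the documented Raises entries.
    if end < start:
        raise ValueError("Value of end_timestamp is less than start_timestamp!")
    if start < all_started:
        raise ValueError("Value of start_timestamp goes before point where each segment started!")
    if end > last_timestamp:
        raise ValueError("Value of end_timestamp goes after the last timestamp!")
    return start, end


print(resolve_predict_range({"a": date(2021, 1, 1), "b": date(2021, 1, 5)}, date(2021, 1, 31)))
# (datetime.date(2021, 1, 5), datetime.date(2021, 1, 31))
```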
- abstract save(path: pathlib.Path)
Save the object.
- Parameters
path (pathlib.Path) – Path to save object to.
- class BasePipeline(horizon: int)
Base class for pipeline.
- Parameters
horizon (int) – Forecasting horizon
- backtest(ts: etna.datasets.tsdataset.TSDataset, metrics: List[etna.metrics.base.Metric], n_folds: Union[int, List[etna.pipeline.base.FoldMask]] = 5, mode: Optional[str] = None, aggregate_metrics: bool = False, n_jobs: int = 1, refit: Union[bool, int] = True, stride: Optional[int] = None, joblib_params: Optional[Dict[str, Any]] = None, forecast_params: Optional[Dict[str, Any]] = None) → Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]
Run backtest with the pipeline.
If refit != True and some component of the pipeline does not support forecasting with a gap, this component will raise an exception.
- Parameters
ts (etna.datasets.tsdataset.TSDataset) – Dataset to fit models in backtest
metrics (List[etna.metrics.base.Metric]) – List of metrics to compute for each fold
n_folds (Union[int, List[etna.pipeline.base.FoldMask]]) – Number of folds or the list of fold masks
mode (Optional[str]) – Train generation policy: ‘expand’ or ‘constant’. Works only if n_folds is an integer. Defaults to ‘expand’.
aggregate_metrics (bool) – If True, aggregate metrics over folds; otherwise, return raw metrics
n_jobs (int) – Number of jobs to run in parallel
refit (Union[bool, int]) –
Determines how often the pipeline should be retrained during iteration over folds.
If True: the pipeline is retrained on each fold.
If False: the pipeline is trained only on the first fold.
If value (int): the pipeline is trained every value folds, starting from the first.
stride (Optional[int]) – Number of points between folds. Works only if n_folds is an integer. Defaults to horizon.
joblib_params (Optional[Dict[str, Any]]) – Additional parameters for joblib.Parallel
forecast_params (Optional[Dict[str, Any]]) – Additional parameters for forecast()
- Returns
metrics_df, forecast_df, fold_info_df – Metrics dataframe, forecast dataframe and dataframe with information about folds
- Return type
Tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame]
- Raises
ValueError: – If mode is set when n_folds is a List[FoldMask].
ValueError: – If stride is set when n_folds is a List[FoldMask].
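For an integer n_folds, the train and test windows can be sketched as below. The sketch makes these assumptions:

- Folds are packed against the end of the series.
- Each test window is horizon points long.
- Consecutive folds are shifted by stride.
- ‘expand’ keeps the train start fixed.
- ‘constant’ slides the train start so the train length stays the same.

generate_folds and the index-based Fold tuple are hypothetical. The ValueError branch mirrors the documented rule that mode and stride must not be set when n_folds is a list of FoldMask objects.

```python
from typing import List, Optional, Sequence, Tuple, Union

# Hypothetical layout: (train_start, train_end, test_start, test_end), inclusive positions.
Fold = Tuple[int, int, int, int]


def generate_folds(
    n_points: int,
    horizon: int,
    n_folds: Union[int, Sequence[object]],
    mode: Optional[str] = None,
    stride: Optional[int] = None,
) -> List[Fold]:
    """Hypothetical sketch of integer-based fold generation for backtest()."""
    if not isinstance(n_folds, int):
        # Explicit fold masks fully describe the folds, so mode/stride are not allowed.
        if mode is not None or stride is not None:
            raise ValueError("mode and stride can't be set when n_folds is a list of masks")
        raise NotImplementedError("explicit fold masks are out of scope for this sketch")
    mode = "expand" if mode is None else mode
    stride = horizon if stride is None else stride
    folds = []
    for i in range(n_folds):
        # Assumed placement: the last fold's test window ends at the last point.
        test_end = n_points - 1 - (n_folds - 1 - i) * stride
        test_start = test_end - horizon + 1
        train_end = test_start - 1
        if mode == "expand":
            train_start = 0  # the train window grows with every fold
        elif mode == "constant":
            train_start = i * stride  # the train window slides, keeping its length
        else:
            raise ValueError(f"unknown mode: {mode!r}")
        if train_end < train_start:
            raise ValueError("not enough points for the requested folds")
        folds.append((train_start, train_end, test_start, test_end))
    return folds


for fold in generate_folds(20, 3, 3, mode="constant"):
    print(fold)  # (0, 10, 11, 13), then (3, 13, 14, 16), then (6, 16, 17, 19)
```

With stride smaller than horizon the test windows of neighbouring folds overlap, which this sketch allows.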
- abstract fit(ts: etna.datasets.tsdataset.TSDataset) → etna.pipeline.base.AbstractPipeline
Fit the Pipeline.
- Parameters
ts (etna.datasets.tsdataset.TSDataset) – Dataset with timeseries data
- Returns
Fitted Pipeline instance
- Return type
etna.pipeline.base.AbstractPipeline
- forecast(ts: Optional[etna.datasets.tsdataset.TSDataset] = None, prediction_interval: bool = False, quantiles: Sequence[float] = (0.025, 0.975), n_folds: int = 3, return_components: bool = False) → etna.datasets.tsdataset.TSDataset
Make a forecast of the next points of a dataset.
The result of forecasting starts from the last point of ts, not including it.
- Parameters
ts (Optional[etna.datasets.tsdataset.TSDataset]) – Dataset to forecast. If not given, the dataset given during fit is used.
prediction_interval (bool) – If True, returns prediction interval for forecast
quantiles (Sequence[float]) – Levels of prediction distribution. By default, the 2.5% and 97.5% quantiles are taken to form a 95% prediction interval
n_folds (int) – Number of folds to use in the backtest for prediction interval estimation
return_components (bool) – If True additionally returns forecast components
- Returns
Dataset with predictions
- Raises
NotImplementedError: – Adding target components is not currently implemented
- Return type
etna.datasets.tsdataset.TSDataset
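One common way to turn backtest errors into a prediction interval is to assume normally distributed residuals. For each requested quantile q, the point forecast is shifted by sigma * z_q. With the default quantiles 0.025 and 0.975, this covers 0.975 - 0.025 = 0.95 of the distribution. The sketch is an assumption, not a statement of how BasePipeline computes its intervals. normal_interval is a hypothetical name, and the sketch uses only the standard library.

```python
from statistics import NormalDist, pstdev
from typing import Dict, List, Sequence


def normal_interval(
    point_forecast: List[float],
    residuals: Sequence[float],
    quantiles: Sequence[float] = (0.025, 0.975),
) -> Dict[float, List[float]]:
    """Hypothetical: shift the point forecast by sigma * z_q for every quantile q."""
    # Population standard deviation of backtest errors collected over the folds.
    sigma = pstdev(residuals)
    borders = {}
    for q in quantiles:
        z_q = NormalDist().inv_cdf(q)  # standard-normal quantile, about -1.96 for 0.025
        borders[q] = [y + sigma * z_q for y in point_forecast]
    return borders


borders = normal_interval([10.0, 20.0], [-1.0, 1.0, -1.0, 1.0])
# Here sigma is 1.0, so the lower border of the first point is roughly 10 - 1.96.
```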
- abstract classmethod load(path: pathlib.Path) → typing_extensions.Self
Load an object.
- Parameters
path (pathlib.Path) – Path to load object from.
- Return type
typing_extensions.Self
- abstract params_to_tune() → Dict[str, etna.distributions.distributions.BaseDistribution]
Get hyperparameter grid to tune.
- Returns
Grid with hyperparameters.
- Return type
Dict[str, etna.distributions.distributions.BaseDistribution]
- predict(ts: etna.datasets.tsdataset.TSDataset, start_timestamp: Optional[pandas._libs.tslibs.timestamps.Timestamp] = None, end_timestamp: Optional[pandas._libs.tslibs.timestamps.Timestamp] = None, prediction_interval: bool = False, quantiles: Sequence[float] = (0.025, 0.975), return_components: bool = False) → etna.datasets.tsdataset.TSDataset
Make in-sample predictions on dataset in a given range.
Currently, when segments start at different timestamps, correct behavior is only guaranteed for start_timestamp >= the beginning of all segments.
- Parameters
ts (etna.datasets.tsdataset.TSDataset) – Dataset to make predictions on.
start_timestamp (Optional[pandas._libs.tslibs.timestamps.Timestamp]) – First timestamp of the prediction range to return; should be >= the first timestamp in ts. The beginning of each segment is expected to be <= start_timestamp. If not set, the first timestamp at which every segment has started is taken.
end_timestamp (Optional[pandas._libs.tslibs.timestamps.Timestamp]) – Last timestamp of the prediction range to return; if not set, the last timestamp of ts is taken. The value is expected to be less than or equal to the last timestamp in ts.
prediction_interval (bool) – If True, returns prediction interval for forecast.
quantiles (Sequence[float]) – Levels of prediction distribution. By default, the 2.5% and 97.5% quantiles are taken to form a 95% prediction interval.
return_components (bool) – If True, additionally returns forecast components
- Returns
Dataset with predictions in the [start_timestamp, end_timestamp] range.
- Raises
ValueError: – Value of end_timestamp is less than start_timestamp.
ValueError: – Value of start_timestamp goes before the point where each segment started.
ValueError: – Value of end_timestamp goes after the last timestamp.
NotImplementedError: – Adding target components is not currently implemented
- Return type
etna.datasets.tsdataset.TSDataset
- abstract save(path: pathlib.Path)
Save the object.
- Parameters
path (pathlib.Path) – Path to save object to.
- set_params(**params: dict) → etna.core.mixins.TMixin
Return new object instance with modified parameters.
The method also allows changing parameters of nested objects within the current object. For example, it is possible to change parameters of a model in a Pipeline.
Nested parameters are expected to be in the <component_1>.<...>.<parameter> form, where components are separated by a dot.
- Parameters
**params – Estimator parameters
self (etna.core.mixins.TMixin) –
params (dict) –
- Returns
New instance with changed parameters
- Return type
etna.core.mixins.TMixin
Examples
>>> from etna.pipeline import Pipeline
>>> from etna.models import NaiveModel
>>> from etna.transforms import AddConstTransform
>>> model = NaiveModel(lag=1)
>>> transforms = [AddConstTransform(in_column="target", value=1)]
>>> pipeline = Pipeline(model, transforms=transforms, horizon=3)
>>> pipeline.set_params(**{"model.lag": 3, "transforms.0.value": 2})
Pipeline(model = NaiveModel(lag = 3, ), transforms = [AddConstTransform(in_column = 'target', value = 2, inplace = True, out_column = None, )], horizon = 3, )
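The dotted-key convention can be illustrated on a plain nested dict of init parameters. The dict loosely resembles what to_dict() collects. This is a minimal sketch under that assumption. set_nested_params is a hypothetical name and is not etna's implementation: set_params returns a new object instance, while this sketch returns a modified copy of a dict. Numeric components such as the 0 in "transforms.0.value" are treated as list indices.

```python
import copy
from typing import Any, Dict


def set_nested_params(config: Dict[str, Any], **params: Any) -> Dict[str, Any]:
    """Hypothetical: return a modified deep copy of a nested dict/list config."""
    new_config = copy.deepcopy(config)  # leave the original untouched
    for dotted_key, value in params.items():
        *path, last = dotted_key.split(".")
        node: Any = new_config
        for part in path:
            # Numeric components index into lists, e.g. "transforms.0".
            node = node[int(part)] if isinstance(node, list) else node[part]
        if isinstance(node, list):
            node[int(last)] = value
        else:
            node[last] = value
    return new_config


config = {"model": {"lag": 1}, "transforms": [{"in_column": "target", "value": 1}], "horizon": 3}
new = set_nested_params(config, **{"model.lag": 3, "transforms.0.value": 2})
print(new["model"]["lag"], new["transforms"][0]["value"], config["model"]["lag"])  # 3 2 1
```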
- to_dict()
Collect all information about etna object in dict.
- class FoldMask(first_train_timestamp: Optional[Union[str, pandas._libs.tslibs.timestamps.Timestamp]], last_train_timestamp: Union[str, pandas._libs.tslibs.timestamps.Timestamp], target_timestamps: List[Union[str, pandas._libs.tslibs.timestamps.Timestamp]])
Container to hold the description of the fold mask.
Fold masks are expected to be used for backtest strategy customization.
Init FoldMask.
- Parameters
first_train_timestamp (Optional[Union[str, pandas._libs.tslibs.timestamps.Timestamp]]) – First train timestamp, the first timestamp in the dataset if None is passed
last_train_timestamp (Union[str, pandas._libs.tslibs.timestamps.Timestamp]) – Last train timestamp
target_timestamps (List[Union[str, pandas._libs.tslibs.timestamps.Timestamp]]) – List of target timestamps
- set_params(**params: dict) → etna.core.mixins.TMixin
Return new object instance with modified parameters.
The method also allows changing parameters of nested objects within the current object. For example, it is possible to change parameters of a model in a Pipeline.
Nested parameters are expected to be in the <component_1>.<...>.<parameter> form, where components are separated by a dot.
- Parameters
**params – Estimator parameters
self (etna.core.mixins.TMixin) –
params (dict) –
- Returns
New instance with changed parameters
- Return type
etna.core.mixins.TMixin
Examples
>>> from etna.pipeline import Pipeline
>>> from etna.models import NaiveModel
>>> from etna.transforms import AddConstTransform
>>> model = NaiveModel(lag=1)
>>> transforms = [AddConstTransform(in_column="target", value=1)]
>>> pipeline = Pipeline(model, transforms=transforms, horizon=3)
>>> pipeline.set_params(**{"model.lag": 3, "transforms.0.value": 2})
Pipeline(model = NaiveModel(lag = 3, ), transforms = [AddConstTransform(in_column = 'target', value = 2, inplace = True, out_column = None, )], horizon = 3, )
- to_dict()
Collect all information about etna object in dict.
- validate_on_dataset(ts: etna.datasets.tsdataset.TSDataset, horizon: int)
Validate fold mask on the dataset with specified horizon.
- Parameters
ts (etna.datasets.tsdataset.TSDataset) – Dataset to validate on
horizon (int) – Forecasting horizon
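The kinds of consistency checks that validate_on_dataset can perform are sketched below on plain, sorted date lists. The specific rules are assumptions inferred from the parameter descriptions, not a copy of etna's code:

- The train range lies inside the dataset.
- Targets come strictly after the last train timestamp.
- The last target is at most horizon steps after the last train timestamp.

validate_fold_mask is a hypothetical name. As documented for FoldMask, a first_train of None falls back to the dataset's first timestamp.

```python
from datetime import date
from typing import List, Optional


def validate_fold_mask(
    timestamps: List[date],
    first_train: Optional[date],
    last_train: date,
    targets: List[date],
    horizon: int,
) -> None:
    """Hypothetical fold-mask checks on a sorted timestamp list; raises ValueError."""
    # None means "start of the dataset", as in FoldMask.
    first_train = timestamps[0] if first_train is None else first_train
    if first_train < timestamps[0]:
        raise ValueError("first train timestamp is before the start of the dataset")
    if first_train > last_train:
        raise ValueError("first train timestamp is after the last train timestamp")
    if not targets or min(targets) <= last_train:
        raise ValueError("target timestamps must come after the last train timestamp")
    if max(targets) > timestamps[-1]:
        raise ValueError("target timestamps go beyond the end of the dataset")
    # Count steps on the dataset's own index; list.index also raises ValueError if absent.
    steps_ahead = timestamps.index(max(targets)) - timestamps.index(last_train)
    if steps_ahead > horizon:
        raise ValueError("last target is more than horizon steps after the last train timestamp")


days = [date(2021, 1, d) for d in range(1, 11)]
validate_fold_mask(days, None, date(2021, 1, 7), [date(2021, 1, 8), date(2021, 1, 10)], horizon=3)  # passes
```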
- class FoldParallelGroup(_typename, _fields=None, /, **kwargs)
Group for parallel fold processing.
- clear() → None. Remove all items from D.
- copy() → a shallow copy of D
- fromkeys(iterable, value=None, /)
Create a new dictionary with keys from iterable and values set to value.
- get(key, default=None, /)
Return the value for key if key is in the dictionary, else default.
- items() → a set-like object providing a view on D's items
- keys() → a set-like object providing a view on D's keys
- pop(k[, d]) → v. Remove specified key and return the corresponding value.
If key is not found, d is returned if given, otherwise KeyError is raised.
- popitem()
Remove and return a (key, value) pair as a 2-tuple.
Pairs are returned in LIFO (last-in, first-out) order. Raises KeyError if the dict is empty.
- setdefault(key, default=None, /)
Insert key with a value of default if key is not in the dictionary.
Return the value for key if key is in the dictionary, else default.
- update([E, ]**F) → None. Update D from dict/iterable E and F.
If E is present and has a .keys() method, then does: for k in E: D[k] = E[k]
If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v
In either case, this is followed by: for k in F: D[k] = F[k]
- values() → an object providing a view on D's values