base

Classes

AbstractPipeline()

Interface for pipeline.

BasePipeline(horizon)

Base class for pipeline.

CrossValidationMode(value)

Enum for different cross-validation modes.

FoldMask(first_train_timestamp, ...)

Container to hold the description of the fold mask.

FoldParallelGroup(_typename[, _fields])

Group for parallel fold processing.

_DummyMetric([mode])

Dummy metric that is created only for implementation of BasePipeline._forecast_prediction_interval.

class AbstractPipeline

Interface for pipeline.

abstract backtest(ts: etna.datasets.tsdataset.TSDataset, metrics: List[etna.metrics.base.Metric], n_folds: Union[int, List[etna.pipeline.base.FoldMask]] = 5, mode: Optional[str] = None, aggregate_metrics: bool = False, n_jobs: int = 1, refit: Union[bool, int] = True, stride: Optional[int] = None, joblib_params: Optional[Dict[str, Any]] = None, forecast_params: Optional[Dict[str, Any]] = None) → Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]

Run backtest with the pipeline.

If refit != True and some component of the pipeline doesn’t support forecasting with a gap, that component will raise an exception.

Parameters
  • ts (etna.datasets.tsdataset.TSDataset) – Dataset to fit models in backtest

  • metrics (List[etna.metrics.base.Metric]) – List of metrics to compute for each fold

  • n_folds (Union[int, List[etna.pipeline.base.FoldMask]]) – Number of folds or the list of fold masks

  • mode (Optional[str]) – Train generation policy: ‘expand’ or ‘constant’. Works only if n_folds is integer. By default, is set to ‘expand’.

  • aggregate_metrics (bool) – If True, aggregate metrics over folds; otherwise return raw per-fold metrics

  • n_jobs (int) – Number of jobs to run in parallel

  • refit (Union[bool, int]) –

    Determines how often pipeline should be retrained during iteration over folds.

    • If True: pipeline is retrained on each fold.

    • If False: pipeline is trained only on the first fold.

    • If an integer: pipeline is retrained every refit folds, starting from the first.

  • stride (Optional[int]) – Number of points between folds. Works only if n_folds is integer. By default, is set to horizon.

  • joblib_params (Optional[Dict[str, Any]]) – Additional parameters for joblib.Parallel

  • forecast_params (Optional[Dict[str, Any]]) – Additional parameters for forecast()

Returns

metrics_df, forecast_df, fold_info_df – Metrics dataframe, forecast dataframe and dataframe with information about folds

Return type

Tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame]
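
How mode and stride shape the folds can be sketched with integer positions in place of timestamps. This is an illustration only, not etna's implementation: make_folds is a hypothetical helper, and it assumes that folds are laid out backwards from the end of the series so that the last test window ends on the last point.

```python
from typing import List, Optional, Tuple

Fold = Tuple[range, range]  # (train positions, test positions)


def make_folds(
    n_points: int,
    horizon: int,
    n_folds: int = 5,
    mode: Optional[str] = None,
    stride: Optional[int] = None,
) -> List[Fold]:
    """Hypothetical helper: build (train, test) index ranges for a backtest."""
    mode = "expand" if mode is None else mode       # documented default
    stride = horizon if stride is None else stride  # documented default
    if mode not in ("expand", "constant"):
        raise ValueError(f"Unknown mode: {mode}")

    # Assumption of this sketch: the first test window starts here so that
    # the last one ends exactly at the end of the series.
    first_test_start = n_points - horizon - stride * (n_folds - 1)
    if first_test_start <= 0:
        raise ValueError("Not enough points for the requested folds")

    folds = []
    for i in range(n_folds):
        test_start = first_test_start + i * stride
        # 'expand' always trains from the first point; 'constant' slides a
        # fixed-length training window together with the test window.
        train_start = 0 if mode == "expand" else i * stride
        folds.append((range(train_start, test_start), range(test_start, test_start + horizon)))
    return folds


expand = make_folds(n_points=20, horizon=3, n_folds=3)
constant = make_folds(n_points=20, horizon=3, n_folds=3, mode="constant")
```

With these inputs the 'expand' training windows grow from fold to fold, while the 'constant' windows keep the same length and only shift by stride.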

abstract fit(ts: etna.datasets.tsdataset.TSDataset) → etna.pipeline.base.AbstractPipeline

Fit the Pipeline.

Parameters

ts (etna.datasets.tsdataset.TSDataset) – Dataset with timeseries data

Returns

Fitted Pipeline instance

Return type

etna.pipeline.base.AbstractPipeline

abstract forecast(ts: Optional[etna.datasets.tsdataset.TSDataset] = None, prediction_interval: bool = False, quantiles: Sequence[float] = (0.025, 0.975), n_folds: int = 3, return_components: bool = False) → etna.datasets.tsdataset.TSDataset

Make a forecast of the next points of a dataset.

The result of forecasting starts from the last point of ts, not including it.

Parameters
  • ts (Optional[etna.datasets.tsdataset.TSDataset]) – Dataset to forecast. If not given, the dataset passed to fit() is used.

  • prediction_interval (bool) – If True, returns the prediction interval for the forecast

  • quantiles (Sequence[float]) – Levels of the prediction distribution. By default, 2.5% and 97.5% are taken to form a 95% prediction interval

  • n_folds (int) – Number of folds to use in the backtest for prediction interval estimation

  • return_components (bool) – If True additionally returns forecast components

Returns

Dataset with predictions

Return type

etna.datasets.tsdataset.TSDataset
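
The link between quantiles and interval borders can be sketched as follows. This is a simplified illustration under stated assumptions, not the library's confirmed method: interval_borders is a hypothetical helper, the residuals are made-up backtest errors, and they are assumed to be roughly normal with one common spread.

```python
from statistics import NormalDist, stdev
from typing import Dict, Sequence


def interval_borders(
    point_forecast: float,
    residuals: Sequence[float],
    quantiles: Sequence[float] = (0.025, 0.975),
) -> Dict[float, float]:
    """Hypothetical helper: shift a point forecast by normal quantiles of the residual spread."""
    sigma = stdev(residuals)  # spread of backtest errors (assumed roughly normal)
    standard_normal = NormalDist()
    return {q: point_forecast + sigma * standard_normal.inv_cdf(q) for q in quantiles}


residuals = [-2.0, -1.0, 0.0, 1.0, 2.0]  # made-up backtest errors
borders = interval_borders(point_forecast=100.0, residuals=residuals)
coverage = max(borders) - min(borders)  # 0.975 - 0.025, i.e. a 95% interval
```

Using more folds in the backtest gives more residuals and therefore a steadier estimate of the spread, which is the role n_folds plays in this sketch.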

abstract classmethod load(path: pathlib.Path) → typing_extensions.Self

Load an object.

Parameters

path (pathlib.Path) – Path to load object from.

Return type

typing_extensions.Self

abstract params_to_tune() → Dict[str, etna.distributions.distributions.BaseDistribution]

Get hyperparameter grid to tune.

Returns

Grid with hyperparameters.

Return type

Dict[str, etna.distributions.distributions.BaseDistribution]

abstract predict(ts: etna.datasets.tsdataset.TSDataset, start_timestamp: Optional[pandas._libs.tslibs.timestamps.Timestamp] = None, end_timestamp: Optional[pandas._libs.tslibs.timestamps.Timestamp] = None, prediction_interval: bool = False, quantiles: Sequence[float] = (0.025, 0.975), return_components: bool = False) → etna.datasets.tsdataset.TSDataset

Make in-sample predictions on dataset in a given range.

Currently, when segments start at different timestamps, this is only guaranteed to work if start_timestamp >= the beginning of all segments.

Parameters
  • ts (etna.datasets.tsdataset.TSDataset) – Dataset to make predictions on.

  • start_timestamp (Optional[pandas._libs.tslibs.timestamps.Timestamp]) – First timestamp of the prediction range to return. It should be >= the first timestamp in ts, and each segment is expected to begin at or before start_timestamp. If it isn’t set, the first timestamp where each segment has begun is taken.

  • end_timestamp (Optional[pandas._libs.tslibs.timestamps.Timestamp]) – Last timestamp of the prediction range to return. If it isn’t set, the last timestamp of ts is taken. The value is expected to be less than or equal to the last timestamp in ts.

  • prediction_interval (bool) – If True, returns the prediction interval for the forecast.

  • quantiles (Sequence[float]) – Levels of the prediction distribution. By default, 2.5% and 97.5% are taken to form a 95% prediction interval.

  • return_components (bool) – If True additionally returns forecast components

Returns

Dataset with predictions in [start_timestamp, end_timestamp] range.

Raises
  • ValueError: – Value of end_timestamp is less than start_timestamp.

  • ValueError: – Value of start_timestamp goes before point where each segment started.

  • ValueError: – Value of end_timestamp goes after the last timestamp.

Return type

etna.datasets.tsdataset.TSDataset
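
The defaults and the three ValueError conditions listed above can be sketched with plain dates. resolve_predict_range is a hypothetical helper written for this illustration, not part of the library, and it reads "the first timestamp where each segment began" as the latest of the segment start dates.

```python
from datetime import date
from typing import Dict, Optional, Tuple


def resolve_predict_range(
    segment_starts: Dict[str, date],
    last_timestamp: date,
    start_timestamp: Optional[date] = None,
    end_timestamp: Optional[date] = None,
) -> Tuple[date, date]:
    """Hypothetical helper: apply the documented defaults and range checks."""
    # The first timestamp at which every segment has already started.
    common_start = max(segment_starts.values())
    start = common_start if start_timestamp is None else start_timestamp
    end = last_timestamp if end_timestamp is None else end_timestamp

    if start < common_start:
        raise ValueError("start_timestamp goes before the point where each segment started")
    if end > last_timestamp:
        raise ValueError("end_timestamp goes after the last timestamp")
    if end < start:
        raise ValueError("end_timestamp is less than start_timestamp")
    return start, end


starts = {"segment_a": date(2021, 1, 1), "segment_b": date(2021, 1, 5)}
default_range = resolve_predict_range(starts, last_timestamp=date(2021, 3, 1))
```

With no explicit bounds, the range runs from the latest segment start to the last timestamp of the dataset.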

abstract save(path: pathlib.Path)

Save the object.

Parameters

path (pathlib.Path) – Path to save object to.

class BasePipeline(horizon: int)

Base class for pipeline.

Parameters

horizon (int) –

backtest(ts: etna.datasets.tsdataset.TSDataset, metrics: List[etna.metrics.base.Metric], n_folds: Union[int, List[etna.pipeline.base.FoldMask]] = 5, mode: Optional[str] = None, aggregate_metrics: bool = False, n_jobs: int = 1, refit: Union[bool, int] = True, stride: Optional[int] = None, joblib_params: Optional[Dict[str, Any]] = None, forecast_params: Optional[Dict[str, Any]] = None) → Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]

Run backtest with the pipeline.

If refit != True and some component of the pipeline doesn’t support forecasting with a gap, that component will raise an exception.

Parameters
  • ts (etna.datasets.tsdataset.TSDataset) – Dataset to fit models in backtest

  • metrics (List[etna.metrics.base.Metric]) – List of metrics to compute for each fold

  • n_folds (Union[int, List[etna.pipeline.base.FoldMask]]) – Number of folds or the list of fold masks

  • mode (Optional[str]) – Train generation policy: ‘expand’ or ‘constant’. Works only if n_folds is integer. By default, is set to ‘expand’.

  • aggregate_metrics (bool) – If True, aggregate metrics over folds; otherwise return raw per-fold metrics

  • n_jobs (int) – Number of jobs to run in parallel

  • refit (Union[bool, int]) –

    Determines how often pipeline should be retrained during iteration over folds.

    • If True: pipeline is retrained on each fold.

    • If False: pipeline is trained only on the first fold.

    • If an integer: pipeline is retrained every refit folds, starting from the first.

  • stride (Optional[int]) – Number of points between folds. Works only if n_folds is integer. By default, is set to horizon.

  • joblib_params (Optional[Dict[str, Any]]) – Additional parameters for joblib.Parallel

  • forecast_params (Optional[Dict[str, Any]]) – Additional parameters for forecast()

Returns

metrics_df, forecast_df, fold_info_df – Metrics dataframe, forecast dataframe and dataframe with information about folds

Return type

Tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame]

Raises
  • ValueError: – If mode is set when n_folds is a List[FoldMask].

  • ValueError: – If stride is set when n_folds is a List[FoldMask].
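
The three refit settings described in the parameters above can be read as a retraining schedule. The sketch below uses a hypothetical helper, training_fold_for_each_fold, which is not part of the library; rejecting non-positive integers is an assumption of this sketch.

```python
from typing import List, Union


def training_fold_for_each_fold(n_folds: int, refit: Union[bool, int] = True) -> List[int]:
    """Hypothetical helper: for every fold, the number of the fold the pipeline was last trained on."""
    # bool is a subclass of int in Python, so the bool cases are checked first.
    if refit is True:
        period = 1        # retrain on each fold
    elif refit is False:
        period = n_folds  # train on the first fold only
    else:
        if refit < 1:
            raise ValueError("Integer refit must be positive")  # assumption of this sketch
        period = refit    # retrain every `refit` folds
    return [fold - fold % period for fold in range(n_folds)]


schedule = training_fold_for_each_fold(n_folds=5, refit=2)  # [0, 0, 2, 2, 4]
```

Folds that reuse an earlier fit are forecast with a gap between the end of training and the start of the test window, which is why components without gap support fail when refit != True.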

abstract fit(ts: etna.datasets.tsdataset.TSDataset) → etna.pipeline.base.AbstractPipeline

Fit the Pipeline.

Parameters

ts (etna.datasets.tsdataset.TSDataset) – Dataset with timeseries data

Returns

Fitted Pipeline instance

Return type

etna.pipeline.base.AbstractPipeline

forecast(ts: Optional[etna.datasets.tsdataset.TSDataset] = None, prediction_interval: bool = False, quantiles: Sequence[float] = (0.025, 0.975), n_folds: int = 3, return_components: bool = False) → etna.datasets.tsdataset.TSDataset

Make a forecast of the next points of a dataset.

The result of forecasting starts from the last point of ts, not including it.

Parameters
  • ts (Optional[etna.datasets.tsdataset.TSDataset]) – Dataset to forecast. If not given, the dataset passed to fit() is used.

  • prediction_interval (bool) – If True, returns the prediction interval for the forecast

  • quantiles (Sequence[float]) – Levels of the prediction distribution. By default, 2.5% and 97.5% are taken to form a 95% prediction interval

  • n_folds (int) – Number of folds to use in the backtest for prediction interval estimation

  • return_components (bool) – If True additionally returns forecast components

Returns

Dataset with predictions

Raises

NotImplementedError: – Adding target components is not currently implemented

Return type

etna.datasets.tsdataset.TSDataset

abstract classmethod load(path: pathlib.Path) → typing_extensions.Self

Load an object.

Parameters

path (pathlib.Path) – Path to load object from.

Return type

typing_extensions.Self

abstract params_to_tune() → Dict[str, etna.distributions.distributions.BaseDistribution]

Get hyperparameter grid to tune.

Returns

Grid with hyperparameters.

Return type

Dict[str, etna.distributions.distributions.BaseDistribution]

predict(ts: etna.datasets.tsdataset.TSDataset, start_timestamp: Optional[pandas._libs.tslibs.timestamps.Timestamp] = None, end_timestamp: Optional[pandas._libs.tslibs.timestamps.Timestamp] = None, prediction_interval: bool = False, quantiles: Sequence[float] = (0.025, 0.975), return_components: bool = False) → etna.datasets.tsdataset.TSDataset

Make in-sample predictions on dataset in a given range.

Currently, when segments start at different timestamps, this is only guaranteed to work if start_timestamp >= the beginning of all segments.

Parameters
  • ts (etna.datasets.tsdataset.TSDataset) – Dataset to make predictions on.

  • start_timestamp (Optional[pandas._libs.tslibs.timestamps.Timestamp]) – First timestamp of the prediction range to return. It should be >= the first timestamp in ts, and each segment is expected to begin at or before start_timestamp. If it isn’t set, the first timestamp where each segment has begun is taken.

  • end_timestamp (Optional[pandas._libs.tslibs.timestamps.Timestamp]) – Last timestamp of the prediction range to return. If it isn’t set, the last timestamp of ts is taken. The value is expected to be less than or equal to the last timestamp in ts.

  • prediction_interval (bool) – If True, returns the prediction interval for the forecast.

  • quantiles (Sequence[float]) – Levels of the prediction distribution. By default, 2.5% and 97.5% are taken to form a 95% prediction interval.

  • return_components (bool) – If True additionally returns forecast components

Returns

Dataset with predictions in [start_timestamp, end_timestamp] range.

Raises
  • ValueError: – Value of end_timestamp is less than start_timestamp.

  • ValueError: – Value of start_timestamp goes before point where each segment started.

  • ValueError: – Value of end_timestamp goes after the last timestamp.

  • NotImplementedError: – Adding target components is not currently implemented

Return type

etna.datasets.tsdataset.TSDataset

abstract save(path: pathlib.Path)

Save the object.

Parameters

path (pathlib.Path) – Path to save object to.

set_params(**params: dict) → etna.core.mixins.TMixin

Return new object instance with modified parameters.

The method also allows changing parameters of nested objects within the current object. For example, it is possible to change the parameters of a model in a Pipeline.

Nested parameters are expected to be in a <component_1>.<...>.<parameter> form, where components are separated by a dot.
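
The dotted-path convention can be illustrated on plain dictionaries and lists. This is only a sketch of the addressing scheme, not how the library updates its objects; set_nested_params and the config layout are hypothetical.

```python
import copy
from typing import Any, Dict


def set_nested_params(config: Dict[str, Any], **params: Any) -> Dict[str, Any]:
    """Hypothetical helper: return a copy of `config` with dotted-path parameters replaced."""
    updated = copy.deepcopy(config)  # the original object stays untouched
    for dotted_key, value in params.items():
        *path, leaf = dotted_key.split(".")
        node = updated
        for component in path:
            # A numeric component indexes into a list, e.g. "transforms.0.value".
            node = node[int(component)] if isinstance(node, list) else node[component]
        if isinstance(node, list):
            node[int(leaf)] = value
        else:
            node[leaf] = value
    return updated


config = {"model": {"lag": 1}, "transforms": [{"in_column": "target", "value": 1}], "horizon": 3}
new_config = set_nested_params(config, **{"model.lag": 3, "transforms.0.value": 2})
```

As with set_params, the call returns a modified copy and leaves the original untouched.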

Parameters
  • **params (dict) – Estimator parameters

Returns

New instance with changed parameters

Return type

etna.core.mixins.TMixin

Examples

>>> from etna.pipeline import Pipeline
>>> from etna.models import NaiveModel
>>> from etna.transforms import AddConstTransform
>>> model = NaiveModel(lag=1)
>>> transforms = [AddConstTransform(in_column="target", value=1)]
>>> pipeline = Pipeline(model, transforms=transforms, horizon=3)
>>> pipeline.set_params(**{"model.lag": 3, "transforms.0.value": 2})
Pipeline(model = NaiveModel(lag = 3, ), transforms = [AddConstTransform(in_column = 'target', value = 2, inplace = True, out_column = None, )], horizon = 3, )

to_dict()

Collect all information about etna object in dict.

class CrossValidationMode(value)

Enum for different cross-validation modes.
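
A string-valued enum is a common way to validate an option such as mode. The sketch below uses a hypothetical Mode class and parse_mode helper; only the two values 'expand' and 'constant' come from the backtest documentation, and the member names are assumptions.

```python
from enum import Enum


class Mode(str, Enum):
    """Hypothetical stand-in for a cross-validation mode enum."""

    expand = "expand"
    constant = "constant"


def parse_mode(mode: str) -> Mode:
    """Turn a user-supplied string into an enum member, with a readable error."""
    try:
        return Mode(mode)
    except ValueError:
        allowed = ", ".join(repr(m.value) for m in Mode)
        raise ValueError(f"Unknown mode {mode!r}; expected one of: {allowed}") from None


selected = parse_mode("expand")
```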

class FoldMask(first_train_timestamp: Optional[Union[str, pandas._libs.tslibs.timestamps.Timestamp]], last_train_timestamp: Union[str, pandas._libs.tslibs.timestamps.Timestamp], target_timestamps: List[Union[str, pandas._libs.tslibs.timestamps.Timestamp]])

Container to hold the description of the fold mask.

Fold masks are expected to be used for backtest strategy customization.

Init FoldMask.

Parameters
  • first_train_timestamp (Optional[Union[str, pandas._libs.tslibs.timestamps.Timestamp]]) – First train timestamp, the first timestamp in the dataset if None is passed

  • last_train_timestamp (Union[str, pandas._libs.tslibs.timestamps.Timestamp]) – Last train timestamp

  • target_timestamps (List[Union[str, pandas._libs.tslibs.timestamps.Timestamp]]) – List of target timestamps
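
What a fold mask describes, and the kind of checks that validating it against a dataset and horizon might involve, can be sketched with plain dates. SimpleFoldMask is a hypothetical stand-in, not the library class; the daily frequency and the specific checks are assumptions of this sketch.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class SimpleFoldMask:
    """Hypothetical stand-in for a fold mask, using daily dates."""

    first_train_timestamp: Optional[date]
    last_train_timestamp: date
    target_timestamps: List[date]

    def validate(self, dataset_start: date, dataset_end: date, horizon: int) -> None:
        # None means "train from the first timestamp in the dataset".
        first_train = dataset_start if self.first_train_timestamp is None else self.first_train_timestamp
        if not dataset_start <= first_train <= self.last_train_timestamp <= dataset_end:
            raise ValueError("Train range must be ordered and lie inside the dataset")
        # Assumption: targets fall within `horizon` daily steps after training ends.
        last_allowed = self.last_train_timestamp + timedelta(days=horizon)
        for timestamp in self.target_timestamps:
            if not self.last_train_timestamp < timestamp <= last_allowed:
                raise ValueError(f"Target timestamp {timestamp} is outside the forecast window")


mask = SimpleFoldMask(
    first_train_timestamp=None,
    last_train_timestamp=date(2021, 1, 10),
    target_timestamps=[date(2021, 1, 11), date(2021, 1, 13)],
)
mask.validate(dataset_start=date(2021, 1, 1), dataset_end=date(2021, 1, 31), horizon=3)
```

Because target timestamps are listed explicitly, a mask can score only selected points inside the forecast window, here the first and third day after training ends.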

set_params(**params: dict) → etna.core.mixins.TMixin

Return new object instance with modified parameters.

The method also allows changing parameters of nested objects within the current object. For example, it is possible to change the parameters of a model in a Pipeline.

Nested parameters are expected to be in a <component_1>.<...>.<parameter> form, where components are separated by a dot.

Parameters
  • **params (dict) – Estimator parameters

Returns

New instance with changed parameters

Return type

etna.core.mixins.TMixin

Examples

>>> from etna.pipeline import Pipeline
>>> from etna.models import NaiveModel
>>> from etna.transforms import AddConstTransform
>>> model = NaiveModel(lag=1)
>>> transforms = [AddConstTransform(in_column="target", value=1)]
>>> pipeline = Pipeline(model, transforms=transforms, horizon=3)
>>> pipeline.set_params(**{"model.lag": 3, "transforms.0.value": 2})
Pipeline(model = NaiveModel(lag = 3, ), transforms = [AddConstTransform(in_column = 'target', value = 2, inplace = True, out_column = None, )], horizon = 3, )

to_dict()

Collect all information about etna object in dict.

validate_on_dataset(ts: etna.datasets.tsdataset.TSDataset, horizon: int)

Validate fold mask on the dataset with specified horizon.

Parameters
  • ts (etna.datasets.tsdataset.TSDataset) –

  • horizon (int) –

class FoldParallelGroup(_typename, _fields=None, /, **kwargs)

Group for parallel fold processing.

clear() → None.  Remove all items from D.

copy() → a shallow copy of D

classmethod fromkeys(iterable, value=None, /)

Create a new dictionary with keys from iterable and values set to value.

get(key, default=None, /)

Return the value for key if key is in the dictionary, else default.

items() → a set-like object providing a view on D's items

keys() → a set-like object providing a view on D's keys

pop(k[, d]) → v, remove specified key and return the corresponding value.

If key is not found, d is returned if given, otherwise KeyError is raised

popitem()

Remove and return a (key, value) pair as a 2-tuple.

Pairs are returned in LIFO (last-in, first-out) order. Raises KeyError if the dict is empty.

setdefault(key, default=None, /)

Insert key with a value of default if key is not in the dictionary.

Return the value for key if key is in the dictionary, else default.

update([E, ]**F) → None.  Update D from dict/iterable E and F.

If E is present and has a .keys() method, then does: for k in E: D[k] = E[k]

If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v

In either case, this is followed by: for k in F: D[k] = F[k]

values() → an object providing a view on D's values