plots
Functions
| | Calculate cross correlation between arrays. |
| acf_plot | Autocorrelation and partial autocorrelation plot for multiple timeseries. |
| cross_corr_plot | Cross-correlation plot between multiple timeseries. |
| distribution_plot | Distribution of z-values grouped by segments and time frequency. |
| plot_clusters | Plot clusters [with centroids]. |
| plot_correlation_matrix | Plot pairwise correlation heatmap for selected segments. |
| plot_holidays | Plot holidays for segments. |
| plot_imputation | Plot the result of imputation by a given imputer. |
| plot_periodogram | Plot the periodogram using scipy.signal.periodogram(). |
- acf_plot(ts: TSDataset, n_segments: int = 10, lags: int = 21, partial: bool = False, columns_num: int = 2, segments: Optional[List[str]] = None, figsize: Tuple[int, int] = (10, 5))
Autocorrelation and partial autocorrelation plot for multiple timeseries.
Notes
Definition of autocorrelation.
Definition of partial autocorrelation.
If partial=False, the function works with NaNs at any place in the time series.
If partial=True, the function works only with NaNs at the edges of the time series and fails if there are NaNs inside it.
- Parameters
ts (TSDataset) – TSDataset with timeseries data
n_segments (int) – number of random segments to plot
lags (int) – number of timeseries shifts for autocorrelation
partial (bool) – if True, plot partial autocorrelation; otherwise plot autocorrelation
columns_num (int) – number of columns in subplots
segments (Optional[List[str]]) – segments to plot
figsize (Tuple[int, int]) – size of the figure per subplot with one segment in inches
- Raises
ValueError: – If partial=True and there is a NaN in the middle of the time series
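The NaN behaviour described in the notes can be illustrated with a small sketch. The helper `autocorr_ignoring_nans` below is hypothetical and not part of the library. It assumes one common convention for the partial=False case: every pair of a value and its lagged counterpart is dropped when either of the two is NaN. The estimator actually used by acf_plot may differ.

```python
import math


def autocorr_ignoring_nans(values, lag):
    """Pearson correlation between the series and its copy shifted by `lag`.

    Hypothetical helper: only pairs in which both values are present are used.
    """
    if not 1 <= lag < len(values):
        raise ValueError("lag should be >= 1 and < len(values)")
    pairs = [
        (a, b)
        for a, b in zip(values[:-lag], values[lag:])
        if not (math.isnan(a) or math.isnan(b))
    ]
    if len(pairs) < 2:
        return float("nan")
    xs, ys = zip(*pairs)
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


# A series with period 4 and one NaN in the middle.
series = [1.0, 2.0, 3.0, 2.0] * 5
series[9] = float("nan")

r4 = autocorr_ignoring_nans(series, lag=4)  # shift by a full period
r2 = autocorr_ignoring_nans(series, lag=2)  # shift by half a period
```

Because incomplete pairs are simply skipped, a NaN inside the series does not prevent the estimate here. Partial autocorrelation is usually obtained from a regression on consecutive lags, which is one plausible reason why inner NaNs are rejected when partial=True.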
- cross_corr_plot(ts: TSDataset, n_segments: int = 10, maxlags: int = 21, segments: Optional[List[str]] = None, columns_num: int = 2, figsize: Tuple[int, int] = (10, 5))
Cross-correlation plot between multiple timeseries.
- Parameters
ts (TSDataset) – TSDataset with timeseries data
n_segments (int) – number of random segments to plot, ignored if parameter segments is set
maxlags (int) – number of timeseries shifts for cross-correlation, should be >=1 and <= len(timeseries)
segments (Optional[List[str]]) – segments to plot
columns_num (int) – number of columns in subplots
figsize (Tuple[int, int]) – size of the figure per subplot with one segment in inches
- Raises
ValueError: – if parameter maxlags doesn’t satisfy the constraints
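The maxlags constraint and the idea of a lagged cross-correlation can be sketched in plain Python. The function `cross_corr` below is a hypothetical stand-in, not the library's implementation; the normalisation by the product of the series' norms is an assumption chosen for illustration.

```python
import math


def cross_corr(a, b, maxlags):
    """Normalised cross-correlation of two equally long series.

    Hypothetical helper: returns {lag: value} for lags in [-maxlags, maxlags].
    """
    n = len(a)
    if len(b) != n:
        raise ValueError("series must have the same length")
    if not 1 <= maxlags <= n:
        raise ValueError("maxlags should be >= 1 and <= len(timeseries)")
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cross-correlation is undefined for an all-zero series")
    result = {}
    for lag in range(-maxlags, maxlags + 1):
        # Pair a[t + lag] with b[t] wherever both indices are inside the series.
        total = sum(a[t + lag] * b[t] for t in range(n) if 0 <= t + lag < n)
        result[lag] = total / norm
    return result


a = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
b = a[1:] + [0.0]  # b[t] equals a[t + 1], so `b` leads `a` by one step

corr = cross_corr(a, b, maxlags=3)
best_lag = max(corr, key=corr.get)  # lag with the strongest positive correlation
```

With this convention, the lag at which the value peaks indicates how far one series should be shifted to align best with the other.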
- distribution_plot(ts: TSDataset, n_segments: int = 10, segments: Optional[List[str]] = None, shift: int = 30, window: int = 30, freq: str = '1M', n_rows: int = 10, figsize: Tuple[int, int] = (10, 5))
Distribution of z-values grouped by segments and time frequency.
The mean is calculated over windows:
\[mean_{i} = \sum_{j=i-\text{shift}}^{i-\text{shift}+\text{window}} \frac{x_{j}}{\text{window}}\]
The same is applied to the standard deviation.
- Parameters
ts (TSDataset) – dataset with timeseries data
n_segments (int) – number of random segments to plot
segments (Optional[List[str]]) – segments to plot
shift (int) – number of timeseries shifts for statistics calculation
window (int) – number of points for statistics calculation
freq (str) – time frequency used to group z-values
n_rows (int) – maximum number of rows to plot
figsize (Tuple[int, int]) – size of the figure per subplot with one segment in inches
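As a rough illustration of the windowed statistics, the sketch below computes z-values for a plain list. Several things are assumptions rather than facts about the library: the z-value is taken to be (x_i - mean_i) / std_i, the upper bound of the sum is treated as exclusive so that exactly `window` points are averaged, the standard deviation is the population one, and the helper name `z_values` is hypothetical.

```python
import math


def z_values(x, shift=30, window=30):
    """Return z-values; positions without a full window (or with zero std) are None."""
    result = []
    for i in range(len(x)):
        start = i - shift
        stop = start + window  # exclusive, so exactly `window` points are used
        if start < 0 or stop > len(x):
            result.append(None)
            continue
        chunk = x[start:stop]
        mean = sum(chunk) / window
        std = math.sqrt(sum((v - mean) ** 2 for v in chunk) / window)
        result.append((x[i] - mean) / std if std > 0 else None)
    return result


# The last point is far from the four points that precede it.
values = [1.0, 2.0, 3.0, 4.0, 10.0]
z = z_values(values, shift=4, window=4)
```

With shift equal to window, each point is compared against the `window` points immediately before it, so a large absolute z-value marks a point that departs from its recent history.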
- plot_clusters(ts: TSDataset, segment2cluster: Dict[str, int], centroids_df: Optional[pandas.core.frame.DataFrame] = None, columns_num: int = 2, figsize: Tuple[int, int] = (10, 5))
Plot clusters [with centroids].
- Parameters
ts (TSDataset) – TSDataset with timeseries
segment2cluster (Dict[str, int]) – mapping from segment to cluster in format {segment: cluster}
centroids_df (Optional[pandas.core.frame.DataFrame]) – dataframe with centroids
columns_num (int) – number of columns in subplots
figsize (Tuple[int, int]) – size of the figure per subplot with one segment in inches
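For reference, a segment2cluster mapping in the {segment: cluster} format might look like the sketch below. The segment names are made up, and the grouping step only shows how such a mapping can be inverted into one list of segments per cluster; it is not taken from the library.

```python
from collections import defaultdict

# Hypothetical mapping in the {segment: cluster} format.
segment2cluster = {"segment_a": 0, "segment_b": 1, "segment_c": 0}

# Invert it: collect the segments that belong to each cluster.
cluster2segments = defaultdict(list)
for segment, cluster in segment2cluster.items():
    cluster2segments[cluster].append(segment)
```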
- plot_correlation_matrix(ts: TSDataset, columns: Optional[List[str]] = None, segments: Optional[List[str]] = None, method: str = 'pearson', mode: str = 'macro', columns_num: int = 2, figsize: Tuple[int, int] = (10, 10), **heatmap_kwargs)
Plot pairwise correlation heatmap for selected segments.
- Parameters
ts (TSDataset) – TSDataset with timeseries data
columns (Optional[List[str]]) – Columns to use, if None use all columns
segments (Optional[List[str]]) – Segments to use
method (str) –
Method of correlation:
pearson: standard correlation coefficient
kendall: Kendall Tau correlation coefficient
spearman: Spearman rank correlation
mode ('macro' or 'per-segment') – Aggregation mode
columns_num (int) – Number of subplots columns
figsize (Tuple[int, int]) – size of the figure in inches
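The difference between the pearson and spearman methods can be seen on a monotonic but nonlinear relationship. The functions below are a self-contained sketch written for illustration, not the routines the library calls, and the rank helper assumes there are no ties.

```python
import math


def pearson(xs, ys):
    """Standard (linear) correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


def ranks(values):
    """Ranks starting at 1; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [x ** 3 for x in xs]  # monotonic but not linear

linear = pearson(xs, ys)
rank_based = spearman(xs, ys)
```

Here the rank-based coefficient reaches 1 because the ordering is preserved exactly, while the linear coefficient stays below 1.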
- plot_holidays(ts: TSDataset, holidays: Union[str, pandas.core.frame.DataFrame], segments: Optional[List[str]] = None, columns_num: int = 2, figsize: Tuple[int, int] = (10, 5), start: Optional[str] = None, end: Optional[str] = None, as_is: bool = False)
Plot holidays for segments.
A sequence of timestamps belonging to one holiday is drawn as a colored region. An individual holiday is drawn as a colored point.
It is not possible to distinguish points plotted at one timestamp, but this case is considered rare. This problem isn’t relevant for regions because they are partially transparent.
- Parameters
ts (TSDataset) – TSDataset with timeseries data
holidays (Union[str, pandas.core.frame.DataFrame]) –
there are several options:
if str, then this is the code of the country in the holidays library;
if DataFrame, then the dataframe is expected to be in prophet's holiday format;
segments (Optional[List[str]]) – segments to use
columns_num (int) – number of columns in subplots
figsize (Tuple[int, int]) – size of the figure per subplot with one segment in inches
as_is (bool) – use this option if the DataFrame has a timestamp index and one column per holiday name. In a holiday column, 0 represents the absence of the holiday at that timestamp and 1 represents its presence.
start (Optional[str]) – start timestamp for plot
end (Optional[str]) – end timestamp for plot
- Raises
ValueError: –
* Holiday is neither pd.DataFrame nor String.
* Holiday is an empty pd.DataFrame.
* as_is=True while holiday is String.
* If upper_window is negative.
* If lower_window is positive.
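The as_is layout and the distinction between regions and points can be sketched without any plotting. The example below is an assumption-laden illustration: it uses plain lists instead of a DataFrame, the holiday names are made up, the timestamps are assumed to be consecutive, and `split_runs` is a hypothetical helper rather than a library function.

```python
from datetime import date, timedelta


def split_runs(timestamps, flags):
    """Split a 0/1 holiday column into multi-timestamp runs and isolated timestamps."""
    regions, points, run = [], [], []
    # A trailing 0 acts as a sentinel that closes a run ending on the last row.
    for timestamp, flag in zip(list(timestamps) + [None], list(flags) + [0]):
        if flag == 1:
            run.append(timestamp)
        elif run:
            if len(run) > 1:
                regions.append((run[0], run[-1]))  # would be drawn as a region
            else:
                points.append(run[0])  # would be drawn as a point
            run = []
    return regions, points


start = date(2021, 12, 30)
timestamps = [start + timedelta(days=i) for i in range(10)]

# as_is-style layout: one 0/1 column per holiday name (names are made up).
holidays = {
    "new_year_break": [0, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    "single_day_off": [0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
}

runs = {name: split_runs(timestamps, flags) for name, flags in holidays.items()}
```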
- plot_imputation(ts: TSDataset, imputer: TimeSeriesImputerTransform, segments: Optional[List[str]] = None, columns_num: int = 2, figsize: Tuple[int, int] = (10, 5), start: Optional[str] = None, end: Optional[str] = None)
Plot the result of imputation by a given imputer.
- Parameters
ts (TSDataset) – TSDataset with timeseries data
imputer (TimeSeriesImputerTransform) – transform to make imputation of NaNs
segments (Optional[List[str]]) – segments to use
columns_num (int) – number of columns in subplots
figsize (Tuple[int, int]) – size of the figure per subplot with one segment in inches
start (Optional[str]) – start timestamp for plot
end (Optional[str]) – end timestamp for plot
- plot_periodogram(ts: TSDataset, period: float, amplitude_aggregation_mode: Union[str, Literal['per-segment']] = AggregationMode.mean, periodogram_params: Optional[Dict[str, Any]] = None, segments: Optional[List[str]] = None, xticks: Optional[List[Any]] = None, columns_num: int = 2, figsize: Tuple[int, int] = (10, 5))
Plot the periodogram using scipy.signal.periodogram().
It is useful to determine the optimal order parameter for FourierTransform.
- Parameters
ts (TSDataset) – TSDataset with timeseries data
period (float) – the period of the seasonality to capture in frequency units of time series, it should be >= 2; it is translated to the fs parameter of scipy.signal.periodogram()
amplitude_aggregation_mode (Union[str, Literal['per-segment']]) – aggregation strategy for obtained per segment periodograms; all the strategies can be examined at AggregationMode
periodogram_params (Optional[Dict[str, Any]]) – additional keyword arguments for periodogram, scipy.signal.periodogram() is used
segments (Optional[List[str]]) – segments to use
xticks (Optional[List[Any]]) – list of tick locations of the x-axis, useful to highlight specific reference periodicities
columns_num (int) – if amplitude_aggregation_mode="per-segment", number of columns in subplots; otherwise the value is ignored
figsize (Tuple[int, int]) – size of the figure per subplot with one segment in inches
- Raises
ValueError: – if period < 2
ValueError: – if periodogram can’t be calculated on segment because of the NaNs inside it
Notes
In modes other than per-segment, all segments are cut to the same length by keeping the last values.
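To make the role of period more concrete, the sketch below uses a plain discrete Fourier transform as a stand-in for scipy.signal.periodogram(); it is not the library's code, and `periodogram_axis` is a hypothetical helper. It shows two things under those assumptions: trimming segments to a common length by keeping the last values, and how passing the period as the sampling frequency fs puts the frequency axis in units of cycles per period.

```python
import cmath
import math


def periodogram_axis(values, fs):
    """Return (frequencies, power) for a one-sided, unnormalised periodogram."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]  # remove the constant component
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coef = sum(
            v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(centered)
        )
        freqs.append(k * fs / n)  # frequency grid for a sampling frequency of fs
        power.append(abs(coef) ** 2)
    return freqs, power


# Two segments of different length, both with a cycle of 7 time steps.
long_segment = [math.sin(2 * math.pi * t / 7) for t in range(100)]
short_segment = [math.sin(2 * math.pi * t / 7) for t in range(70)]

# Cut both to the shortest length, keeping the last values.
length = min(len(long_segment), len(short_segment))
trimmed = [segment[-length:] for segment in (long_segment, short_segment)]

freqs, power = periodogram_axis(trimmed[0], fs=7)
peak = freqs[max(range(len(power)), key=power.__getitem__)]
```

With fs set to the period, a peak at frequency 1 corresponds to one cycle per period, a peak at 2 to two cycles per period, and so on.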