hist_outliers

Functions

adjust_estimation(i, k, sse, sse_one_bin)

Compute sse_one_bin[i][k] using binary search.

compute_f(series, k, p, pp)

Compute F.

get_anomalies_hist(ts[, in_column, bins_number])

Get point outliers in a time series using a histogram model.

hist(series, bins_number)

Compute outlier indices according to the hist rule.

optimal_sse(left, right, p, pp)

Compute the approximation error of a single bin covering the elements from left to right.

v_optimal_hist(series, bins_number, p, pp)

Compute the approximation error of a series for each number of bins from 1 to bins_number.

adjust_estimation(i: int, k: int, sse: numpy.ndarray, sse_one_bin: numpy.ndarray) -> float

Compute sse_one_bin[i][k] using binary search.

Parameters
  • i (int) – left border of series

  • k (int) – number of bins

  • sse (numpy.ndarray) – array of approximation errors

  • sse_one_bin (numpy.ndarray) – array of approximation errors with one bin

Returns

result – calculated sse_one_bin[i][k]

Return type

float

compute_f(series: numpy.ndarray, k: int, p: numpy.ndarray, pp: numpy.ndarray) -> Tuple[numpy.ndarray, list]

Compute F, where F[a][b][k] is the minimum approximation error on series[a:b+1] with k outliers.

Reference.

Parameters
  • series (numpy.ndarray) – array for which F is computed

  • k (int) – number of outliers

  • p (numpy.ndarray) – array of sums of elements, p[i] is the sum of the elements from the 0th to the i-th

  • pp (numpy.ndarray) – array of sums of squares of elements, pp[i] is the sum of the squares of the elements from the 0th to the i-th

Returns

result – array F, outliers_indices

Return type

Tuple[numpy.ndarray, list]
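
The p and pp arguments described above are running sums, so they can be built in one pass over the series. A minimal sketch, assuming that p[i] and pp[i] include element i itself and using plain Python lists instead of numpy.ndarray (with NumPy, np.cumsum produces the same values). The helper name build_prefix_sums is hypothetical and not part of this module.

```python
from itertools import accumulate


def build_prefix_sums(series):
    """Return (p, pp): running sums of the elements and of their squares."""
    # p[i] = series[0] + ... + series[i] (inclusive of i; an assumption)
    p = list(accumulate(series))
    # pp[i] = series[0]**2 + ... + series[i]**2
    pp = list(accumulate(x * x for x in series))
    return p, pp


p, pp = build_prefix_sums([1.0, 2.0, 3.0, 10.0])
```

With these arrays, the sum and the sum of squares of any contiguous slice can be recovered with two subtractions, which is what makes the per-bin error cheap to evaluate.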

get_anomalies_hist(ts: TSDataset, in_column: str = 'target', bins_number: int = 10) -> Dict[str, List[pandas._libs.tslibs.timestamps.Timestamp]]

Get point outliers in a time series using a histogram model.

Outliers are the points whose removal yields a histogram with a lower approximation error, even when the number of bins is smaller than the number of outliers.

Parameters
  • ts (TSDataset) – TSDataset with time series data

  • in_column (str) – name of the column in which anomalies are searched for

  • bins_number (int) – number of bins

Returns

dict of outliers in format {segment: [outliers_timestamps]}

Return type

Dict[str, List[pandas._libs.tslibs.timestamps.Timestamp]]
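
The criterion above (a point is an outlier if removing it lowers the approximation error) can be illustrated on a toy series. This is only a sketch of the idea with a single bin, written in plain Python; it is not how get_anomalies_hist is implemented, and the name single_bin_error is hypothetical.

```python
def single_bin_error(values):
    """Sum of squared deviations from the mean: the error of one bin."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


series = [1.0, 2.0, 1.5, 50.0, 2.5]
full_error = single_bin_error(series)

# Error of the remaining points after dropping each point in turn.
drop_errors = [
    single_bin_error(series[:i] + series[i + 1:]) for i in range(len(series))
]

# The point whose removal leaves the smallest error is the outlier candidate.
candidate = min(range(len(series)), key=drop_errors.__getitem__)
```

Here the candidate is the index of the value 50.0, since dropping it is the only removal that leaves a tightly clustered set of points.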

hist(series: numpy.ndarray, bins_number: int) -> numpy.ndarray

Compute outlier indices according to the hist rule.

Reference.

Parameters
  • series (numpy.ndarray) – array for which F is computed

  • bins_number (int) – number of bins

Returns

indices – outlier indices

Return type

np.ndarray

optimal_sse(left: int, right: int, p: numpy.ndarray, pp: numpy.ndarray) -> float

Compute the approximation error of a single bin covering the elements from left to right.

Parameters
  • left (int) – left border

  • right (int) – right border

  • p (numpy.ndarray) – array of sums of elements, p[i] is the sum of the elements from the 0th to the i-th

  • pp (numpy.ndarray) – array of sums of squares of elements, pp[i] is the sum of the squares of the elements from the 0th to the i-th

Returns

result – approximation error

Return type

float
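
The error of one bin can be obtained from the prefix sums without revisiting the elements, using the identity sum((x - mean)^2) = sum(x^2) - (sum(x))^2 / n. A minimal sketch, assuming both borders are inclusive and that p[i] and pp[i] include element i; the name sse_from_prefix_sums is hypothetical and the library function may handle borders differently.

```python
from itertools import accumulate


def sse_from_prefix_sums(left, right, p, pp):
    """Squared error of approximating series[left..right] by its mean."""
    # Recover the range sums from the inclusive prefix sums.
    total = p[right] - (p[left - 1] if left > 0 else 0.0)
    total_sq = pp[right] - (pp[left - 1] if left > 0 else 0.0)
    n = right - left + 1
    # sum((x - mean)^2) = sum(x^2) - (sum(x))^2 / n
    return total_sq - total * total / n


series = [1.0, 2.0, 3.0, 10.0]
p = list(accumulate(series))
pp = list(accumulate(x * x for x in series))
error = sse_from_prefix_sums(0, 2, p, pp)  # error of the bin [1.0, 2.0, 3.0]
```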

v_optimal_hist(series: numpy.ndarray, bins_number: int, p: numpy.ndarray, pp: numpy.ndarray) -> numpy.ndarray

Compute the approximation error of a series for each number of bins from 1 to bins_number.

Reference.

Parameters
  • series (numpy.ndarray) – array whose approximation error with bins_number bins is computed

  • bins_number (int) – number of bins

  • p (numpy.ndarray) – array of sums of elements, p[i] is the sum of the elements from the 0th to the i-th

  • pp (numpy.ndarray) – array of sums of squares of elements, pp[i] is the sum of the squares of the elements from the 0th to the i-th

Returns

error – approximation error of a series with [1, bins_number] bins

Return type

np.ndarray
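
A V-optimal histogram splits the series into contiguous bins so that the total squared error around each bin mean is minimal. The textbook dynamic program for this can be sketched as follows. It is an illustration in plain Python under stated assumptions, not the library's implementation: the name v_optimal_errors is hypothetical, the prefix sums are built internally with a leading zero for convenience, and the result is a plain list for the whole series rather than the numpy.ndarray returned by v_optimal_hist.

```python
from itertools import accumulate


def v_optimal_errors(series, bins_number):
    """errors[k - 1] = minimum total squared error using exactly k bins."""
    n = len(series)
    # Shifted prefix sums: p[i] is the sum of the first i elements.
    p = [0.0] + list(accumulate(series))
    pp = [0.0] + list(accumulate(x * x for x in series))

    def bin_error(lo, hi):
        """Squared error of one bin over the half-open range [lo, hi)."""
        s = p[hi] - p[lo]
        return (pp[hi] - pp[lo]) - s * s / (hi - lo)

    inf = float("inf")
    # best[k][i]: minimum error of the first i elements with exactly k bins.
    best = [[inf] * (n + 1) for _ in range(bins_number + 1)]
    best[0][0] = 0.0
    for k in range(1, bins_number + 1):
        for i in range(k, n + 1):
            # The last bin covers [j, i); the first j elements use k - 1 bins.
            best[k][i] = min(
                best[k - 1][j] + bin_error(j, i) for j in range(k - 1, i)
            )
    return [best[k][n] for k in range(1, bins_number + 1)]


errors = v_optimal_errors([1.0, 1.0, 1.0, 5.0, 5.0, 5.0], 2)
```

For this series one bin leaves a nonzero error, while two bins separate the two plateaus exactly. If bins_number exceeds the series length, the corresponding entries stay infinite in this sketch.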