Time series use cases

Analysis

General analysis

Since time series use cases are regressions, you will find the same level of analytics as for their tabular counterparts.

Time gauge

The time gauge recalls the selection criteria entered by the user:

_images/analysis_data_ts_head.png

Feature importance

The goal of time series modeling is to automatically find new temporal features that increase the predictive power of the model. Temporal features are created based on statistical significance, using measures such as the autocorrelation function (ACF), the partial autocorrelation function (PACF), correlation with the TARGET, …

Created features can be found in the feature importance:

_images/analysis_data_ts_fi.png

Their names are built from the name of the original feature, followed by a moving aggregate function:

  • featurename_lag_X = lag (offset) of featurename by X time steps
  • featurename_min_a_b = minimum of featurename between time steps a and b
  • featurename_max_a_b = maximum of featurename between time steps a and b
  • featurename_mean_a_b = mean (moving average) of featurename between time steps a and b
  • featurename_bollinger_upper_a_b = upper Bollinger bound (~ moving average + sd) of featurename between time steps a and b
  • featurename_bollinger_lower_a_b = lower Bollinger bound (~ moving average - sd) of featurename between time steps a and b

Please keep in mind that featurename can be the TARGET or any feature present in the dataset.
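To make the naming convention concrete, here is a minimal sketch of how such features could be computed by hand. It is not the platform's implementation. It assumes that "between a and b" means the window of values observed from b to a time steps in the past (inclusive), and that the Bollinger bounds are exactly the moving average plus or minus one population standard deviation. The helper names (lag, window_agg, temporal_features) are hypothetical.

```python
from statistics import mean, pstdev


def lag(values, x):
    """Value observed x time steps earlier (None where history is too short)."""
    return [values[i - x] if i >= x else None for i in range(len(values))]


def window_agg(values, a, b, func):
    """Apply func to the values observed between a and b time steps in the past.

    Assumption: for position i the window covers indices i-b .. i-a inclusive.
    """
    out = []
    for i in range(len(values)):
        lo, hi = i - b, i - a
        out.append(func(values[lo:hi + 1]) if lo >= 0 else None)
    return out


def temporal_features(name, values, x, a, b):
    """Build the features listed above, keyed by their generated names."""
    return {
        f"{name}_lag_{x}": lag(values, x),
        f"{name}_min_{a}_{b}": window_agg(values, a, b, min),
        f"{name}_max_{a}_{b}": window_agg(values, a, b, max),
        f"{name}_mean_{a}_{b}": window_agg(values, a, b, mean),
        # Assumption: bounds are mean +/- 1 standard deviation.
        f"{name}_bollinger_upper_{a}_{b}": window_agg(
            values, a, b, lambda w: mean(w) + pstdev(w)),
        f"{name}_bollinger_lower_{a}_{b}": window_agg(
            values, a, b, lambda w: mean(w) - pstdev(w)),
    }


features = temporal_features("sales", [1, 2, 3, 4, 5], x=1, a=1, b=2)
```

With this toy series, features["sales_lag_1"] holds each value shifted by one step, and the first two positions of every windowed feature are None because the history is too short to fill the window.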

Predictions

When forecasting, you must send a historical dataset at least as long as the interval between the two boundaries of the historical window. This dataset must be completely filled with actual data (including the target) and is then extended with the rows to be forecast, in which:

  • The target is absent -> Prevision.io detects the period to be predicted from the moment the target ceases to be known
  • A priori features are filled in
  • Non a priori features are missing

The output of this step is a file (time, value) filled in over the forecast period. In addition, if the historical period is longer than the window, forecasts are also made on this extra data, which allows a test score to be calculated directly in the application.
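The layout of a forecast dataset described above can be sketched as follows. This is only an illustration under stated assumptions: the column names (time, target, holiday, temperature) and the helpers make_forecast_rows and forecast_start are hypothetical, and rows are plain dictionaries rather than an actual upload file.

```python
from datetime import datetime, timedelta


def make_forecast_rows(history, horizon, step, a_priori):
    """Append `horizon` future rows to a fully filled `history`.

    a_priori maps a feature name to its list of known future values.
    Every other column, the target included, is left as None (missing).
    """
    columns = [c for c in history[0] if c != "time"]
    last_time = history[-1]["time"]
    rows = list(history)
    for h in range(horizon):
        row = {"time": last_time + (h + 1) * step}
        for col in columns:
            row[col] = a_priori[col][h] if col in a_priori else None
        rows.append(row)
    return rows


def forecast_start(rows, target="target"):
    """First timestamp at which the target ceases to be known."""
    for row in rows:
        if row[target] is None:
            return row["time"]
    return None  # target filled everywhere: nothing to forecast


history = [
    {"time": datetime(2021, 1, 1, h), "target": 10.0 + h,
     "holiday": 0, "temperature": 5.0}
    for h in range(3)
]
rows = make_forecast_rows(history, horizon=2, step=timedelta(hours=1),
                          a_priori={"holiday": [0, 1]})
start = forecast_start(rows)
```

Here holiday plays the role of an a priori feature (known in advance), while temperature is not known for the future and therefore stays missing in the two appended rows.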

In case of problems

During training

Given the complexity of time series modeling, it is essential that the data set respects the following constraints during the learning phase:

  • Check that the target is numeric
  • Check the constraints on the temporal window and the history window
  • Check that a time column is provided in ISO 8601 format (or in a classic format, such as DD/MM/YYYY or DD-MM-YYYY hh:mm for example)
  • Check that the time spacing is consistent for at least 80% of the data (e.g. you send a one-day series at an hourly step: if more than 5 data points are missing, the calculation will not succeed)
  • Check, when there is a group, that the columns designated as such identify a unique time series (i.e. at most one value per timestamp)
  • Check, when there is a group, that the time step is consistent between the groups
  • Check that the time steps and the number of missing data respect the rules mentioned above, including all intersections induced by the possible presence of groups
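Some of these checks can be run locally before uploading. The sketch below is one possible way to do it, not the validation performed by the platform: it assumes that "consistent spacing" can be approximated by the share of consecutive gaps equal to the most common gap, and the function and column names are hypothetical.

```python
from collections import Counter
from datetime import datetime


def spacing_consistency(timestamps):
    """Share of consecutive gaps that equal the most common gap."""
    ordered = sorted(timestamps)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    if not gaps:
        return 1.0
    _, count = Counter(gaps).most_common(1)[0]
    return count / len(gaps)


def duplicated_timestamps(rows, group_keys=()):
    """(group, time) pairs that appear more than once."""
    seen = Counter(
        (tuple(r[k] for k in group_keys), r["time"]) for r in rows
    )
    return [key for key, n in seen.items() if n > 1]


def target_is_numeric(rows, target="target"):
    """True if every known target value is an int or a float."""
    return all(
        isinstance(r[target], (int, float)) and not isinstance(r[target], bool)
        for r in rows if r[target] is not None
    )


# One day at an hourly step, fully regular.
hourly = [datetime(2021, 1, 1, h) for h in range(24)]
# Irregular series: gaps of 1, 2, 3 and 4 hours.
irregular = [datetime(2021, 1, 1, h) for h in (0, 1, 3, 6, 10)]

rows = [
    {"time": hourly[0], "store": "a", "target": 1.0},
    {"time": hourly[0], "store": "a", "target": 2.0},  # duplicate for store a
    {"time": hourly[0], "store": "b", "target": 3.0},
]
duplicates = duplicated_timestamps(rows, group_keys=("store",))
```

A series would pass the spacing check in this sketch when spacing_consistency returns at least 0.8, and the group columns identify unique series when duplicated_timestamps returns an empty list.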

Remarks:

  • Evaluation is performed on a time split cross validation
  • In case of multiple lines on the same timestamp, only the first event is kept
  • In case of missing timestamps, the last known value is propagated to the next known timestamp
  • Each group must contain at least 3 observations. If this is not the case, the group will be deleted from the dataset
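The cleaning rules in these remarks can be reproduced on a small example. The following is a rough sketch of the described behaviour, not the platform's code: it assumes a fixed, known time step, represents a series as (time, value) pairs, and counts group observations on the raw data. The helper names are hypothetical.

```python
from datetime import datetime, timedelta


def clean_series(points, step):
    """Deduplicate timestamps and fill the missing ones.

    points: iterable of (time, value) pairs, in arrival order.
    """
    first_seen = {}
    for t, v in points:
        first_seen.setdefault(t, v)  # keep only the first event per timestamp
    if not first_seen:
        return []
    times = sorted(first_seen)
    filled = []
    t, last = times[0], first_seen[times[0]]
    while t <= times[-1]:
        last = first_seen.get(t, last)  # propagate the last known value
        filled.append((t, last))
        t += step
    return filled


def drop_small_groups(groups, min_obs=3):
    """Remove groups with fewer than min_obs observations."""
    return {name: pts for name, pts in groups.items() if len(pts) >= min_obs}


t0 = datetime(2021, 1, 1)
hour = timedelta(hours=1)
raw = [(t0, 1.0), (t0, 9.0), (t0 + 2 * hour, 3.0)]
cleaned = clean_series(raw, step=hour)
kept = drop_small_groups({"a": raw, "b": raw[:2]})
```

In this example the second event at t0 is discarded, the missing timestamp at t0 + 1 hour receives the last known value 1.0, and group b is dropped because it has only two observations.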

During forecast

I have a file containing 0 forecasts

Make sure you have provided a dataset in which the target is missing from a given timestamp onwards. If the target column is filled in everywhere, the forecast cannot be extended, especially if your use case contains groups and a priori features.

The prediction returns inconsistent results

Check that the a priori features in particular are correctly filled in for the values to be forecasted.

Check that all the labelled data corresponding to the history window is filled in. Missing data will be imputed with the mean of the target, which can skew results.

Check that the gap between the time of training and the time of prediction is not too large. Time series may require more frequent re-training than other use cases because of natural target drift.

The prediction returns an error

In general, check that you provide a sufficient history consistent with the definition of your use case.

If your dataset contains groups:

  • Check that the groups are temporally consistent, i.e. each group has as many time steps as the others
  • Check that no new groups appear at the time of the forecast
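Both group checks are easy to script before sending a forecast dataset. The sketch below is only an illustration: it assumes rows are dictionaries with a single group column named store (a hypothetical name), and it takes the most common number of time steps as the expected one.

```python
from collections import Counter


def inconsistent_groups(rows, group_key):
    """Groups whose number of time steps differs from the most common count."""
    counts = Counter(r[group_key] for r in rows)
    if not counts:
        return {}
    expected = Counter(counts.values()).most_common(1)[0][0]
    return {g: n for g, n in counts.items() if n != expected}


def unseen_groups(train_rows, forecast_rows, group_key):
    """Groups present at forecast time that were absent during training."""
    known = {r[group_key] for r in train_rows}
    return sorted({r[group_key] for r in forecast_rows} - known)


train = [{"store": s, "step": i} for s in ("a", "b") for i in range(3)]
forecast = (
    [{"store": "a", "step": i} for i in range(3)]
    + [{"store": "b", "step": i} for i in range(3)]
    + [{"store": "c", "step": i} for i in range(2)]
)
bad_lengths = inconsistent_groups(forecast, "store")
new_groups = unseen_groups(train, forecast, "store")
```

Here store c is flagged twice: it has fewer time steps than the other groups, and it did not exist in the training data.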