Which Amazon Forecast Algorithm should I use?

Every Amazon Forecast predictor uses an algorithm to train a model, then uses the model to make a forecast using an input dataset group. To help you get started, Amazon Forecast provides the following predefined algorithms:

  • Autoregressive Integrated Moving Average (ARIMA) Algorithm

    arn:aws:forecast:::algorithm/ARIMA

  • DeepAR+ Algorithm*

    arn:aws:forecast:::algorithm/Deep_AR_Plus

  • Exponential Smoothing (ETS) Algorithm

    arn:aws:forecast:::algorithm/ETS

  • Non-Parametric Time Series (NPTS) Algorithm

    arn:aws:forecast:::algorithm/NPTS

  • Prophet Algorithm

    arn:aws:forecast:::algorithm/Prophet

* Supports hyperparameter optimization (HPO)

Autoregressive Integrated Moving Average (ARIMA) Algorithm

Autoregressive Integrated Moving Average (ARIMA) is a commonly used local statistical algorithm for time-series forecasting. ARIMA captures standard temporal structures (patterned organizations of time) in the input dataset. The Amazon Forecast ARIMA algorithm calls the Arima function in the Package 'forecast' of the Comprehensive R Archive Network (CRAN).

How ARIMA Works

The ARIMA algorithm is especially useful for datasets that can be mapped to stationary time series. The statistical properties of stationary time series, such as autocorrelations, are independent of time. Datasets with stationary time series usually contain a combination of signal and noise. The signal may exhibit a pattern of sinusoidal oscillation or have a seasonal component. ARIMA acts like a filter to separate the signal from the noise, and then extrapolates the signal in the future to make predictions.

ARIMA Hyperparameters and Tuning

For information about ARIMA hyperparameters and tuning, see the Arima function documentation in the Package ‘forecast’ of CRAN.

Amazon Forecast converts the DataFrequency parameter specified in the CreateDataset operation to the frequency parameter of the R ts function using the following table:

DataFrequency (string)    R ts frequency (integer)
Y                         1
M                         12
W                         52
D                         7
H                         24
30min                     2
15min                     4
1min                      60

For ts frequencies less than 24, or for short time series, the hyperparameters are set using the auto.arima function of the Package 'forecast' of CRAN. For ts frequencies greater than or equal to 24 and long time series, Amazon Forecast uses a Fourier series with K = 4, as described in Forecasting with long seasonal periods.

Supported data frequencies that aren’t in the table default to a ts frequency of 1.
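The conversion above can be sketched as a small lookup (an illustrative helper, not part of the Forecast API):

```python
# Mapping from the DataFrequency strings above to R ts frequency values
# (hypothetical helper for illustration only).
TS_FREQUENCY = {
    "Y": 1, "M": 12, "W": 52, "D": 7,
    "H": 24, "30min": 2, "15min": 4, "1min": 60,
}

def r_ts_frequency(data_frequency):
    # Supported frequencies not in the table default to a ts frequency of 1.
    return TS_FREQUENCY.get(data_frequency, 1)
```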

DeepAR+ Algorithm

Amazon Forecast DeepAR+ is a supervised learning algorithm for forecasting scalar (one-dimensional) time series using recurrent neural networks (RNNs). Classical forecasting methods, such as autoregressive integrated moving average (ARIMA) or exponential smoothing (ETS), fit a single model to each individual time series, and then use that model to extrapolate the time series into the future. In many applications, however, you have many similar time series across a set of cross-sectional units, such as demand for different products, server loads, and requests for web pages. In this case, it can be beneficial to train a single model jointly over all of the time series. DeepAR+ takes this approach. When your dataset contains hundreds of feature time series, the DeepAR+ algorithm outperforms the standard ARIMA and ETS methods. You can also use the trained model to generate forecasts for new time series that are similar to the ones it was trained on.

How DeepAR+ Works

During training, DeepAR+ uses a training dataset and an optional testing dataset. It uses the testing dataset to evaluate the trained model. In general, the training and testing datasets don’t have to contain the same set of time series. You can use a model trained on a given training set to generate forecasts for the future of the time series in the training set, and for other time series. Both the training and the testing datasets consist of (preferably more than one) target time series. Optionally, they can be associated with a vector of feature time series and a vector of categorical features (for details, see DeepAR Input/Output Interface in the Amazon SageMaker Developer Guide). The following example shows how this works for an element of a training dataset indexed by i. The training dataset consists of a target time series, zi,t, and two associated feature time series, xi,1,t and xi,2,t.


 Image: DeepAR+ time-series data.

The target time series might contain missing values (denoted in the graphs by breaks in the time series). DeepAR+ supports only feature time series that are known in the future. This allows you to run counterfactual “what-if” scenarios. For example, “What happens if I change the price of a product in some way?”

Each target time series can also be associated with a number of categorical features. You can use these to encode that a time series belongs to certain groupings. Using categorical features allows the model to learn typical behavior for those groupings, which can increase accuracy. A model implements this by learning an embedding vector for each group that captures the common properties of all time series in the group.

To facilitate learning time-dependent patterns, such as spikes during weekends, DeepAR+ automatically creates feature time series based on time-series granularity. For example, DeepAR+ creates two feature time series (day of the month and day of the year) at a weekly time-series frequency. It uses these derived feature time series along with the custom feature time series that you provide during training and inference. The following example shows two derived time-series features: ui,1,t represents the hour of the day, and ui,2,t the day of the week.


 Image: DeepAR+ two derived time-series.

DeepAR+ automatically includes these feature time series based on the data frequency and the size of training data. The following table lists the features that can be derived for each supported basic time frequency.

Frequency of the Time Series    Derived Features
Minute                          minute-of-hour, hour-of-day, day-of-week, day-of-month, day-of-year
Hour                            hour-of-day, day-of-week, day-of-month, day-of-year
Day                             day-of-week, day-of-month, day-of-year
Week                            day-of-month, week-of-year
Month                           month-of-year

A DeepAR+ model is trained by randomly sampling several training examples from each of the time series in the training dataset. Each training example consists of a pair of adjacent context and prediction windows with fixed predefined lengths. The context_length hyperparameter controls how far in the past the network can see, and the prediction_length parameter controls how far in the future predictions can be made. During training, Amazon Forecast ignores elements in the training dataset with time series shorter than the specified prediction length. The following example shows five samples, with a context length (highlighted in green) of 12 hours and a prediction length (highlighted in blue) of 6 hours, drawn from element i. For the sake of brevity, we’ve excluded the feature time series xi,1,t and ui,2,t.


 Image: DeepAR+ sampled.
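The sampling scheme described above can be sketched in a few lines (an illustrative simplification, not the actual DeepAR+ implementation):

```python
import random

def sample_training_examples(series, context_length, prediction_length, num_samples):
    """Randomly draw adjacent (context, prediction) window pairs from one
    time series, mirroring how DeepAR+ samples training examples."""
    total = context_length + prediction_length
    if len(series) < total:
        return []  # too short to yield even one full example
    examples = []
    for _ in range(num_samples):
        start = random.randrange(len(series) - total + 1)
        context = series[start:start + context_length]          # what the network sees
        target = series[start + context_length:start + total]   # what it learns to predict
        examples.append((context, target))
    return examples

# Two days of hourly observations; 12-hour context, 6-hour prediction window.
hourly = list(range(48))
pairs = sample_training_examples(hourly, context_length=12, prediction_length=6, num_samples=5)
```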

To capture seasonality patterns, DeepAR+ also automatically feeds lagged (past period) values from the target time series. In our example with samples taken at an hourly frequency, for each time index t = T, the model exposes the zi,t values which occurred approximately one, two, and three days in the past (highlighted in pink).


 Image: DeepAR+ lags.

For inference, the trained model takes as input the target time series, which might or might not have been used during training, and forecasts a probability distribution for the next prediction_length values. Because DeepAR+ is trained on the entire dataset, the forecast takes into account learned patterns from similar time series.

For information on the mathematics behind DeepAR+, see DeepAR: Probabilistic Forecasting with Autoregressive Recurrent Networks on the Cornell University Library website.

Exclusive Features of Amazon Forecast DeepAR+ over Amazon SageMaker DeepAR

The Amazon Forecast DeepAR+ algorithm improves upon the Amazon SageMaker DeepAR algorithm with the following new features:

  • Learning rate scheduling

    During a single training run, DeepAR+ can reduce its learning rate. This often reduces loss and forecasting error.

  • Model averaging

    When you use multiple models for training with the DeepAR+ algorithm, Amazon Forecast averages the training runs. This can reduce forecasting error and dramatically increase model stability. Your DeepAR+ model is more likely to provide robust results every time you train it.

  • Weighted sampling

    When you use a very large training dataset, DeepAR+ applies streaming sampling to ensure convergence despite the size of the training dataset. A DeepAR+ model can be trained with millions of time series in a matter of hours.

For information on how to use these features, see DeepAR+ Hyperparameters.

DeepAR+ Hyperparameters

The following table lists the hyperparameters that you can use in the DeepAR+ algorithm. The learning_rate and context_length hyperparameters participate in hyperparameter optimization (HPO).

learning_rate (participates in HPO)

    The learning rate used in training.

    Valid values: Positive floating-point numbers
    Typical values: 0.0001 to 0.1
    Default value: 0.001

context_length (participates in HPO)

    The number of time points that the model reads in before making the prediction. The value for this parameter should be about the same as the prediction_length. The model also receives lagged inputs from the target, so context_length can be much smaller than typical seasonalities. For example, a daily time series can have yearly seasonality. The model automatically includes a lag of one year, so the context length can be shorter than a year. The lag values that the model picks depend on the frequency of the time series. For example, lag values for daily frequency are: previous week, 2 weeks, 3 weeks, 4 weeks, and year.

    Valid values: Positive integers
    Typical values: ceil(0.1 * prediction_length) to min(200, 10 * prediction_length)
    Default value: 2 * prediction_length

prediction_length

    The number of time-steps that the model is trained to predict, also called the forecast horizon. The trained model always generates forecasts with this length.

    The prediction_length is set with the ForecastHorizon parameter of the CreatePredictor API. It cannot be changed without retraining the model.

    Valid values: Positive integers
    Typical values: N/A
    Default value: N/A

num_layers

    The number of hidden layers in the RNN.

    Valid values: Positive integers
    Typical values: 1 to 4
    Default value: 2

num_cells

    The number of cells to use in each hidden layer of the RNN.

    Valid values: Positive integers
    Typical values: 30 to 100
    Default value: 40

likelihood

    The model generates a probabilistic forecast, and can provide quantiles of the distribution and return samples. Depending on your data, choose an appropriate likelihood (noise model) that is used for uncertainty estimates.

    Valid values:

      • gaussian: Use for real-valued data.
      • beta: Use for real-valued targets between 0 and 1, inclusive.
      • student-T: Use this alternative for real-valued, bursty data.
      • negative-binomial: Use for count data (non-negative integers).
      • deterministic-L1: A loss function that does not estimate uncertainty and only learns a point forecast.

    Default value: student-T
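Hyperparameter values are passed as strings through the CreatePredictor API. The following is a sketch of the request shape, assuming the legacy predictor API; the ARNs, predictor name, and parameter values are placeholders:

```python
# Sketch of a CreatePredictor request body for DeepAR+. The ARNs and the
# predictor name are placeholders; hyperparameter values are strings.
create_predictor_request = {
    "PredictorName": "my_deep_ar_plus_predictor",
    "AlgorithmArn": "arn:aws:forecast:::algorithm/Deep_AR_Plus",
    "ForecastHorizon": 24,  # sets prediction_length; changing it requires retraining
    "PerformHPO": True,     # tunes learning_rate and context_length
    "TrainingParameters": {
        "learning_rate": "0.001",
        "context_length": "48",            # about 2 * prediction_length
        "num_layers": "2",
        "num_cells": "40",
        "likelihood": "negative-binomial", # count data
    },
    "InputDataConfig": {
        "DatasetGroupArn": "arn:aws:forecast:us-east-1:123456789012:dataset-group/demo"
    },
    "FeaturizationConfig": {"ForecastFrequency": "H"},
}
```

With boto3, a dict like this would be passed as keyword arguments to the forecast client's create_predictor call.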

Tune DeepAR+ Models

To tune Amazon Forecast DeepAR+ models, follow these recommendations for optimizing the training process and hardware configuration.

Best Practices for Process Optimization

To achieve the best results, follow these recommendations:

  • Except when splitting the training and testing datasets, always provide entire time series for training and testing, and when calling the model for inference. Regardless of how you set context_length, don’t divide the time series or provide only a part of it. The model will use data points further back than context_length for the lagged values feature.
  • For model tuning, you can split the dataset into training and testing datasets. In a typical evaluation scenario, you should test the model on the same time series used in training, but on the future prediction_length time points immediately after the last time point visible during training. To create training and testing datasets that satisfy these criteria, use the entire dataset (all of the time series) as a testing dataset and remove the last prediction_length points from each time series for training. This way, during training, the model doesn’t see the target values for time points on which it is evaluated during testing. In the test phase, the last prediction_length points of each time series in the testing dataset are withheld and a prediction is generated. The forecast is then compared with the actual values for the last prediction_length points. You can create more complex evaluations by repeating time series multiple times in the testing dataset, but cutting them off at different end points. This produces accuracy metrics that are averaged over multiple forecasts from different time points.
  • Avoid using very large values (> 400) for the prediction_length because this slows down the model and makes it less accurate. If you want to forecast further into the future, consider aggregating to a higher frequency. For example, use 5min instead of 1min.
  • Because of lags, the model can look further back than context_length. Therefore, you don’t have to set this parameter to a large value. A good starting point for this parameter is the same value as the prediction_length.
  • Train DeepAR+ models with as many time series as are available. Although a DeepAR+ model trained on a single time series might already work well, standard forecasting methods such as ARIMA or ETS might be more accurate and are more tailored to this use case. DeepAR+ starts to outperform the standard methods when your dataset contains hundreds of feature time series. Currently, DeepAR+ requires that the total number of observations available, across all training time series, is at least 300.
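The train/test split described above can be sketched as follows (an illustrative helper over in-memory series, not part of the Forecast API):

```python
def make_backtest_split(series_dict, prediction_length):
    """Build training/testing datasets for evaluation: the full series serve
    as the test set, while training drops the last prediction_length points
    so the model never sees the values it is evaluated on."""
    testing = {item_id: list(values) for item_id, values in series_dict.items()}
    training = {
        item_id: values[:-prediction_length]
        for item_id, values in series_dict.items()
        if len(values) > prediction_length  # skip series with no history left
    }
    return training, testing

series = {"item_1": list(range(30)), "item_2": list(range(5))}
training, testing = make_backtest_split(series, prediction_length=6)
```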

Exponential Smoothing (ETS) Algorithm

Exponential Smoothing (ETS) is a commonly used local statistical algorithm for time-series forecasting. The Amazon Forecast ETS algorithm calls the ets function in the Package 'forecast' of the Comprehensive R Archive Network (CRAN).

How ETS Works

The ETS algorithm is especially useful for datasets with seasonality and other prior assumptions about the data. ETS computes a weighted average over all observations in the input time series dataset as its prediction. The weights are exponentially decreasing over time, rather than the constant weights in simple moving average methods. The weights are dependent on a constant parameter, which is known as the smoothing parameter.
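The exponentially decaying weights can be seen in the simplest form of the method (a minimal sketch of simple exponential smoothing, not the full ETS model family):

```python
def simple_exponential_smoothing(observations, alpha):
    """One-step-ahead forecast whose implicit weights on past observations
    decay exponentially with age; alpha is the smoothing parameter."""
    level = observations[0]
    for y in observations[1:]:
        level = alpha * y + (1 - alpha) * level  # weighted update
    return level
```

With alpha = 1 the forecast is just the last observation; smaller values of alpha spread the weight over more of the history.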

ETS Hyperparameters and Tuning

For information about ETS hyperparameters and tuning, see the ets function documentation in the Package ‘forecast’ of CRAN.

Amazon Forecast converts the DataFrequency parameter specified in the CreateDataset operation to the frequency parameter of the R ts function using the following table:

DataFrequency (string)    R ts frequency (integer)
Y                         1
M                         12
W                         52
D                         7
H                         24
30min                     2
15min                     4
1min                      60

Supported data frequencies that aren’t in the table default to a ts frequency of 1.

Non-Parametric Time Series (NPTS) Algorithm

The Amazon Forecast Non-Parametric Time Series (NPTS) algorithm is a scalable, probabilistic baseline forecaster. It predicts the future value distribution of a given time series by sampling from past observations, so the predictions are bounded by the observed values. NPTS is especially useful when the time series is intermittent (or sparse, containing many 0s) and bursty, for example, when forecasting demand for individual items where the time series has many low counts. Amazon Forecast provides variants of NPTS that differ in which of the past observations are sampled and how they are sampled. To use an NPTS variant, you choose a hyperparameter setting.

How NPTS Works

Similar to classical forecasting methods, such as exponential smoothing (ETS) and autoregressive integrated moving average (ARIMA), NPTS generates predictions for each time series individually. The time series in the dataset can have different lengths. The time points where the observations are available are called the training range and the time points where the prediction is desired are called the prediction range.

Amazon Forecast NPTS forecasters have the following variants: NPTS, seasonal NPTS, climatological forecaster, and seasonal climatological forecaster.

NPTS

In this variant, predictions are generated by sampling from all observations in the training range of the time series. However, instead of uniformly sampling from all of the observations, this variant assigns weight to each of the past observations according to how far it is from the current time step where the prediction is needed. In particular, it uses weights that decay exponentially according to the distance of the past observations. In this way, the observations from the recent past are sampled with much higher probability than the observations from the distant past. This assumes that the near past is more indicative for the future than the distant past. You can control the amount of decay in the weights with the exp_kernel_weights hyperparameter.

To use this NPTS variant in Amazon Forecast, set the use_seasonal_model hyperparameter to False and accept all other default settings.
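The exponentially decaying sampling weights can be sketched as follows (an illustrative computation; kernel_weight stands in for the exp_kernel_weights hyperparameter):

```python
import math

def npts_sampling_weights(num_observations, kernel_weight):
    """Normalized sampling probabilities over past observations that decay
    exponentially with distance from the forecast time step."""
    # distance 1 = most recent past observation, num_observations = oldest
    raw = [math.exp(-kernel_weight * d) for d in range(1, num_observations + 1)]
    total = sum(raw)
    return [w / total for w in raw]

# Recent observations get the highest sampling probability.
weights = npts_sampling_weights(100, kernel_weight=0.05)
```

A larger kernel_weight concentrates the probability mass on the most recent observations; a smaller value spreads it further into the past.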

Seasonal NPTS

The seasonal NPTS variant is similar to NPTS except that instead of sampling from all of the observations, it uses only the observations from past seasons. By default, the season is determined by the granularity of the time series. For example, for an hourly time series, to predict for hour t, this variant samples from the observations corresponding to hour t on previous days. As with NPTS, the observation at hour t on the previous day is given more weight than the observations at hour t on earlier days. For more information about how seasonality is determined based on the granularity of the time series, see Seasonal Features.

If you provide time series features with the feat_dynamic_real hyperparameter, seasonality is determined by both the granularity and the feat_dynamic_real hyperparameter. To use only the feat_dynamic_real hyperparameter to define seasonality, set the use_default_time_features hyperparameter to False. The feat_dynamic_real hyperparameter is turned on in Amazon Forecast by passing in the related time-series CSV file.

Climatological Forecaster

The climatological forecaster variant samples all of the past observations with uniform probability.

To use the climatological forecaster, set the kernel_type hyperparameter to uniform and the use_seasonal_model hyperparameter to False. Accept the default settings for all other hyperparameters.

Seasonal Climatological Forecaster

Similar to seasonal NPTS, the seasonal climatological forecaster samples the observations from past seasons, but samples them with uniform probability.

To use the seasonal climatological forecaster, set the kernel_type hyperparameter to uniform. Accept all other default settings for all of the other hyperparameters.

Seasonal Features

To determine what corresponds to a season for the seasonal NPTS and seasonal climatological forecaster, use the features listed in the following table. The table lists the derived features for the supported basic time frequencies, based on granularity. Amazon Forecast includes these feature time series, so you don’t have to provide them.

Frequency of the Time Series    Feature to Determine Seasonality
Minute                          minute-of-hour
Hour                            hour-of-day
Day                             day-of-week
Week                            day-of-month
Month                           month-of-year

When using the Amazon Forecast NPTS algorithms, consider the following best practices for preparing the data and achieving optimal results:

  • Because NPTS generates predictions for each time series individually, provide the entire time series when calling the model for prediction. Also, accept the default value of the context_length hyperparameter. This causes the algorithm to use the entire time series. If you change the context_length (because the training data is too long), make sure it is large enough and covers multiple past seasons. For example, for a daily time series, this value must be at least 365 days (provided that you have that amount of data).
  • If the data has seasonality patterns, the seasonal NPTS algorithm typically works better. If external events, such as special holidays and promotions, have an effect on the time series, then provide those features in the feat_dynamic_real hyperparameter and use seasonal NPTS. In this case, you must also provide the feat_dynamic_real hyperparameter for both training and prediction ranges by providing the related time-series CSV file to Amazon Forecast.

NPTS Hyperparameters

The following table lists the hyperparameters that you can use in the NPTS algorithm.

prediction_length

    The number of time-steps that the model is trained to predict. This is also called the forecast horizon. The trained model always generates a forecast with this length.

    The prediction_length is set with the ForecastHorizon parameter of the CreatePredictor API. It cannot be changed without retraining the model.

    Valid values: Positive integers
    Default value: N/A

context_length

    The number of time points in the past that the model uses for making the prediction. By default, it uses all of the time points in the training range. Typically, the value for this hyperparameter should be large and should cover multiple past seasons. For example, for a daily time series, this value must be at least 365 days.

    Valid values: Positive integers
    Default value: The length of the training time series

kernel_type

    The kernel used to define the weights for sampling past observations.

    Valid values: exponential or uniform
    Default value: exponential

exp_kernel_weights

    Valid only when kernel_type is exponential.

    The scaling parameter of the kernel. For faster (exponential) decay in the weights given to the observations in the distant past, use a large value.

    Valid values: Positive floating-point numbers
    Default value: 0.01

use_seasonal_model

    Whether to use a seasonal variant.

    Valid values: True or False
    Default value: True

use_default_time_features

    Valid only for the seasonal NPTS and seasonal climatological forecaster variants.

    Whether to use seasonal features based on the granularity of the time series to determine seasonality.

    Valid values: True or False
    Default value: True
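The four NPTS variants reduce to combinations of the kernel_type and use_seasonal_model hyperparameters; a small illustrative helper makes the decision table explicit:

```python
def npts_variant(kernel_type="exponential", use_seasonal_model=True):
    """Map the kernel_type and use_seasonal_model hyperparameters to the
    four NPTS variants (illustrative helper, not part of the Forecast API)."""
    base = "NPTS" if kernel_type == "exponential" else "climatological forecaster"
    return ("seasonal " + base) if use_seasonal_model else base
```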

Prophet Algorithm

Prophet is a popular local Bayesian structural time series model. The Amazon Forecast Prophet algorithm uses the Prophet class of the Python implementation of Prophet.

How Prophet Works

Prophet is especially useful for datasets that:

  • Contain an extended time period (months or years) of detailed historical observations (hourly, daily, or weekly)
  • Have multiple strong seasonalities
  • Include previously known important, but irregular, events
  • Have missing data points or large outliers
  • Have non-linear growth trends that are approaching a limit

Prophet is an additive regression model with a piecewise linear or logistic growth curve trend. It includes a yearly seasonal component modeled using Fourier series and a weekly seasonal component modeled using dummy variables.

For more information, see Prophet: forecasting at scale.
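The additive structure can be sketched as a trend term plus a truncated Fourier series for the yearly seasonality (a minimal illustration of the model form; all coefficients here are placeholders, not fitted values):

```python
import math

def prophet_style_forecast(t, trend_slope, trend_intercept, yearly_coeffs):
    """Additive structural model in the spirit of Prophet: a linear trend
    plus a yearly seasonal component expressed as a truncated Fourier
    series. Coefficients are illustrative placeholders."""
    trend = trend_intercept + trend_slope * t
    period = 365.25  # days in a year
    seasonality = sum(
        a * math.cos(2 * math.pi * (n + 1) * t / period)
        + b * math.sin(2 * math.pi * (n + 1) * t / period)
        for n, (a, b) in enumerate(yearly_coeffs)
    )
    return trend + seasonality
```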

Prophet Hyperparameters and Related Time Series

Amazon Forecast uses the default Prophet hyperparameters. Prophet also supports related time series as features, provided to Amazon Forecast in the related time-series CSV file.