Every Amazon Forecast predictor uses an algorithm to train a model, then uses the model to make a forecast using an input dataset group. To help you get started, Amazon Forecast provides the following predefined algorithms:
 Autoregressive Integrated Moving Average (ARIMA) Algorithm
arn:aws:forecast:::algorithm/ARIMA
 DeepAR+ Algorithm*
arn:aws:forecast:::algorithm/Deep_AR_Plus
 Exponential Smoothing (ETS) Algorithm
arn:aws:forecast:::algorithm/ETS
 NonParametric Time Series (NPTS) Algorithm
arn:aws:forecast:::algorithm/NPTS
 Prophet Algorithm
arn:aws:forecast:::algorithm/Prophet
* Supports hyperparameter optimization (HPO)
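When you create a predictor through the AWS SDK, you pass one of these ARNs as the algorithm. The following sketch just restates the ARNs listed above in a lookup table; the boto3 call shown in the comment assumes an existing dataset group and illustrative resource names:

```python
# Mapping of Amazon Forecast predefined algorithms to their ARNs,
# as listed above. DeepAR+ is the only one that supports HPO.
ALGORITHM_ARNS = {
    "ARIMA": "arn:aws:forecast:::algorithm/ARIMA",
    "Deep_AR_Plus": "arn:aws:forecast:::algorithm/Deep_AR_Plus",
    "ETS": "arn:aws:forecast:::algorithm/ETS",
    "NPTS": "arn:aws:forecast:::algorithm/NPTS",
    "Prophet": "arn:aws:forecast:::algorithm/Prophet",
}

def algorithm_arn(name: str) -> str:
    """Look up the ARN for a predefined Forecast algorithm."""
    return ALGORITHM_ARNS[name]

# A predictor is then created with, for example (requires AWS
# credentials and an existing dataset group; names are illustrative):
#
# import boto3
# forecast = boto3.client("forecast")
# forecast.create_predictor(
#     PredictorName="my_ets_predictor",
#     AlgorithmArn=algorithm_arn("ETS"),
#     ForecastHorizon=24,
#     InputDataConfig={"DatasetGroupArn": dataset_group_arn},
#     FeaturizationConfig={"ForecastFrequency": "H"},
# )
```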
Autoregressive Integrated Moving Average (ARIMA) Algorithm
Autoregressive Integrated Moving Average (ARIMA) is a commonly used local statistical algorithm for time-series forecasting. ARIMA captures standard temporal structures (patterned organizations of time) in the input dataset. The Amazon Forecast ARIMA algorithm calls the Arima function in the Package 'forecast' of the Comprehensive R Archive Network (CRAN).
How ARIMA Works
The ARIMA algorithm is especially useful for datasets that can be mapped to stationary time series. The statistical properties of stationary time series, such as autocorrelations, are independent of time. Datasets with stationary time series usually contain a combination of signal and noise. The signal may exhibit a pattern of sinusoidal oscillation or have a seasonal component. ARIMA acts like a filter to separate the signal from the noise, and then extrapolates the signal into the future to make predictions.
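The idea of fitting a stationary model to the signal and extrapolating it forward can be illustrated with a minimal AR(1) example. This is a sketch only, not the CRAN Arima function: it simulates a stationary series y_t = phi * y_{t-1} + noise, estimates phi by least squares on adjacent pairs, and extrapolates:

```python
import random

# Simulate a stationary AR(1) series with true coefficient phi = 0.8.
random.seed(0)
phi_true, n = 0.8, 2000
y = [0.0]
for _ in range(1, n):
    y.append(phi_true * y[-1] + random.gauss(0, 0.5))

# Least-squares estimate of phi from adjacent pairs (y_{t-1}, y_t).
num = sum(a * b for a, b in zip(y[:-1], y[1:]))
den = sum(a * a for a in y[:-1])
phi_hat = num / den

# Extrapolate: the h-step-ahead forecast is phi_hat**h * y[-1],
# which decays toward the series mean (zero) as h grows.
forecasts = [phi_hat ** h * y[-1] for h in range(1, 6)]
```

Because |phi| < 1, the forecast reverts toward the mean, which is the filtering-then-extrapolating behavior described above in its simplest form.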
ARIMA Hyperparameters and Tuning
For information about ARIMA hyperparameters and tuning, see the Arima function documentation in the Package 'forecast' of CRAN.
Amazon Forecast converts the DataFrequency parameter specified in the CreateDataset operation to the frequency parameter of the R ts function using the following table:
DataFrequency (string) | R ts frequency (integer)
Y | 1
M | 12
W | 52
D | 7
H | 24
30min | 2
15min | 4
1min | 60
For frequencies less than 24 or short time series, the hyperparameters are set using the auto.arima function of the Package 'forecast' of CRAN. For frequencies greater than or equal to 24 and long time series, Amazon Forecast uses a Fourier series with K = 4, as described in Forecasting with long seasonal periods. Supported data frequencies that aren't in the table default to a ts frequency of 1.
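The conversion above, including the fallback to a frequency of 1, can be sketched as a simple lookup:

```python
# Sketch of the DataFrequency -> R ts() frequency conversion from the
# table above; unsupported frequencies fall back to a frequency of 1.
TS_FREQUENCY = {
    "Y": 1, "M": 12, "W": 52, "D": 7,
    "H": 24, "30min": 2, "15min": 4, "1min": 60,
}

def ts_frequency(data_frequency: str) -> int:
    """Return the R ts() frequency for a Forecast DataFrequency string."""
    return TS_FREQUENCY.get(data_frequency, 1)
```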
DeepAR+ Algorithm
Amazon Forecast DeepAR+ is a supervised learning algorithm for forecasting scalar (one-dimensional) time series using recurrent neural networks (RNNs). Classical forecasting methods, such as autoregressive integrated moving average (ARIMA) or exponential smoothing (ETS), fit a single model to each individual time series, and then use that model to extrapolate the time series into the future. In many applications, however, you have many similar time series across a set of cross-sectional units. You might have such time-series groupings for demand for different products, server loads, and requests for web pages. In this case, it can be beneficial to train a single model jointly over all of the time series. DeepAR+ takes this approach. When your dataset contains hundreds of feature time series, the DeepAR+ algorithm outperforms the standard ARIMA and ETS methods. You can also use the trained model for generating forecasts for new time series that are similar to the ones it has been trained on.
How DeepAR+ Works
During training, DeepAR+ uses a training dataset and an optional testing dataset. It uses the testing dataset to evaluate the trained model. In general, the training and testing datasets don't have to contain the same set of time series. You can use a model trained on a given training set to generate forecasts for the future of the time series in the training set, and for other time series. Both the training and the testing datasets consist of (preferably more than one) target time series. Optionally, they can be associated with a vector of feature time series and a vector of categorical features (for details, see DeepAR Input/Output Interface in the Amazon SageMaker Developer Guide). The following example shows how this works for an element of a training dataset indexed by i. The training dataset consists of a target time series, z_{i,t}, and two associated feature time series, x_{i,1,t} and x_{i,2,t}.
The target time series might contain missing values (denoted in the graphs by breaks in the time series). DeepAR+ supports only feature time series that are known in the future. This allows you to run counterfactual "what-if" scenarios. For example, "What happens if I change the price of a product in some way?"
Each target time series can also be associated with a number of categorical features. You can use these to encode that a time series belongs to certain groupings. Using categorical features allows the model to learn typical behavior for those groupings, which can increase accuracy. A model implements this by learning an embedding vector for each group that captures the common properties of all time series in the group.
To facilitate learning time-dependent patterns, such as spikes during weekends, DeepAR+ automatically creates feature time series based on time-series granularity. For example, at a weekly time-series frequency, DeepAR+ creates two feature time series: day of the month and week of the year. It uses these derived feature time series along with the custom feature time series that you provide during training and inference. The following example shows two derived time-series features: u_{i,1,t} represents the hour of the day, and u_{i,2,t} the day of the week.
DeepAR+ automatically includes these feature time series based on the data frequency and the size of training data. The following table lists the features that can be derived for each supported basic time frequency.
Frequency of the Time Series | Derived Features
Minute | minute-of-hour, hour-of-day, day-of-week, day-of-month, day-of-year
Hour | hour-of-day, day-of-week, day-of-month, day-of-year
Day | day-of-week, day-of-month, day-of-year
Week | day-of-month, week-of-year
Month | month-of-year
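The table above amounts to computing calendar features from each timestamp. A minimal sketch using Python's standard library (the feature names are descriptive labels, not API identifiers):

```python
from datetime import datetime

# Which calendar features are derived at each basic frequency,
# following the table above.
DERIVED_FEATURES = {
    "minute": ["minute-of-hour", "hour-of-day", "day-of-week",
               "day-of-month", "day-of-year"],
    "hour":   ["hour-of-day", "day-of-week", "day-of-month", "day-of-year"],
    "day":    ["day-of-week", "day-of-month", "day-of-year"],
    "week":   ["day-of-month", "week-of-year"],
    "month":  ["month-of-year"],
}

def feature_values(ts: datetime, frequency: str) -> dict:
    """Compute the derived feature values for one timestamp."""
    all_values = {
        "minute-of-hour": ts.minute,
        "hour-of-day": ts.hour,
        "day-of-week": ts.weekday(),          # Monday = 0
        "day-of-month": ts.day,
        "day-of-year": ts.timetuple().tm_yday,
        "week-of-year": ts.isocalendar()[1],
        "month-of-year": ts.month,
    }
    return {name: all_values[name] for name in DERIVED_FEATURES[frequency]}
```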
A DeepAR+ model is trained by randomly sampling several training examples from each of the time series in the training dataset. Each training example consists of a pair of adjacent context and prediction windows with fixed predefined lengths. The context_length hyperparameter controls how far in the past the network can see, and the prediction_length parameter controls how far in the future predictions can be made. During training, Amazon Forecast ignores elements in the training dataset with time series shorter than the specified prediction length. The following example shows five samples, with a context length (highlighted in green) of 12 hours and a prediction length (highlighted in blue) of 6 hours, drawn from element i. For the sake of brevity, we've excluded the feature time series x_{i,1,t} and u_{i,2,t}.
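The window-sampling step can be sketched as follows (an illustrative simplification, not the actual DeepAR+ sampler; series shorter than the prediction length would be skipped, as described above):

```python
import random

def sample_windows(series, context_length, prediction_length,
                   num_samples, seed=0):
    """Draw pairs of adjacent context and prediction windows."""
    rng = random.Random(seed)
    total = context_length + prediction_length
    samples = []
    for _ in range(num_samples):
        # Choose a start so that both windows fit inside the series.
        start = rng.randrange(0, len(series) - total + 1)
        context = series[start:start + context_length]
        prediction = series[start + context_length:start + total]
        samples.append((context, prediction))
    return samples
```

For example, sample_windows(series, 12, 6, 5) reproduces the setup in the text: five samples, each with a 12-hour context window immediately followed by a 6-hour prediction window.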
To capture seasonality patterns, DeepAR+ also automatically feeds lagged (past period) values from the target time series. In our example with samples taken at an hourly frequency, for each time index t = T, the model exposes the z_{i,t} values that occurred approximately one, two, and three days in the past (highlighted in pink).
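For an hourly series, those lags correspond to offsets of 24, 48, and 72 time steps. A minimal sketch of the lookup (illustrative; the actual lag sets DeepAR+ uses depend on the data frequency):

```python
# Expose target values from roughly one, two, and three days earlier
# for an hourly series; lags that fall before the series start are None.
def lag_features(series, t, lags=(24, 48, 72)):
    """Return the lagged target values z_{t-lag} for each lag."""
    return [series[t - lag] if t - lag >= 0 else None for lag in lags]
```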
For inference, the trained model takes as input the target time series, which might or might not have been used during training, and forecasts a probability distribution for the next prediction_length values. Because DeepAR+ is trained on the entire dataset, the forecast takes into account learned patterns from similar time series.
For information on the mathematics behind DeepAR+, see DeepAR: Probabilistic Forecasting with Autoregressive Recurrent Networks on the Cornell University Library website.
Exclusive Features of Amazon Forecast DeepAR+ over Amazon SageMaker DeepAR
The Amazon Forecast DeepAR+ algorithm improves upon the Amazon SageMaker DeepAR algorithm with the following new features:
 Learning rate scheduling
During a single training run, DeepAR+ can reduce its learning rate. This often reduces loss and forecasting error.
 Model averaging
When you use multiple models for training with the DeepAR+ algorithm, Amazon Forecast averages the training runs. This can reduce forecasting error and dramatically increase model stability. Your DeepAR+ model is more likely to provide robust results every time you train it.
 Weighted sampling
When you use a very large training dataset, DeepAR+ applies streaming sampling to ensure convergence despite the size of the training dataset. A DeepAR+ model can be trained with millions of time series in a matter of hours.
For information on how to use these features, see DeepAR+ Hyperparameters.
DeepAR+ Hyperparameters
The following table lists the hyperparameters that you can use in the DeepAR+ algorithm. Parameters in bold participate in hyperparameter optimization (HPO).
Parameter Name | Description
learning_rate | The learning rate used in training.
context_length | The number of time points that the model reads in before making the prediction. The value for this parameter should be about the same as the prediction_length. The model also receives lagged inputs from the target, so context_length can be much smaller than typical seasonalities. For example, a daily time series can have yearly seasonality. The model automatically includes a lag of one year, so the context length can be shorter than a year. The lag values that the model picks depend on the frequency of the time series. For example, lag values for daily frequency are: previous week, 2 weeks, 3 weeks, 4 weeks, and year.
prediction_length | The number of time steps that the model is trained to predict, also called the forecast horizon. The trained model always generates forecasts with this length.
num_layers | The number of hidden layers in the RNN.
num_cells | The number of cells to use in each hidden layer of the RNN.
likelihood | The model generates a probabilistic forecast, and can provide quantiles of the distribution and return samples. Depending on your data, choose an appropriate likelihood (noise model) that is used for uncertainty estimates.

Tune DeepAR+ Models
To tune Amazon Forecast DeepAR+ models, follow these recommendations for optimizing the training process and hardware configuration.
Best Practices for Process Optimization
To achieve the best results, follow these recommendations:
 Except when splitting the training and testing datasets, always provide entire time series for training and testing, and when calling the model for inference. Regardless of how you set context_length, don't divide the time series or provide only a part of it. The model will use data points further back than context_length for the lagged values feature.
 For model tuning, you can split the dataset into training and testing datasets. In a typical evaluation scenario, you should test the model on the same time series used in training, but on the future prediction_length time points immediately after the last time point visible during training. To create training and testing datasets that satisfy these criteria, use the entire dataset (all of the time series) as a testing dataset and remove the last prediction_length points from each time series for training. This way, during training, the model doesn't see the target values for time points on which it is evaluated during testing. In the test phase, the last prediction_length points of each time series in the testing dataset are withheld and a prediction is generated. The forecast is then compared with the actual values for the last prediction_length points. You can create more complex evaluations by repeating time series multiple times in the testing dataset, but cutting them off at different end points. This produces accuracy metrics that are averaged over multiple forecasts from different time points.
 Avoid using very large values (> 400) for the prediction_length because this slows down the model and makes it less accurate. If you want to forecast further into the future, consider aggregating to a higher frequency. For example, use 5min instead of 1min.
 Because of lags, the model can look further back than context_length. Therefore, you don't have to set this parameter to a large value. A good starting point for this parameter is the same value as the prediction_length.
 Train DeepAR+ models with as many time series as are available. Although a DeepAR+ model trained on a single time series might already work well, standard forecasting methods such as ARIMA or ETS might be more accurate and are more tailored to this use case. DeepAR+ starts to outperform the standard methods when your dataset contains hundreds of feature time series. Currently, DeepAR+ requires that the total number of observations available, across all training time series, is at least 300.
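The evaluation split recommended above can be sketched as follows: the full series serves as the testing dataset, and the last prediction_length points are removed from each series for training.

```python
# Sketch of the evaluation split described above.
def train_test_split(series_list, prediction_length):
    """Full series for testing; truncated series for training."""
    test = [list(s) for s in series_list]
    train = [list(s)[:-prediction_length] for s in series_list]
    return train, test

def held_out_targets(series_list, prediction_length):
    """The actual values the forecast is compared against."""
    return [list(s)[-prediction_length:] for s in series_list]
```

During training the model never sees the held-out targets, so comparing the forecast against them gives an honest accuracy estimate.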
Exponential Smoothing (ETS) Algorithm
Exponential Smoothing (ETS) is a commonly used local statistical algorithm for time-series forecasting. The Amazon Forecast ETS algorithm calls the ets function in the Package 'forecast' of the Comprehensive R Archive Network (CRAN).
How ETS Works
The ETS algorithm is especially useful for datasets with seasonality and other prior assumptions about the data. ETS computes a weighted average over all observations in the input time series dataset as its prediction. The weights decrease exponentially over time, rather than being constant as in simple moving-average methods. The weights depend on a constant parameter, known as the smoothing parameter.
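The exponentially weighted average can be sketched in a few lines. This is simple exponential smoothing only, an illustration of the weighting scheme rather than the CRAN ets function (which also handles trend and seasonal components):

```python
# Simple exponential smoothing: the level after each step is
# alpha * observation + (1 - alpha) * previous level, which unrolls
# to an exponentially weighted average of all past observations.
def simple_exponential_smoothing(series, alpha):
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level  # used as the one-step-ahead forecast
```

The smoothing parameter alpha controls the decay: alpha near 1 weights recent observations heavily, while alpha near 0 averages over a long history.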
ETS Hyperparameters and Tuning
For information about ETS hyperparameters and tuning, see the ets function documentation in the Package 'forecast' of CRAN.
Amazon Forecast converts the DataFrequency parameter specified in the CreateDataset operation to the frequency parameter of the R ts function using the following table:
DataFrequency (string) | R ts frequency (integer)
Y | 1
M | 12
W | 52
D | 7
H | 24
30min | 2
15min | 4
1min | 60
Supported data frequencies that aren't in the table default to a ts frequency of 1.
Non-Parametric Time Series (NPTS) Algorithm
The Amazon Forecast Non-Parametric Time Series (NPTS) algorithm is a scalable, probabilistic baseline forecaster. It predicts the future value distribution of a given time series by sampling from past observations. The predictions are bounded by the observed values. NPTS is especially useful when the time series is intermittent (or sparse, containing many 0s) and bursty, for example, when forecasting demand for individual items where the time series has many low counts. Amazon Forecast provides variants of NPTS that differ in which of the past observations are sampled and how they are sampled. To use an NPTS variant, you choose a hyperparameter setting.
How NPTS Works
Similar to classical forecasting methods, such as exponential smoothing (ETS) and autoregressive integrated moving average (ARIMA), NPTS generates predictions for each time series individually. The time series in the dataset can have different lengths. The time points where the observations are available are called the training range and the time points where the prediction is desired are called the prediction range.
Amazon Forecast NPTS forecasters have the following variants: NPTS, seasonal NPTS, climatological forecaster, and seasonal climatological forecaster.
NPTS
In this variant, predictions are generated by sampling from all observations in the training range of the time series. However, instead of uniformly sampling from all of the observations, this variant assigns weight to each of the past observations according to how far it is from the current time step where the prediction is needed. In particular, it uses weights that decay exponentially according to the distance of the past observations. In this way, the observations from the recent past are sampled with much higher probability than the observations from the distant past. This assumes that the near past is more indicative for the future than the distant past. You can control the amount of decay in the weights with the exp_kernel_weights hyperparameter.
To use this NPTS variant in Amazon Forecast, set the use_seasonal_model hyperparameter to False and accept all other default settings.
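The sampling scheme can be sketched as follows. This is an illustration of the weighting idea, not the actual NPTS implementation: the decay argument plays the role of the exp_kernel_weights hyperparameter, and kernel_type="uniform" corresponds to the climatological forecaster described later:

```python
import math
import random

# Draw prediction samples from past observations, weighted either
# exponentially by recency (kernel_type="exponential") or uniformly
# (kernel_type="uniform"). Predictions are bounded by observed values
# because every sample is one of the observations.
def npts_sample(observations, kernel_type="exponential", decay=0.1,
                num_samples=100, seed=0):
    rng = random.Random(seed)
    n = len(observations)
    if kernel_type == "exponential":
        # The most recent observation gets the largest weight.
        weights = [math.exp(-decay * (n - i)) for i in range(n)]
    else:
        weights = [1.0] * n
    return rng.choices(observations, weights=weights, k=num_samples)
```

The returned samples approximate the predicted value distribution; quantiles of the samples give probabilistic forecasts.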
Seasonal NPTS
The seasonal NPTS variant is similar to NPTS except that instead of sampling from all of the observations, it uses only the observations from the past seasons. By default, the season is determined by the granularity of the time series. For example, for an hourly time series, to predict for hour t, this variant samples from the observations corresponding to the hour t on the previous days. Similar to NPTS, the observation at hour t on the previous day is given more weight than the observations at hour t on earlier days. For more information about how to determine seasonality based on the granularity of the time series, see Seasonal Features.
If you provide time-series features with the feat_dynamic_real hyperparameter, seasonality is determined by both the granularity and the feat_dynamic_real hyperparameter. To use only the feat_dynamic_real hyperparameter to define seasonality, set the use_default_time_features hyperparameter to False. The feat_dynamic_real hyperparameter is turned on in Amazon Forecast by passing in the related time-series CSV file.
Climatological Forecaster
The climatological forecaster variant samples all of the past observations with uniform probability.
To use the climatological forecaster, set the kernel_type hyperparameter to uniform and the use_seasonal_model hyperparameter to False. Accept the default settings for all other hyperparameters.
Seasonal Climatological Forecaster
Similar to seasonal NPTS, the seasonal climatological forecaster samples the observations from past seasons, but samples them with uniform probability.
To use the seasonal climatological forecaster, set the kernel_type hyperparameter to uniform. Accept the default settings for all of the other hyperparameters.
Seasonal Features
To determine what corresponds to a season for the seasonal NPTS and seasonal climatological forecaster, use the features listed in the following table. The table lists the derived features for the supported basic time frequencies, based on granularity. Amazon Forecast includes these feature time series, so you don’t have to provide them.
Frequency of the Time Series | Feature to Determine Seasonality
Minute | minute-of-hour
Hour | hour-of-day
Day | day-of-week
Week | day-of-month
Month | month-of-year
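Restricting the sampling to past seasons amounts to selecting only the time indices that share the target's position within the season. A minimal sketch for an hourly series (season length 24, per the table above; illustrative, not the actual implementation):

```python
# For an hourly series, to predict hour t the seasonal variants sample
# only observations at the same hour-of-day on earlier days.
def seasonal_indices(t, series_length, season_length=24):
    """Indices in the training range sharing t's position in the season."""
    return [i for i in range(series_length)
            if i % season_length == t % season_length]
```

The seasonal NPTS variant would then weight these indices exponentially by recency, while the seasonal climatological forecaster samples them uniformly.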
Best Practices
When using the Amazon Forecast NPTS algorithms, consider the following best practices for preparing the data and achieving optimal results:
 Because NPTS generates predictions for each time series individually, provide the entire time series when calling the model for prediction. Also, accept the default value of the context_length hyperparameter. This causes the algorithm to use the entire time series. If you change the context_length (because the training data is too long), make sure it is large enough and covers multiple past seasons. For example, for a daily time series, this value must be at least 365 days (provided that you have that amount of data).
 If the data has seasonality patterns, the seasonal NPTS algorithm typically works better. If external events, such as special holidays and promotions, have an effect on the time series, provide those features in the feat_dynamic_real hyperparameter and use seasonal NPTS. In this case, you must also provide the feat_dynamic_real hyperparameter for both the training and prediction ranges by providing the related time-series CSV file to Amazon Forecast.
NPTS Hyperparameters
The following table lists the hyperparameters that you can use in the NPTS algorithm.
Parameter Name | Description
prediction_length | The number of time steps that the model is trained to predict. This is also called the forecast horizon. The trained model always generates a forecast with this length.
context_length | The number of time points in the past that the model uses for making the prediction. By default, it uses all of the time points in the training range. Typically, the value for this hyperparameter should be large and should cover multiple past seasons. For example, for a daily time series this value must be at least 365 days.
kernel_type | The kernel to use to define the weights used for sampling past observations.
exp_kernel_weights | Valid only when kernel_type is exponential. The scaling parameter of the kernel. For faster (exponential) decay in the weights given to the observations in the distant past, use a large value.
use_seasonal_model | Whether to use a seasonal variant.
use_default_time_features | Valid only for the seasonal NPTS and seasonal climatological forecaster variants. Whether to use seasonal features based on the granularity of the time series to determine seasonality.
Prophet Algorithm
Prophet is a popular local Bayesian structural time series model. The Amazon Forecast Prophet algorithm uses the Prophet class of the Python implementation of Prophet.
How Prophet Works
Prophet is especially useful for datasets that:
 Contain an extended time period (months or years) of detailed historical observations (hourly, daily, or weekly)
 Have multiple strong seasonalities
 Include previously known important, but irregular, events
 Have missing data points or large outliers
 Have nonlinear growth trends that are approaching a limit
Prophet is an additive regression model with a piecewise linear or logistic growth curve trend. It includes a yearly seasonal component modeled using Fourier series and a weekly seasonal component modeled using dummy variables.
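The Fourier-series seasonal component can be sketched as follows. This is an illustration of the idea, not Prophet's own code: the yearly seasonality is modeled as a linear combination of K sine/cosine pairs over a 365.25-day period, and the weights of those pairs are what the regression fits:

```python
import math

# Fourier features for one time point (day index): K sine/cosine
# pairs over the given period, as used for yearly seasonality.
def fourier_features(day, period=365.25, k=3):
    """Return 2*k Fourier features for one time point."""
    features = []
    for n in range(1, k + 1):
        features.append(math.sin(2 * math.pi * n * day / period))
        features.append(math.cos(2 * math.pi * n * day / period))
    return features
```

Weekly seasonality is handled differently, with one dummy variable per day of the week, as stated above.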
For more information, see Prophet: forecasting at scale.
Prophet Hyperparameters and Related Time Series
Amazon Forecast uses the default Prophet hyperparameters. Prophet also supports related time series as features, provided to Amazon Forecast in the related time-series CSV file.