Time Series Analysis & Forecasting

Chetna Shahi
10 min read · Nov 9, 2021


There are two types of forecasting processes:

  1. Descriptive: Evaluating the effect of some event, such as the impact of Corona on sales. The goal is retrospective in nature and is therefore descriptive or even explanatory. We call this process ‘time-series analysis’.
  2. Predictive: Forecasting data, such as predicting future monthly sales for revenue management, is a predictive goal. We call this process ‘time-series forecasting’.

Forecasting Horizon: How far into the future should the forecast go? The next few weeks or months?

Forecasts can be numerical or binary (e.g., will it rain or not).

A time series is a combination of four components: level (the average value in the series), trend (the increasing or decreasing direction of values), seasonality (a repeating short-term cycle) and noise (random variation in the series).

Time series data must be transformed to be modelled as a supervised learning problem. We create new features from the time series dataset, such as the weekday or the season computed from the dates. This process is called feature engineering. Below are the different types of features (a short pandas sketch follows the list):

  1. Date Time Features: the date can be engineered into columns such as ‘Weekday/Weekend’, ‘Seasonality’ etc.
  2. Lag Features: the value at a prior time step, e.g. yesterday’s value used as a feature for today.
  3. Window Features: a summary of values over a fixed window of prior time steps. Averaging, for example, smooths out seasonality and noise, which is helpful. There are two variants: 1) Rolling window, which summarizes a fixed number of previous time steps; 2) Expanding window, which includes all previous data in the series.
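
To make this concrete, here is a minimal pandas sketch (not from the original article) showing all three feature types; the daily sales values and column names are made up for illustration:

```python
import pandas as pd

# Hypothetical daily sales series, indexed by date (values are illustrative)
df = pd.DataFrame(
    {"sales": [200, 220, 215, 230, 240, 260, 255]},
    index=pd.date_range("2021-01-01", periods=7, freq="D"),
)

# 1. Date-time features derived from the index
df["weekday"] = df.index.dayofweek        # 0 = Monday ... 6 = Sunday
df["is_weekend"] = df["weekday"] >= 5

# 2. Lag feature: the value at the previous time step
df["lag_1"] = df["sales"].shift(1)

# 3. Window features: rolling (fixed window) and expanding (all prior data);
#    in practice you may shift these by one step so they only use past values
df["rolling_mean_3"] = df["sales"].rolling(window=3).mean()
df["expanding_mean"] = df["sales"].expanding().mean()
```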

Changing the frequency of the available data to match the frequency of the required forecast is called resampling. There are two types of it (a short code sketch follows the list):

  1. UpSampling: increase the frequency of the samples by adding samples to the data, e.g. minutes to seconds, or quarters to months. This is achieved using interpolation.
  2. DownSampling: decrease the frequency of the samples, e.g. days to months. This is achieved using summary statistics.
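
A quick sketch of both directions with pandas resample (the series and target frequencies are just examples, not from the article):

```python
import pandas as pd

# Hypothetical daily series
s = pd.Series([10, 12, 11, 15],
              index=pd.date_range("2021-01-01", periods=4, freq="D"))

# Upsampling: days -> hours, filling the new time steps by interpolation
hourly = s.resample("H").interpolate(method="linear")

# Downsampling: days -> months, aggregating with a summary statistic
monthly = s.resample("M").mean()
```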

The most basic plot for visualizing a time series is the time plot. Below are the steps for digging deeper into the visualization (a small plotting sketch follows the list):

  1. Zooming In: looking at a shorter period of time within the series instead of the whole period. This can reveal hidden patterns.
  2. Adding Trend Lines: to capture the type of trend in the data (e.g. linear, exponential, cubic), we can add trend lines.
  3. Suppressing Seasonality: remove the effect of seasonal patterns by plotting the series for separate seasons instead of all seasons together, by plotting the series at a coarser time scale (e.g. year instead of month), or by using moving average plots.
  4. Lag Scatter Plot: previous observations in a time series are called lags. Time series modelling assumes a relationship between an observation and the previous observations; the plot used to explore that relationship is called a lag scatter plot.
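
pandas ships helpers for both a plain time plot and the lag scatter plot; a minimal sketch (the monthly values below are invented):

```python
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import lag_plot

# Hypothetical monthly series
s = pd.Series([112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118],
              index=pd.date_range("2021-01-01", periods=12, freq="M"))

s.plot()              # basic time plot
lag_plot(s, lag=1)    # observation at time t plotted against time t-1
plt.show()
```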

Most of the time we forecast a time series using linear models, but these don’t perform well on non-linear data (e.g. an exponential growth curve). So we apply a power transformation while pre-processing the data. Changing the scale of the series so that the relationship becomes linear is called a power transformation, e.g. taking the logarithm to turn an exponentially rising curve into a linear one. This is not a mandatory step, but some models perform better with this transformation.
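
For example, a log transform (a common power transformation) is a one-liner; the exponential-looking values below are invented for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical series that roughly doubles every period (exponential growth)
s = pd.Series([10, 21, 39, 82, 158, 320])

# After the log transform the series grows roughly linearly,
# which linear models handle much better
s_log = np.log(s)
```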

Moving average smoothing creates a new series whose values are averages of the raw observations. This process helps remove noise and reveal the underlying trend. There are two main parameters: the window width (e.g. last 3 or last 4 observations) and the position of the window (e.g. where the average of the last 3 observations is placed). There are two types of MAs:

  1. Trailing Moving Average: the moving averages are placed at the end of the range. It can be used in two ways: feature engineering (use the average value as a new feature) and forecasting the next period’s value.
  2. Centered Moving Average: the moving averages are placed at the center of the range rather than at the end of it.

MA assumes that there is no trend or seasonality in the data, and hence it doesn’t give very good accuracy for the forecasted time series. But because it is a simple way to forecast values, it is often used to get a rough estimate of the prediction.
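
Both variants are one-liners with pandas rolling windows; a sketch with made-up values:

```python
import pandas as pd

s = pd.Series([10, 12, 11, 15, 14, 16, 18])

# Trailing moving average: the window ends at the current observation
trailing_ma = s.rolling(window=3).mean()

# Centered moving average: the window is centered on the current observation
centered_ma = s.rolling(window=3, center=True).mean()

# Crude one-step forecast: reuse the latest trailing average as the next value
next_value_estimate = trailing_ma.iloc[-1]
```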

Simple Exponential Smoothing is when we take a weighted average, in contrast to the simple average used in moving average smoothing. The latest values can be more important than older values, so weights are assigned accordingly. Like MA, this method is used when there is no trend or seasonality in the data. We can change the value of alpha (the weight) depending on whether we want to give more importance to recent or to older values.
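
statsmodels exposes this as SimpleExpSmoothing; a minimal sketch, where the series values and the alpha of 0.8 are arbitrary choices for illustration:

```python
import pandas as pd
from statsmodels.tsa.holtwinters import SimpleExpSmoothing

s = pd.Series([10, 12, 11, 15, 14, 16, 18])

# alpha close to 1 weights recent observations heavily; close to 0 weights older history
fit = SimpleExpSmoothing(s).fit(smoothing_level=0.8, optimized=False)

# The forecast is flat, since the method assumes no trend or seasonality
forecast = fit.forecast(3)
```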

White Noise: If the series to be forecasted is white noise, we cannot forecast it, because white noise is just random numbers. So we first check whether the series itself is white noise. If it is not, we build a model and then check for white noise in the error (the difference between forecast and actual). If the error is not white noise, there was more information in the data that our current model failed to capture; if it is white noise, all the information has been harnessed and what is left is just random numbers.

So, the series should not be white noise, while the error values should be white noise.

Random Walk: When the series is a random walk, we use the previous value as the forecast for the next value. This is called naïve forecasting (the persistence model). Its output can be compared with the output of more advanced models to decide which model to proceed with.

With white noise we cannot forecast at all, but with a random walk, naïve forecasting is the best bet.
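
A naïve/persistence baseline is trivial to compute; a sketch with invented prices:

```python
import pandas as pd

s = pd.Series([100, 102, 101, 105, 107, 106, 110])

# Naive (persistence) forecast: the prediction for time t is simply the value at t-1
naive_forecast = s.shift(1)

# Mean absolute error of the baseline, useful for judging more advanced models
baseline_mae = (s - naive_forecast).abs().mean()
```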

Decomposing Time Series

Additive Model

y(t) = Level + Trend + Seasonality + Noise

Figure: additive decomposition, with constant seasonality.

Multiplicative Model

y(t) = Level * Trend * Seasonality * Noise

y(t) is the time series.

Figure: multiplicative decomposition, where seasonality increases over time.
  • The additive model is useful when the seasonal variation is relatively constant over time.
  • The multiplicative model is useful when the seasonal variation increases over time.

Use seasonal_decompose from the statsmodels library to identify seasonality and trend in a dataset.
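
A minimal sketch with statsmodels’ seasonal_decompose; the monthly series below (trend plus a yearly cycle plus noise) is simulated just to have something to decompose:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

# Simulated monthly data: upward trend + yearly (period 12) seasonality + noise
idx = pd.date_range("2018-01-01", periods=48, freq="M")
s = pd.Series(100 + 0.5 * np.arange(48)
              + 10 * np.sin(2 * np.pi * np.arange(48) / 12)
              + np.random.normal(0, 1, 48), index=idx)

# Use model="multiplicative" instead when seasonal swings grow with the level
result = seasonal_decompose(s, model="additive", period=12)
result.plot()   # panels for observed, trend, seasonal and residual components
```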

Differencing

Differencing can help stabilize the mean of the time series by removing changes in the level of a time series, and so eliminating (or reducing) trend and seasonality.

Lag 1 Differencing

y(t) - y(t-1)

Lag k Differencing

y(t) - y(t-k)

Courtesy: Udemy Course

To remove a monthly pattern in the data (i.e. seasonal change across the months of the year), we use lag-12 differencing; for weekly seasonality (e.g. sales increase on weekends versus weekdays), we use lag-7 differencing.
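
With pandas this is just Series.diff; a sketch with an invented daily series:

```python
import pandas as pd

s = pd.Series([100, 110, 105, 120, 130, 125, 140, 150, 145, 160],
              index=pd.date_range("2021-01-01", periods=10, freq="D"))

# Lag-1 differencing: removes (or reduces) the trend
diff_1 = s.diff(1)        # y(t) - y(t-1)

# Lag-7 differencing: removes a weekly pattern in daily data
# (use .diff(12) on monthly data for a yearly pattern)
diff_7 = s.diff(7)        # y(t) - y(t-7)
```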

Train-Test Split in Time Series

Unlike other machine learning problems, we don’t randomly choose the train and test datasets, because the data is ordered in time.

For other machine learning problems we define train, validation and test datasets for training, hyperparameter tuning and final testing. In time series we usually divide the data into just train and test: it often makes more sense to use the would-be validation data for training than for validating the model, and we usually have a limited amount of data to begin with.

Walk-Forward Validation: In time series modelling, predictions become less and less accurate the further out they go, so a more realistic approach is to re-train the model with actual data as it becomes available before making further predictions.
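
A sketch of a time-ordered split plus walk-forward validation, with a naïve model standing in for whatever model you actually use (the 80/20 split is an arbitrary choice):

```python
import pandas as pd

s = pd.Series(range(100), index=pd.date_range("2018-01-01", periods=100, freq="D"))

# Time-ordered split: the last 20% of observations become the test set
split = int(len(s) * 0.8)
train, test = s.iloc[:split], s.iloc[split:]

# Walk-forward validation: after each prediction the actual value is added
# back to the history, so the next prediction always uses the latest data
history = list(train)
predictions = []
for actual in test:
    predictions.append(history[-1])   # naive forecast from the latest known value
    history.append(actual)            # extend the history once the actual arrives
```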

Auto-Regression Model

Apply this method after removing trend and seasonality. Auto-regression is a regression model in which the prediction is based on previous values of the same variable. It is used for forecasting when there is some correlation between values in a time series and the values that precede and succeed them.

Because the regression model uses data from the same input variable at previous time steps, it is referred to as an auto-regression (regression of self).

X(t+1) = b0 + b1*X(t) + b2*X(t-1)

The process is basically a linear regression of the data in the current series against one or more past values in the same series.

You can use the walk-forward validation method with an AR model to predict future data; this helps improve the model’s accuracy.
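
statsmodels provides AutoReg for this; a minimal sketch with made-up values and an arbitrary choice of two lags:

```python
import pandas as pd
from statsmodels.tsa.ar_model import AutoReg

s = pd.Series([10, 12, 11, 15, 14, 16, 18, 17, 19, 21, 20, 23],
              index=pd.date_range("2021-01-01", periods=12, freq="D"))

# AR(2): X(t) is regressed on X(t-1) and X(t-2)
model = AutoReg(s, lags=2).fit()

# Forecast the next three time steps
forecast = model.predict(start=len(s), end=len(s) + 2)
```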

Moving Average Forecasting

It is different from the moving average smoothing method.

A moving average model is used for forecasting future values, while moving average smoothing is used for estimating the trend-cycle of past values.

A moving average (MA-only) model is one where y(t) depends only on the lagged forecast errors, where the error terms are the errors of the autoregressive models at the respective lags. So, we essentially run an AR model on the residual values of the time series.

We build a forecasting model using an AR or persistence model. Then we find the residuals (forecast errors). We build another forecast model on the residuals, to capture any pattern we missed initially and to make sure the residuals contain only white noise. We generally use AR to forecast the residual values. Finally, we update our initial forecast with the forecasted residuals.

If the initial technique is AR, this whole process amounts to an ARMA model; add differencing and it becomes ARIMA.

An ARIMA model is one where the time series has been differenced at least once to make it stationary and the AR and MA terms are combined.

While doing auto-regression, the biggest challenge is knowing how many lag values of historical data to use for future predictions; this is where auto-correlation and partial auto-correlation come into the picture.

We can split the ARIMA term into three parts, AR, I and MA:

AR(p) stands for the autoregressive part; the p parameter is an integer specifying how many lagged values of the series are used to forecast periods ahead.

I(d) is the differencing part; the d parameter tells how many orders of differencing are applied to make the series stationary.

MA(q) stands for the moving average part; q is the number of lagged forecast-error terms in the prediction equation.
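
Putting the three together in statsmodels; the (1, 1, 1) order below is an arbitrary example, in practice you would pick p and q from the ACF/PACF plots discussed next:

```python
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

s = pd.Series([10, 12, 11, 15, 14, 16, 18, 17, 19, 21, 20, 23],
              index=pd.date_range("2021-01-01", periods=12, freq="D"))

# order = (p, d, q): AR lags, differencing order, MA lagged-error terms
model = ARIMA(s, order=(1, 1, 1)).fit()
forecast = model.forecast(steps=3)
```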

Correlation measures the strength of the relationship between two variables. We can use Pearson’s correlation coefficient to summarize the correlation between variables; it can be positive, negative or zero. In time series we use auto-correlation, which means correlation of the series with itself.

Auto Correlation Function (ACF)

To use it, we compute the correlation of the series with each of its lag values (lag 1, lag 2 and so on).

Then we plot all of the correlation values against the lag values: the x-axis shows the lag values and the y-axis the correlation values. This graph is called the ACF (auto-correlation function) plot.

It works well when we are building a moving average model. Confidence intervals are drawn as a cone; by default this is a 95% confidence interval, suggesting that correlation values outside of this cone are very likely real correlations and not statistical flukes.

It works well with moving averages when we are trying to find relationships between lags of the residuals. With auto-regression, however, there is some ambiguity in the ACF plot. For example, lag 2 has some impact on the lag 1 variable, so the impact of lag 2 is partly absorbed into lag 1. This indirect impact should be removed, and that is what the PACF does.

Partial Auto Correlation Function (PACF)

The partial autocorrelation at lag k is the correlation that results after removing the effect of any correlations due to the terms at shorter lags.

The autocorrelation for an observation and an observation at a prior time step is comprised of both the direct correlation and indirect correlations. These indirect correlations are a linear function of the correlation of the observation, with observations at intervening time steps.

It is these indirect correlations that the partial autocorrelation function seeks to remove.

It gives a clearer view of p (the number of lag values to use in auto-regression).
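
Both plots are one-liners in statsmodels; a sketch on a made-up repeating series:

```python
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

s = pd.Series([10, 12, 11, 15, 14, 16, 18, 17, 19, 21, 20, 23] * 4,
              index=pd.date_range("2018-01-01", periods=48, freq="M"))

plot_acf(s, lags=20)    # helps pick q, the MA order
plot_pacf(s, lags=20)   # helps pick p, the AR order
plt.show()
```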

SARIMA (Seasonal Auto Regression Integrated Moving Average)

The biggest shortcoming of the ARIMA model is that it can handle trend but not seasonality in the data. SARIMA can handle both trend and seasonality. We can remove seasonality by differencing, i.e. by subtracting the corresponding value from a fixed number of time steps back (e.g. with a yearly pattern in monthly data, subtract last year’s sales value from this year’s).

parameters for SARIMA:

P: Seasonal autoregressive order

D: Seasonal difference order

Q: Seasonal Moving Average Order

m: Period of seasonality. For monthly seasonality it is 12; for weekly seasonality it is 7.

parameters for ARIMA as well as SARIMA:

p: Trend autoregressive order

d: Trend difference order

q: Trend moving average order

SARIMAX (Seasonal Auto Regression Integrated Moving Average Exogenous)

We can use other variables to predict the time series. For example, to predict stock prices we may use variables such as volume traded, opening price, promotions, etc. The only thing to note is that in SARIMAX we make the date the index, not a separate column.
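
A SARIMAX sketch with one exogenous variable; the simulated sales, the promotion flags and the chosen orders are all made up for illustration:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Simulated monthly sales with trend + yearly seasonality, plus a promotion flag
idx = pd.date_range("2018-01-01", periods=48, freq="M")
sales = pd.Series(100 + np.arange(48)
                  + 10 * np.sin(2 * np.pi * np.arange(48) / 12), index=idx)
promo = pd.Series([1, 0, 0] * 16, index=idx)   # exogenous variable

# order = (p, d, q); seasonal_order = (P, D, Q, m), with m = 12 for monthly data.
# Note: the date is the index, not a separate column.
model = SARIMAX(sales, exog=promo, order=(1, 1, 1),
                seasonal_order=(1, 1, 1, 12)).fit(disp=False)

# Future exog values must be supplied for the forecast horizon; these are assumed
future_promo = np.array([[1], [0], [0]])
forecast = model.forecast(steps=3, exog=future_promo)
```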
