Statistical Modelling and Projection of Mortality for Australia

S.D. Constantinou

Faculty of Social, Human and Mathematical Studies, University of Southampton, Highfield, Southampton, SO17 1BJ

Abstract

The purpose of this study is to establish which of three stochastic mortality models (Lee-Carter, Renshaw-Haberman and Cairns-Blake-Dowd) provides the best goodness of fit for Australia’s male and female populations. The data, held separately for males and females, cover Australians aged between 55 and 90 for the years 1921 to 2014. In addition, the study investigates which mortality model has the most accurate forecasting ability. The analysis is mainly carried out using StMoMo [1], an R package capable of fitting the models to the Australian data and producing a range of graphs that indicate how well each mortality model performs in these areas. The findings of this study show that the Renshaw-Haberman model produces the best goodness of fit for these data. For forecasting ability, however, the results were less clear-cut, as no model outperformed the other two.

1. Introduction

1.1 Objective

In recent years, researchers have observed a decrease in mortality rates with the passage of time. To capture and represent this decrease, various authors have proposed statistical mortality models, which are then used to model and forecast future improvements in mortality. Predicting future mortality rates is essential, as the changes have already had social and economic implications and will continue to do so. A full understanding of the best ways to model and forecast mortality is therefore of great importance.

The aim is to investigate the applicability of the Lee-Carter, Renshaw-Haberman and Cairns-Blake-Dowd mortality models to Australia’s mortality data for males and females aged between 55 and 90 over the years 1921 to 2014. The data for males and females were retrieved separately from the Australia section of the Human Mortality Database [2]. Each model’s capability to capture the improvements in mortality will be compared by fitting the model to the Australian data and then evaluating its goodness of fit using statistical methods. Such a comparison has limitations, because each model works best under different conditions. Furthermore, each model’s ability to predict the future distribution of mortality for Australia will be analysed. This will be assessed by back testing: a forecast is produced from a time point in the past using data that are already available, and statistical methods are then used to determine how accurate the forecast was. Confidence intervals on the future projected distribution will also be considered.
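The back-testing idea described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the procedure used in the study: the period index `kappa` is simulated rather than estimated from Australian data, the cut-off year 1994 is hypothetical, and the forecast is a plain random walk with drift whose interval ignores the uncertainty in the estimated drift.

```r
# Back-test sketch on a simulated period index (illustrative data only).
set.seed(42)
years <- 1921:2014
kappa <- cumsum(c(0, rnorm(length(years) - 1, mean = -0.4, sd = 0.5)))

# Hypothetical cut-off: fit on years up to 1994, hold out the rest.
cutoff <- 1994
train  <- kappa[years <= cutoff]
test   <- kappa[years > cutoff]
h      <- length(test)

# Random walk with drift: estimate drift and innovation sd from differences.
d_hat     <- mean(diff(train))
sigma_hat <- sd(diff(train))

# Point forecasts and approximate 95% intervals for 1..h steps ahead.
steps      <- seq_len(h)
point      <- tail(train, 1) + d_hat * steps
half_width <- qnorm(0.975) * sigma_hat * sqrt(steps)
lower      <- point - half_width
upper      <- point + half_width

# Accuracy measures: forecast error and share of outcomes inside the interval.
rmse     <- sqrt(mean((test - point)^2))
coverage <- mean(test >= lower & test <= upper)
```

Comparing `rmse` and `coverage` across models fitted to the same training window gives one simple way to rank their forecasting ability.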

1.2 Background

Australia’s reductions in mortality have occurred due to a wide range of factors, such as healthcare advances and increases in education. Countries of similar affluence and development have experienced comparable reductions in mortality rates. The key factors behind these reductions are improvements in medical care, mainly due to new technological advances, together with an improved understanding of treatment, which have increased survival rates for illness and disease. Specific examples with a sizeable impact include the elderly surviving cardiovascular disease [3] and the prevention of deaths caused by premature birth [3] and pneumonia in infants. As previously mentioned, education also plays an important role in mortality. An increased level of education often leads to financial security, allowing a person to have a more affluent lifestyle and to invest in preventive measures such as vaccination.

The decreasing mortality rate has led to a surge in life expectancy and, in turn, to the social impact of an ageing population; many other countries of equivalent wealth and development to Australia have experienced the same shift in their own demographics. The significance of Australia’s ageing population is reinforced by the fact that in 1968 the over-65s made up 8% of its population [4], compared to the 15% recorded in 2016 [4]. This transformation in the demographics has had implications for Australia’s economy, as the amount of money that the government and companies spend on pensions has increased dramatically. The Australian government has already introduced the idea of a policy [4] that tackles this spending increase by raising the retirement age from the current age of 65 [4]. This reinforces that the issue is significant and demonstrates the need for extensive research into modelling and forecasting mortality in Australia. A mortality model that produces accurate forecasts would benefit businesses offering staff pensions, as well as the Australian government. It would allow the cost of providing pensions in the future to be assessed more accurately, because improvements in the understanding of longevity would be built into the calculations, enabling more precise estimation of the contributions required to fund those pensions.

2. Statistical Mortality Models

2.1 Lee-Carter

The first model that I will be evaluating is the 1992 Lee-Carter model [5], the most well-known mortality model. Lee-Carter is a two-factor model [6]; the factors it considers when modelling and forecasting are age and time/period [6]. The addition of the second factor, time/period, meant that the Lee-Carter model could be used to spot age patterns [6], the lack of which was previously a disadvantage of one-factor models. The model structure is as follows:

$$\ln(m_{x,t}) = \alpha_x + \beta_x \kappa_t + \varepsilon_{x,t},$$

where $m_{x,t}$ is the central mortality rate at age $x$ in year $t$ [6]. The term $\alpha_x$ is the average log-mortality at age $x$. The $\beta_x$ and $\kappa_t$ terms are estimated; originally this was done with Singular Value Decomposition [6], which is applied to find the least squares solution. Nowadays the $\beta_x$ and $\kappa_t$ terms are calculated using a GLM approach [7]. The term $\beta_x$ measures the response at age $x$ to change in the overall level of mortality over time [6], whilst $\kappa_t$ represents the overall level of mortality in year $t$ and $\varepsilon_{x,t}$ is the error term at each age and time [6]. For the Lee-Carter model it is also assumed that $\kappa_t$ is a random walk with drift [8], where

$$\kappa_t = \kappa_{t-1} + d + e_t.$$

The term $d$ monitors trends and is known as the drift parameter [8], whereas $e_t$ is a new error term [8]. In addition, there are two constraints on the $\beta_x$ and $\kappa_t$ terms that ensure the model is uniquely identified; the constraints are $\sum_x \beta_x = 1$ and $\sum_t \kappa_t = 0$ [6]. Since its release, the Lee-Carter model has been used by many others as the basis for developing further mortality models.
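The Singular Value Decomposition estimation and the two identifiability constraints can be illustrated with a short base R sketch. It is only an illustration on simulated log-mortality rates: the "true" parameter values, the noise level and all variable names are assumptions made for the example, and it is not the GLM-based estimation that StMoMo performs.

```r
# Lee-Carter by SVD on synthetic data (base R only, illustrative values).
set.seed(1)
ages  <- 55:90
years <- 1921:2014
n_x   <- length(ages)
n_t   <- length(years)

# Assumed "true" parameters used to simulate log central mortality rates.
alpha_true <- -4.6 + 0.09 * (ages - 55)
beta_true  <- rep(1 / n_x, n_x)               # sums to 1
kappa_true <- seq(20, -20, length.out = n_t)  # sums to 0

log_m <- alpha_true + outer(beta_true, kappa_true) +
  matrix(rnorm(n_x * n_t, sd = 0.02), n_x, n_t)

# Step 1: alpha_x is the average log-mortality at each age.
alpha_hat <- rowMeans(log_m)

# Step 2: the leading singular vectors of the centred matrix give the
# least squares rank-one approximation beta_x * kappa_t.
centred <- sweep(log_m, 1, alpha_hat)
dec     <- svd(centred)

# Step 3: rescale so that sum(beta) = 1; sum(kappa) = 0 follows from centring.
s         <- sum(dec$u[, 1])
beta_hat  <- dec$u[, 1] / s
kappa_hat <- dec$d[1] * dec$v[, 1] * s

# Random walk with drift: drift estimate and a 10-year point projection.
d_hat          <- mean(diff(kappa_hat))
kappa_forecast <- tail(kappa_hat, 1) + d_hat * (1:10)
```

Dividing by `s` also removes the arbitrary sign of the singular vectors, so the estimates do not depend on which sign `svd()` happens to return.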

2.2 Renshaw-Haberman

The 2006 Renshaw-Haberman model [9] is an extension of the Lee-Carter model. It differs from Lee-Carter in that it includes the cohort effect [6] as an additional factor, making it a three-factor model. The cohort effect captures whether there is a relationship between the date of birth and the mortality rate. This is useful as it can help to recognise the health risks associated with being born within a given date range. This advantage comes at a price, as the third factor makes the model more complex. The Renshaw-Haberman model structure is as follows:

$$\ln(m_{x,t}) = \alpha_x + \beta_x^{(1)} \kappa_t + \beta_x^{(0)} \gamma_{t-x} + \varepsilon_{x,t},$$

where the term $\alpha_x$ has the same meaning as the $\alpha_x$ in the Lee-Carter model [8], and is the average log-mortality at each age $x$. The term $\beta_x^{(1)}$ is the age interaction with time at each age $x$ and $\beta_x^{(0)}$ is the age interaction with the cohort at each age $x$. Because of the complexity of fitting an age-varying $\beta_x^{(0)}$ in R, this term will be treated as a constant with a value of 1. The term $\kappa_t$ accounts for the period effect whilst $\gamma_{t-x}$ accounts for the cohort effect [8], where $c = t - x$ is the year of birth. Finally, $\varepsilon_{x,t}$ is also the same as in Lee-Carter and is the error term at each age and year [10]. The model treats the $\kappa_t$ and $\gamma_c$ parameters as two random walks with drift [8], where:

$$\kappa_t = \kappa_{t-1} + d_1 + e_t^{(1)}$$

and

$$\gamma_c = \gamma_{c-1} + d_2 + e_c^{(2)}.$$

The $d_1$ and $d_2$ terms monitor trends and are known as the drift parameters [7], whereas $e_t^{(1)}$ and $e_c^{(2)}$ are error terms [8]. The Renshaw-Haberman model, like the Lee-Carter, has the same two conditions on $\beta_x^{(1)}$ and $\kappa_t$ [8]. Moreover, there are a further two conditions on the $\beta_x^{(0)}$ and $\gamma_c$ parameters, such that $\sum_x \beta_x^{(0)} = 1$ and $\sum_c \gamma_c = 0$ [10]. These conditions were applied to avoid identification problems [10] whilst not having a negative impact on how well the model functions.
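To make the cohort index concrete, the following base R sketch builds the year of birth, year minus age, for every cell of the age-period grid used in this study. The variable names are illustrative only.

```r
# Cohort (year-of-birth) index for each age-year cell: c = t - x.
ages  <- 55:90
years <- 1921:2014

cohort <- outer(ages, years, function(x, t) t - x)
dimnames(cohort) <- list(age = ages, year = years)

# Cohorts covered by the data and the number of cells observed for each one.
cohort_range    <- range(cohort)
obs_per_cohort  <- table(cohort)
n_cohorts       <- length(obs_per_cohort)
```

The earliest and latest cohorts appear in only one cell each, so their cohort parameters rest on very little data; this is one reason cohort-based models tend to be harder to fit than Lee-Carter.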

2.3 Cairns-Blake-Dowd

The 2006 Cairns-Blake-Dowd model [11], also known as the CBD model, was the first of many Cairns models. The CBD model is a three-factor model which includes the age-time effect. The addition of the age-time effect [12] means the model will produce cleaner residual plots [12], and it provides another layer of protection against important factors being missed or left unexplained. A strength of the model is that it creates a non-trivial correlation structure [12]. This structure means the CBD model captures the fact that at the older ages (55 plus) the function $\operatorname{logit}(q_{x,t})$ is essentially linear in age. This is why the CBD model is not applicable at the younger ages, where this near-linearity in age is not present. The Cairns-Blake-Dowd model specification is as follows:

$$\operatorname{logit}(q_{x,t}) = \kappa_t^{(1)} + \kappa_t^{(2)} (x - \bar{x}) + \varepsilon_{x,t},$$

where $\operatorname{logit}(q_{x,t}) = \ln\left(\frac{q_{x,t}}{1 - q_{x,t}}\right)$, $q_{x,t}$ is the probability of death at age $x$ in year $t$, and the term $\bar{x}$ is the mean age of the considered range of ages [13]. $\kappa_t^{(1)}$ denotes the general time effect and $\kappa_t^{(2)}$ denotes the age-specific time effect [12]. Finally, as in the other two models, $\varepsilon_{x,t}$ is the error term. The $\kappa_t^{(1)}$ and $\kappa_t^{(2)}$ parameters are the time indexes [13], where:

$$\kappa_t^{(1)} = \kappa_{t-1}^{(1)} + d_1 + e_t^{(1)}$$

and

$$\kappa_t^{(2)} = \kappa_{t-1}^{(2)} + d_2 + e_t^{(2)},$$

where $d_1$ and $d_2$ are drift parameters, as in the other models, and $e_t^{(1)}$ and $e_t^{(2)}$ are normal variables [8]. Unlike the Lee-Carter and Renshaw-Haberman models, the CBD model does not have any constraints added to its parameters, because there is no identifiability problem for this model [10]. A few years after the CBD model was released, an extension that accounted for the cohort effect was published; however, in this study I will only be investigating the original model.
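Because the CBD specification is linear in age for each year, its two period indexes can be illustrated by regressing the logit of the death probability on centred age, one year at a time. The base R sketch below does this on simulated data; the parameter values and names are assumptions for the example, and ordinary least squares is a deliberate simplification of the likelihood-based GLM fitting that StMoMo uses.

```r
# CBD sketch: year-by-year regression of logit(q) on centred age
# (synthetic data, ordinary least squares as a simplification).
set.seed(2)
ages  <- 55:90
years <- 1921:2014
n_x   <- length(ages)
n_t   <- length(years)
xbar  <- mean(ages)
xc    <- ages - xbar

# Assumed "true" period indexes used to simulate the data.
kappa1_true <- seq(-2.0, -3.0, length.out = n_t)   # general level
kappa2_true <- seq(0.09, 0.11, length.out = n_t)   # slope in age

logit_q <- matrix(kappa1_true, n_x, n_t, byrow = TRUE) +
  outer(xc, kappa2_true) +
  matrix(rnorm(n_x * n_t, sd = 0.02), n_x, n_t)

# One regression per year: intercept = kappa1_t, slope = kappa2_t.
coefs      <- apply(logit_q, 2, function(y) coef(lm(y ~ xc)))
kappa1_hat <- coefs[1, ]
kappa2_hat <- coefs[2, ]

# Fitted death probabilities recovered through the inverse logit.
q_hat <- plogis(matrix(kappa1_hat, n_x, n_t, byrow = TRUE) +
                  outer(xc, kappa2_hat))
```

Each year's intercept and slope are determined directly by that year's data, which is why no additional constraints are needed to identify the parameters.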

3. Data and Software

The data collected from the Human Mortality Database were separate for males and females and covered Australians aged 55-90 for the years 1921-2014. To retrieve any data from the Human Mortality Database, it is necessary to sign up and create an account. To import the data into R, two packages must be installed: StMoMo and demography [14]. Once they are installed, the following code allows R to access an individual’s Human Mortality Database account and collect the data:

    library(demography)
    library(StMoMo)
    StMomodata <- hmd.mx(country = 'AUS', username = 'your username', password = 'your password', label = 'Australia')

The username and password must be replaced with the details of the individual’s own account. The country code for Australia is AUS [14] and the hmd.mx function is from the demography package. The data for Australia's male and female populations can then be transformed into a form recognised by StMoMo using the following code:

    AusMStMoMo <- StMoMoData(StMomodata, series = "male")
    AusFStMoMo <- StMoMoData(StMomodata, series = "female")

Once this code has been run in R, the software is fully set up. The purpose of the StMoMo package is to define many of the stochastic mortality models proposed to date as generalized (non-)linear models [1]. StMoMo estimates the parameters of the mortality models as it would those of a GLM, and provides tools for fitting the models to data [1].
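With the data objects in place, the three models could be defined and fitted along the following lines. This is a sketch rather than the code used in the study: `fit_model` is a hypothetical helper, the choice of link functions and the `clip = 3` weighting (which drops the three earliest and latest cohorts) are assumptions, and the fitting call only works once `AusMStMoMo` or `AusFStMoMo` has been created as above.

```r
library(StMoMo)

# Model definitions; these need only the StMoMo package.
LC  <- lc(link = "log")                        # Lee-Carter
RH  <- rh(link = "log", cohortAgeFun = "1")    # cohort age term fixed at 1
CBD <- cbd(link = "logit")                     # Cairns-Blake-Dowd

# Hypothetical helper: fit one model to a StMoMoData object for ages 55-90.
fit_model <- function(model, data, ages.fit = 55:90) {
  # Logit-link models work with initial rather than central exposures.
  if (model$link == "logit") data <- central2initial(data)
  # Weight matrix excluding the most sparsely observed cohorts (assumed clip).
  wxt <- genWeightMat(ages = ages.fit, years = data$years, clip = 3)
  fit(model, data = data, ages.fit = ages.fit, wxt = wxt)
}

# Example usage once the Australian data have been downloaded:
# LCfit_m  <- fit_model(LC,  AusMStMoMo)
# RHfit_m  <- fit_model(RH,  AusMStMoMo)
# CBDfit_m <- fit_model(CBD, AusMStMoMo)
# AIC(LCfit_m); BIC(LCfit_m)        # goodness-of-fit criteria
# forecast(LCfit_m, h = 20)         # 20-year projection
```

Setting `cohortAgeFun = "1"` mirrors the simplification described in Section 2.2, where the age interaction with the cohort is treated as a constant equal to 1.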