The Quality of Demographic Data for Extreme Old Age is Terrible, but this Doesn’t Matter

Given the environment and state of medical technology over the past century, few people now live to be 100 years old. With yearly mortality rates approaching 50%, very few make it to 110. But in fact we know far less than might be expected about the demographics of extreme old age. The data is almost uniformly terrible in quality, for several reasons. First, century-old records are in poor condition even in the most developed countries, so verifying age and identity can be costly and challenging. Second, a number of perverse incentives exist to produce incorrect data. Pension fraud, for example, ensures that dead people remain alive in databases. Further, there is a certain status attached to being extremely old in most societies, so some old people exaggerate their ages, enabled by poor or missing records.

The question of bad data, and how to compensate for or fix it, is of great interest to demographers. But from the perspective of what we intend to do about aging, whether we choose to earnestly treat aging as a medical condition and how researchers might develop rejuvenation therapies based on repair of cell and tissue damage, the present demographics of extreme old age do not matter. At all. The research community knows more than enough about the mechanisms of aging to work on potential rejuvenation therapies. Improving the demographics of late life survival will add very little to that body of knowledge.

The global pattern of centenarians highlights deep problems in demography

The measurement of human ages, and by extension age-specific rates of any quantity, relies almost universally upon a single measurement system: the globally-incomplete paperwork-based system of documentary evidence known as vital registration. Despite age being the single most important risk factor for human health, along with gender, there has been no accurate and independent metric to validate human age measurements. If a developmentally mature person walks into a clinical setting with no paperwork, for example, there has been no independent or reproducible test available to measure their chronological age. As such, if age-based paperwork consistently records an incorrect age, there is no method by which that error can be detected because there is, or rather has been, no independently reproducible scientific method available for discovering such errors.

As a result, globally diverse document-based systems of vital registration are not subject to any document-independent technical validation or calibration. Systematic errors or error-generating processes that modify age records, from heavily biased or systemic errors to simple typographic mistakes, can therefore remain undetected indefinitely. Despite some scepticism about the reliability of age data, this situation has long been ignored: first on the basis of an untestable assumption that such errors must be rare, and second on the seemingly reasonable statistical grounds that - if vital registration errors are assumed to be sufficiently rare and random - they may be safely ignored by fitting random error terms within a statistical model.

Recent theoretical work has shown that neither assumption seems to be valid, especially at older ages. In survival processes, age-coding errors accumulate non-randomly with age - even when initial rates of error are vanishingly low, symmetrically distributed, and random - through a process that can substantially distort late-life data and massively inflate the frequency of errors at certain ages.

The underlying theoretical reason is simple. Consider, for example, a population of one million 50-year-old people, into which a hundred 40-year-olds are accidentally included through age-coding errors: an initial error rate of 0.01% or one in every ten thousand. The paperwork of these 40-year-olds accidentally records them as aged 50 years - a surprisingly common mistake - and these 'young liar' errors appear, officially and on paper, as 50-year-olds. As the two cohorts age, the 'young liar' errors are less than half as likely to die as the actual 50-year-olds - because they are biologically 10 years younger - and errors therefore constitute a growing fraction of the population with age. In typical human populations, error rates will grow at an approximately exponential rate with age due to the better survival of 'young liars.' By age 85 more than half of the population consists of errors, and by age 100 'young liar' errors constitute the entire population: a kind of error explosion caused by the asymmetrically better survival of 'young liars.'
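
The 'young liar' mechanism in the example above can be sketched numerically. The following is a minimal illustration, not the authors' model: it assumes Gompertz mortality (a yearly hazard that doubles every 8 years), and the constants `A` and `B` and all function names are hypothetical choices made for this sketch. The ages at which errors come to dominate depend heavily on those parameters, so this should not be expected to reproduce the specific ages quoted above, only the qualitative point that the error fraction rises steadily and then explodes.

```python
import math

# Assumed Gompertz parameters (illustrative only, not taken from the paper):
# hazard(age) = A * exp(B * age), with mortality doubling every 8 years.
A = 6.6e-5
B = math.log(2) / 8


def gompertz_hazard(age, a=A, b=B):
    """Instantaneous yearly mortality hazard at a given true age."""
    return a * math.exp(b * age)


def gompertz_survival(age_from, age_to, a=A, b=B):
    """Probability of surviving from age_from to age_to (true ages)."""
    return math.exp(-(a / b) * (math.exp(b * age_to) - math.exp(b * age_from)))


def error_fraction(paper_age, n_true=1_000_000, n_errors=100,
                   start_age=50, age_gap=10):
    """Expected share of survivors at a given paper age who are 'young liars'.

    Correctly recorded people have true age == paper age. Errors are
    age_gap years younger than their paperwork says.
    """
    true_alive = n_true * gompertz_survival(start_age, paper_age)
    errors_alive = n_errors * gompertz_survival(start_age - age_gap,
                                                paper_age - age_gap)
    return errors_alive / (true_alive + errors_alive)


if __name__ == "__main__":
    for paper_age in range(50, 125, 10):
        print(paper_age, f"{error_fraction(paper_age):.6f}")
```

Under these assumed parameters a true 40-year-old has less than half the hazard of a true 50-year-old, and the printed error fraction starts at one in ten thousand and climbs with every decade of paper age. Different parameter choices shift where the climb becomes steep.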

Combined with the historical lack of paperwork-independent methods to validate and correct paper records, this simple theoretical process raises an uncomfortable possibility: that extreme age records may be dominated by undetected errors. Analysis of 236 nations or states across 51 years reveals that late-life survival data is dominated by anomalies at all scales and in all time periods. Life expectancy at age 100 and late-life survival from ages 80 to 100+, which we term centenarian attainment rate, are highest in a seemingly random assortment of states. The top 10 'blue zone' regions with the best survival to ages 100+ routinely include Thailand, Kenya and Malawi - respectively now 212th and 202nd in the world for life expectancy, the non-self-governing territory of Western Sahara, and Puerto Rico, where birth certificates are so unreliable they were recently declared invalid as a legal document. These anomalous rankings are conserved across long time periods and multiple non-overlapping cohorts, and do not seem to be sampling effects. Instead these patterns suggest a persistent inability, even for nation-states or global organisations, to detect or measure error rates in human age data, with troubling implications for epidemiology, demography, and medicine.

Fight Aging!
