Mortality Risk Analysis in a Dataset of Half a Million People

The UK mortality risk study I point out below doesn't provide any real surprises when it comes to the risk factors associated with higher mortality at a given age, but taken as a whole it is a good example of the present trend towards much more data and far larger study sizes in epidemiology. In this age of databases, with the cost of storage and computation falling rapidly towards numbers barely distinguishable from zero, the quality of epidemiological analysis is increasing. More data and larger study populations bring the possibility of ever better statistical measures, the ability to identify more subtle correlations, and - perhaps of greatest interest for those of us not in the science business - online databases that allow everyone to jump in and look at the results.

So you should head over to the UK Longevity Explorer and take a look at the Association Explorer; it's an interesting tool to tinker with, especially once you start digging into the weeds of smaller associations. It is a nice view of all the things we'd like to render entirely irrelevant by producing rejuvenation biotechnologies capable of repairing cell and tissue damage. In a world in which the causes of aging can be meaningfully addressed, it no longer matters that you have minor gene variants, or had more or less exposure to infectious diseases in youth, or experienced other circumstances that presently swing life expectancy a year or a few years in either direction. The benefits provided by repair therapies will vastly outweigh all of that when it comes to long term health and life expectancy.

On a slightly different topic, and unlike the study below, I suspect that the largest datasets of interest to aging research that emerge in the decades ahead will be obtained without the consent of study participants. The incentives align with this outcome: (a) all groups with the capability to gather large amounts of data are presently doing so rapaciously, since they can use that data to generate profits in many ways; (b) few organizations are any good at defending large databases from attackers; (c) a dataset released into the wild from legal jurisdiction A is a dataset that researchers in legal jurisdiction B don't have to do the work to assemble or otherwise pay to use.

Given these points, I think that we will see continuing theft and release of large sets of medically relevant data, and that researchers and their boards will concoct ethical justifications for using this data as it becomes more widely available. For example, researchers might pay a third party to anonymize stolen datasets available online in a way that prevents records from being associated with individuals without disturbing statistical associations, and then never officially view the original data themselves. There will be a sense that it is a shame to let all this go to waste since it is out there.

5 year mortality predictors in 498,103 UK Biobank participants: a prospective population-based study

Participants were enrolled in the UK Biobank from April, 2007, to July, 2010, from 21 assessment centres across England, Wales, and Scotland with standardised procedures. In this prospective population-based study, we assessed sex-specific associations of 655 measurements of demographics, health, and lifestyle with all-cause mortality and six cause-specific mortality categories in UK Biobank participants using the Cox proportional hazard model. We excluded variables that were missing in more than 80% of the participants and all cardiorespiratory fitness test measurements because summary data were not available. Validation of the prediction score was done in participants enrolled at the Scottish centres. UK life tables and census information were used to calibrate the score to the overall UK population.
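The variable-exclusion rule described above (drop anything missing in more than 80% of participants) can be sketched in a few lines. This is a minimal illustration only. It assumes each participant's answers arrive as a Python dict, with None marking a missing value. The function name filter_sparse_variables and the toy records are hypothetical and are not the study's code or data.

```python
# Hypothetical sketch of the "missing in more than 80% of participants" rule.
# Assumption: one dict per participant, None marks a missing answer.

def filter_sparse_variables(records, max_missing_fraction=0.8):
    """Return sorted variable names whose missing fraction is at most the threshold."""
    if not records:
        return []
    n = len(records)
    names = sorted({name for record in records for name in record})
    kept = []
    for name in names:
        # A variable absent from a record counts as missing for that participant.
        missing = sum(1 for record in records if record.get(name) is None)
        if missing / n <= max_missing_fraction:  # "more than 80%" is excluded
            kept.append(name)
    return kept


# Toy records: grip_strength is missing for everyone, smoking for one person.
participants = [
    {"age": 55, "smoking": "never", "grip_strength": None},
    {"age": 61, "smoking": "current", "grip_strength": None},
    {"age": 48, "smoking": None, "grip_strength": None},
    {"age": 70, "smoking": "former", "grip_strength": None},
    {"age": 66, "smoking": "never", "grip_strength": None},
]
print(filter_sparse_variables(participants))  # ['age', 'smoking']
```

A variable missing in exactly 80% of participants is kept, because the exclusion criterion is "more than" 80%.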

Of 498,103 UK Biobank participants included (54% of whom were women) aged 37-73 years, 8532 (39% of whom were women) died during a median follow-up of 4.9 years. Self-reported health was the strongest predictor of all-cause mortality in men and a previous cancer diagnosis was the strongest predictor of all-cause mortality in women. When excluding individuals with major diseases or disorders (Charlson comorbidity index greater than 0; n=355,043), measures of smoking habits were the strongest predictors of all-cause mortality. The prognostic score including 13 self-reported predictors for men and 11 for women achieved good discrimination and significantly outperformed the Charlson comorbidity index.
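The abstract says the score "achieved good discrimination" but does not name the metric in this excerpt. For survival data, discrimination is commonly summarised with Harrell's concordance index (C-index), so the sketch below assumes that metric. It considers every pair of people in which the person with the shorter follow-up died, and counts how often the model gave that person the higher predicted risk. The follow-up times and risk scores are toy values, not study data.

```python
# Minimal sketch of Harrell's C-index for right-censored survival data.
# Assumption: this is the kind of discrimination measure the abstract refers to.

def harrell_c_index(times, died, risk_scores):
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not died[i]:
            continue  # a censored person cannot be the earlier death in a pair
        for j in range(n):
            if times[i] < times[j]:  # j was still alive when i died
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


times = [1.0, 2.5, 3.0, 4.9, 4.9]   # years of follow-up (toy values)
died = [True, True, False, False, False]
risk = [0.9, 0.4, 0.5, 0.1, 0.2]    # model-predicted risk (toy values)
print(harrell_c_index(times, died, risk))
```

A value of 0.5 means the ranking is no better than chance, and 1.0 means every comparable pair is ordered correctly. This double loop is quadratic in the number of participants. That is fine for illustration, but a cohort of half a million would need a sorted or tree-based implementation.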

Measures that can simply be obtained by questionnaires and without physical examination were the strongest predictors of all-cause mortality in the UK Biobank population. The prediction score we have developed accurately predicts 5 year all-cause mortality and can be used by individuals to improve health awareness, and by health professionals and organisations to identify high-risk individuals and guide public policy.

UK Longevity Explorer

Interest in the causes of death and disease is growing, as is our knowledge and understanding. Individuals, healthcare professionals, researchers, health organisations and governments all want to understand more about what might improve or reduce life expectancy, particularly in the middle-aged and elderly.

A large-scale project called UK Biobank was set up, and between 2006 and 2010, it collected 655 measurements from nearly half a million UK volunteers (498,103) aged 40-70. This website presents the two main parts of the researchers' work: the Association Explorer and the Risk Calculator. These are closely connected - the Risk Calculator is based on findings from the Association Explorer.

The Association Explorer is an interactive graph where you can explore how closely 655 measurements (variables) from the UK Biobank study are associated with different causes of death. The results for different associations are presented separately for women and men, and illustrate the ability of each variable to predict mortality. For more detailed results for each specific measurement, you can click on each dot (data point). You can also select groups of measurements, different causes of death, as well as search for a particular variable of interest using the search bar.

As questionnaire-based variables were found to be the strongest predictors, the researchers created a calculator that could use questionnaire answers to predict an individual's risk of dying within five years ('five-year risk'). To do this, they used a computer-based approach to automatically select the combination of questions from UK Biobank that gave the most accurate prediction of death within five years.
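One standard way for a Cox-model calculator to turn questionnaire answers into a five-year risk is `1 - S0(5) ** exp(linear predictor)`. Here S0(5) is the baseline five-year survival, and the linear predictor sums each coefficient multiplied by how far the answer deviates from the cohort mean. The sketch below is only an illustration of that formula. The coefficients, means, baseline survival and question names are invented placeholders, not the published UK Biobank score.

```python
import math

# HYPOTHETICAL values for illustration only; NOT the published UK Biobank score.
BASELINE_5Y_SURVIVAL = 0.98  # assumed S0(5) at the mean covariate values
COEFFICIENTS = {             # assumed log hazard ratios (0/1 answers)
    "current_smoker": 0.70,
    "poor_self_rated_health": 0.90,
    "previous_cancer": 0.60,
}
MEANS = {                    # assumed cohort means of each 0/1 answer
    "current_smoker": 0.10,
    "poor_self_rated_health": 0.05,
    "previous_cancer": 0.08,
}


def five_year_risk(answers):
    """Risk of death within 5 years: 1 - S0(5) ** exp(centred linear predictor)."""
    linear_predictor = sum(
        beta * (answers.get(name, 0) - MEANS[name])  # unanswered items default to 0
        for name, beta in COEFFICIENTS.items()
    )
    return 1.0 - BASELINE_5Y_SURVIVAL ** math.exp(linear_predictor)


print(f"{five_year_risk({'current_smoker': 1}):.3f}")
```

Centring on the means makes S0(5) the survival of an "average" participant. Recalibrating the score to the wider UK population, as the abstract describes using life tables, would amount to changing that baseline term.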

Comments

From a medical perspective, this stuff is already irrelevant. This sort of data only potentially matters from a public-policy and social-science perspective.

Also, looking through this data, nearly all of it is "Well, yeah, obviously." Severe medical problems, being poor, and doing things known to be harmful all correspond with shorter lifespans? Color me shocked!

"unlike the study below, I suspect that the largest datasets of interest to aging research that emerge in the decades ahead will be obtained without the consent of study participants."

I doubt it. Not because of any ethical reason, but because large datasets in general aren't particularly useful for transformative technologies. The information you can glean from the bodies of a handful of consenting humans is more worthwhile than a lot of statistical gobbledygook. You don't need to steal the medical information of a thousand people to know that they're all dying slowly in the same general ways, and if functional anti-aging therapies actually do what they're supposed to, you won't need n>50 (or n>1, really) to show that they do.

Seriously, Reason; can you find any use for any large-scale data grab that is at all relevant to what SENS and its offshoots do?

Posted by: Slicer at July 21st, 2015 7:05 PM

@Slicer: When saying epidemiology was of interest to aging research I meant the broader swathe of aging research, largely interested in mapping the details of aging, and very tied up in public policy discussions, not SENS. Agreed, that was ambiguous in context, and yes, the small details are not really important from a repair perspective because everyone is aging due to the same root causes. Effective treatments won't be personalized at all I imagine.

Posted by: Reason at July 21st, 2015 8:18 PM

@Slicer: while the variables that do discriminate are not surprising, some of the ones that don't might be (e.g. BMI, hip circumference, trunk fat percentage, fluid intelligence). I'm looking at the men's data.

Posted by: ale at July 21st, 2015 10:49 PM

Reason: Okay, gotcha.

The commonalities (signaling environment and such) don't need to be personalized, but stem cell replacement absolutely needs to be DNA-personalized at some point; I pointed out some months ago that you eventually have to insert stem cells with repaired/near-perfect DNA, specific to that particular person. (If you have fast transcription and hopefully reverse transcription, error correction is the easiest thing in the world on any computer.) Hopefully, creating these cells is something a machine in a clinic will be able to just do.

Ale: BMI, hip circumference, and trunk fat percentage are surprising? Why? We've known for decades that being fat dramatically increases your risk of diseases that kill you, most notably diabetes. Low fluid intelligence is a probable sign of neurodegenerative disease, which kills you. Being stupid (no matter how you got that way) can also kill you.

Posted by: Slicer at July 22nd, 2015 9:10 AM

@Slicer: I think you misunderstood what I wrote. Those variables that I mentioned did NOT raise mortality. From your reply, it should be clear why they are surprising.

Posted by: ale at July 22nd, 2015 9:34 PM

Ale, for the first three, you're wrong, at least according to what I'm seeing. (I'm copy-pasting tables here, beware of bad formatting)

Hip circumference:

(101,106] Reference 1244 Reference
[30,96] 1.5 [1.4-1.6] 986 2.66e-21
(96,101] 1.1 [1.0-1.2] 1279 0.00687
(106,115] 1.3 [1.2-1.4] 1242 2.12e-08
(115,195] 1.9 [1.7-2.1] 473 1.67e-31

BMI:

(28,31.2] Reference 1237 Reference
[12.8,23.6] 1.2 [1.1-1.4] 786 2.40e-06
(23.6,25.8] 0.9 [0.9-1.0] 933 0.153
(25.8,28] 0.9 [0.8-1.0] 1084 0.0289
(31.2,68.4] 1.4 [1.3-1.5] 1184 8.03e-15

Trunk fat percentage:

[2,24.6] Reference 1308 Reference
(24.6,29.2] 0.9 [0.8-1.0] 1204 0.0239
(29.2,33.2] 1.0 [0.9-1.1] 1176 0.910
(33.2,38] 1.2 [1.1-1.3] 1047 1.24e-06
(38,77.6] 1.8 [1.6-2.0] 489 3.68e-28

Same story with body fat percentage. A little bit appears to be good; real obesity and it's a problem.

You were correct on fluid intelligence, though, which is surprising:

(4,6] Reference 1719 Reference
[0,4] 1.0 [0.9-1.1] 1304 0.806
(6,8] 1.0 [0.9-1.0] 1520 0.203
(8,13] 1.0 [0.9-1.1] 681 0.717
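How to read these rows: since the study used Cox proportional hazards models, a value such as `1.5 [1.4-1.6]` is a hazard ratio relative to the reference category, followed by its 95% confidence interval. The sketch below shows how a Cox coefficient (log hazard ratio) and its standard error become that triple, together with the matching two-sided Wald p-value. The inputs are hypothetical. The HRs in the tables are rounded to one decimal place, so working backwards from them gives only rough standard errors, and recomputed p-values will not closely match the published ones.

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% standard normal quantile


def hazard_ratio_with_ci(log_hr, standard_error, z=Z_95):
    """Turn a Cox coefficient and its standard error into (HR, lower, upper)."""
    return (
        math.exp(log_hr),
        math.exp(log_hr - z * standard_error),
        math.exp(log_hr + z * standard_error),
    )


def two_sided_p_value(log_hr, standard_error):
    """Wald test of H0: HR = 1, i.e. log HR = 0."""
    z = abs(log_hr) / standard_error
    return math.erfc(z / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))


# Hypothetical coefficient and standard error, formatted like the table rows.
hr, lower, upper = hazard_ratio_with_ci(math.log(1.5), 0.04)
print(f"{hr:.1f} [{lower:.1f}-{upper:.1f}] p={two_sided_p_value(math.log(1.5), 0.04):.2e}")
```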

Posted by: Slicer at July 23rd, 2015 8:29 AM

Here they are for healthy men (which is what I was looking at, sorry):

BMI:
All (28,31.2] Reference 529 Reference
[12.8,23.6] 1.2 [1.0-1.3] 342 0.0267
(23.6,25.8] 1.0 [0.8-1.1] 448 0.445
(25.8,28] 1.0 [0.8-1.1] 515 0.444
(31.2,68.4] 1.3 [1.1-1.5] 404 0.000133

The reference class already reaches into the obese category, and its mortality is no higher than that of people with a BMI of 24. You have to go a long way to get hazard ratios that usually wouldn't even be worth mentioning. It's also hard to tell at what point the risk rises, since they grouped BMIs of 32 and 52 together (just as it's hard to tell what happens at the supposedly "normal" BMI of 22, because it's grouped with 15). If this were a measurement other than "fat", no one would bother mentioning it. It especially makes a mockery of the "overweight" BMI category.
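The point about the reference class is easier to see against the standard WHO adult BMI cut-offs: underweight below 18.5, normal from 18.5 to under 25, overweight from 25 to under 30, and obese at 30 and above, all in kg/m². The sketch below uses hypothetical helper names. It computes BMI as weight divided by height squared and then classifies the edges of the BMI bins listed in the tables, which shows that the (28,31.2] reference bin straddles the overweight/obese boundary.

```python
def bmi(weight_kg, height_m):
    """Body mass index in kg/m^2."""
    return weight_kg / height_m ** 2


def who_bmi_category(value):
    """Standard WHO adult BMI categories."""
    if value < 18.5:
        return "underweight"
    if value < 25.0:
        return "normal"
    if value < 30.0:
        return "overweight"
    return "obese"


# BMI bin edges as listed in the tables (first bin closed, the rest open on the left).
BINS = [(12.8, 23.6), (23.6, 25.8), (25.8, 28.0), (28.0, 31.2), (31.2, 68.4)]

for lower, upper in BINS:
    print(f"({lower}, {upper}]: {who_bmi_category(lower)} to {who_bmi_category(upper)}")
```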

Hip circumference:

All (101,106] Reference 587 Reference
[30,96] 1.4 [1.2-1.6] 423 6.13e-07
(96,101] 1.1 [1.0-1.2] 598 0.114
(106,115] 1.1 [1.0-1.3] 486 0.0705
(115,195] 1.6 [1.3-1.9] 144 2.01e-06

A U-shape, like the one you posted, where the lowest mortality is at a large hip circumference, well into fat territory for most people. Again, you have to go to massive sizes to get anything worth mentioning.

Trunk fat:

All [2,24.6] Reference 638 Reference
(24.6,29.2] 1.0 [0.9-1.1] 590 0.429
(29.2,33.2] 1.0 [0.9-1.1] 503 0.796
(33.2,38] 1.1 [1.0-1.3] 373 0.0801
(38,77.6] 1.6 [1.3-1.9] 134 2.86e-06

No effect until you pass 33%, and no sizeable effect until you pass 38%.

None of these effect sizes can be considered as "dramatically increasing" your mortality. For example, they are much smaller effects than the "type of bread" or the "type of cereal" you eat, or the type of house you live in, and about the same level as the type of coffee you drink or how often you eat chicken. I expect most people, even (or especially) on a site like this, to find such practically negligible effect sizes surprising.

Posted by: ale at July 23rd, 2015 9:59 PM

I don't know why you'd select "in healthy individuals". "Healthy" was defined as "excluded UK Biobank participants who had any major disease or disorder before becoming involved in the study" - but since obesity is a contributing factor to some major diseases and disorders, removing that category is misleading.

"For example they are much smaller effects than the "type of bread" or the "type of cereal" you eat, or the type of house you live in, and about the level as the type of coffee you drink or how often you eat chicken."

I'm going to go out on a limb and say that these questions are generally accurate descriptors of overall wealth, lifestyle choices, and sugar intake. For example, here's the coffee, all-cause mortality for everybody:

Not drinking coffee Reference 2549 Reference
Decaffeinated coffee (any type) 0.9 [0.8-1.0] 416 0.00789
Instant coffee 1.0 [0.9-1.1] 1799 0.834
Ground coffee (include espresso, filter etc) 0.7 [0.6-0.8] 412 8.79e-13
Other type of coffee 1.5 [1.2-2.0] 48 0.00328

I'll offer a straightforward explanation: people who are unemployed or otherwise low on money, or who have physical problems that encourage them to do things that require less effort, drink instant, and people who work in white-collar environments drink ground coffee. "Other" coffee is likely full of sugar.

Cereal? Sugary cereals (The reference!) correspond to higher mortality. Most straightforward explanation: people who eat more sugar pick up Type 2 diabetes faster. (Damn, when's the last time I had muesli...)

Bread? This is a UK survey so there are probably differences I'm not familiar with, but white bread has more calories and less nutrition than brown bread and may be made with more corn syrup (sugar). (I can't guess why people who don't eat any bread have so much higher a risk... what are they eating instead?)

But, in general, the most plausible explanation is one we already know: high sugar intake raises your risk of things that lead to death.

And here's the relationship between body fat percentage and circulatory system related mortality:

[5,23.9] Reference 391 Reference
(23.9,28.7] 1.0 [0.9-1.2] 374 0.760
(28.7,33.7] 1.5 [1.3-1.7] 378 1.68e-07
(33.7,39.4] 2.2 [1.8-2.7] 172 1.01e-16
(39.4,69.8] 3.9 [2.8-5.5] 37 5.73e-15

Trunk fat percentage and circulatory system related mortality:

[2,24.6] Reference 292 Reference
(24.6,29.2] 1.0 [0.9-1.2] 307 0.763
(29.2,33.2] 1.1 [0.9-1.3] 300 0.198
(33.2,38] 1.5 [1.3-1.8] 286 2.71e-06
(38,77.6] 2.7 [2.2-3.3] 167 9.58e-24

And coffee type and circulatory system related mortality:

Not drinking coffee Reference 640 Reference
Decaffeinated coffee (any type) 0.9 [0.7-1.1] 105 0.170
Instant coffee 1.1 [1.0-1.2] 498 0.0973
Ground coffee (include espresso, filter etc) 0.6 [0.5-0.7] 89 5.30e-06
Other type of coffee 2.6 [1.6-4.0] 20 3.38e-05

Cereal type only slightly increases the risk of death by circulatory failure. Fat, sugary coffee, and sugary cereal slightly increase the risk of death by cancer.

Now, if you really want something that's totally inexplicable, check out the total all-cause mortality for alcohol drinking.

Three or four times a week Reference 1057 Reference
Daily or almost daily 1.3 [1.2-1.4] 1449 3.58e-09
Once or twice a week 1.2 [1.1-1.3] 1225 6.37e-06
One to three times a month 1.3 [1.2-1.4] 430 6.68e-06
Special occasions only 1.7 [1.5-1.8] 506 9.01e-21
Never 2.2 [2.0-2.4] 557 3.37e-49

I'm lost. Anyone with a plausible explanation for this one?

Posted by: Slicer at July 24th, 2015 8:02 PM

I agree that using Total population seems like a better choice. The numbers don't differ in a way that leads to significant differences in conclusions though:

Hip circumference is basically the same: U-shaped, with a nadir at a very large measurement. BMI also shows no major difference until you go past a BMI of 32, and trunk fat shows no major difference until you hit >38% fat.

I also agree that the three that I posted are indicators of lifestyle and wealth. But so is a BMI > 32. My point was that those three have stronger correlations with mortality, but no one talks about them as major risks (correctly, I think).

I don't see a need to invoke the sugar factor. Note that income is a much better predictor of mortality than any of the ones we've mentioned here, so any of these acting as noisy proxies for income could explain it all.
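The proxy argument is really a claim about confounding. If income drives both the exposure and mortality, a crude comparison can show an association that disappears within income groups. The sketch below uses invented counts, not UK Biobank data, in which ground and instant coffee drinkers have identical risk of death within each income stratum. It then compares the crude risk ratio with a Mantel-Haenszel risk ratio pooled across strata. Risk ratios stand in here for the study's Cox hazard ratios to keep the arithmetic simple; this swapped-in technique is for illustration only.

```python
# HYPOTHETICAL counts (not UK Biobank): within each income stratum,
# ground and instant coffee drinkers die at the same rate (1% and 3%).
STRATA = {
    # stratum: (ground deaths, ground n, instant deaths, instant n)
    "higher income": (8, 800, 2, 200),
    "lower income": (6, 200, 24, 800),
}


def crude_risk_ratio(strata):
    """Ground vs instant risk ratio, ignoring income entirely."""
    a = sum(s[0] for s in strata.values())
    n1 = sum(s[1] for s in strata.values())
    c = sum(s[2] for s in strata.values())
    n0 = sum(s[3] for s in strata.values())
    return (a / n1) / (c / n0)


def mantel_haenszel_risk_ratio(strata):
    """Ground vs instant risk ratio pooled within income strata."""
    numerator = 0.0
    denominator = 0.0
    for a, n1, c, n0 in strata.values():
        total = n1 + n0
        numerator += a * n0 / total
        denominator += c * n1 / total
    return numerator / denominator


print(round(crude_risk_ratio(STRATA), 2))            # 0.54: ground coffee looks protective
print(round(mantel_haenszel_risk_ratio(STRATA), 2))  # 1.0: no effect within income groups
```

The apparent benefit comes entirely from the higher-income group drinking more ground coffee and dying less, which is the kind of pattern a noisy wealth proxy would produce.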

Regarding alcohol consumption, it shouldn't come as a surprise by now; there are many results claiming the same thing. I suspect it's just another proxy for wealth and lifestyle, although I once saw a result claiming that alcohol inhibited the growth of cells in the intima or media layers of the arteries, but I can't find the reference. Since the lower mortality among drinkers is almost entirely due to lower heart disease, it sounded plausible.

Posted by: ale at July 25th, 2015 5:54 AM