HALD, a Human Aging and Longevity Knowledge Graph

For those who find use in such things, HALD is an interesting tool for exploring the literature surrounding particular genes, proteins, lipids, and other molecules. The authors mined the literature to determine the relationships between these various items, as well as their roles as biomarkers. At a high level, the life sciences find themselves afloat on a sea of data. It costs little to generate ever more data, and much more to analyze it, so databases grow somewhat faster than the pace at which various groups can organize, analyze, and obtain useful insights from that data.

Human aging is a natural and inevitable biological process that leads to an increased risk of aging-related diseases. Developing anti-aging therapies for aging-related diseases requires a comprehensive understanding of the mechanisms and effects of aging and longevity from a multi-modal and multi-faceted perspective. However, most of the relevant knowledge is scattered across the biomedical literature, the volume of which has reached 36 million articles in PubMed.

Currently, there are some publicly available online databases related to human aging and longevity. However, to the best of our knowledge, these databases are all manually curated, making it difficult to incorporate comprehensive knowledge of human aging and longevity. It is also difficult to obtain the latest biomedical knowledge from manually curated databases, as their services are no longer maintained or are not updated in time. In addition, although information on human nucleic acids is generally included in these databases, knowledge of other important organic compounds like carbohydrates, lipids, and proteins is not yet fully integrated.

Here, we present HALD, a text mining-based human aging and longevity dataset in the form of a biomedical knowledge graph built from all published literature related to human aging and longevity in PubMed. HALD integrates multiple state-of-the-art natural language processing (NLP) techniques to improve the accuracy and coverage of the knowledge graph for precision gerontology and geroscience analyses. As of September 2023, HALD contained 12,227 entities in 10 types (gene, RNA, protein, carbohydrate, lipid, peptide, pharmaceutical preparations, toxin, mutation, and disease), 115,522 relations, 1,855 aging biomarkers, and 525 longevity biomarkers from 339,918 biomedical articles in PubMed.

Link: https://doi.org/10.1038/s41597-023-02781-0
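
For readers unfamiliar with the term, a knowledge graph of the kind described above is, at its simplest, a set of typed entities joined by labeled relations. The following is a minimal sketch of that idea in Python, not HALD's actual schema or code. The class and method names are hypothetical, the relation labels are invented, and the entity names (GENE_A, PROTEIN_A, DISEASE_X) are placeholders rather than data taken from HALD. Only the ten entity types come from the paper's description.

```python
from collections import defaultdict
from typing import NamedTuple, Optional

# The ten entity types listed in the HALD description.
ENTITY_TYPES = {
    "gene", "RNA", "protein", "carbohydrate", "lipid", "peptide",
    "pharmaceutical preparations", "toxin", "mutation", "disease",
}


class Relation(NamedTuple):
    """One directed, labeled edge: subject --predicate--> obj."""
    subject: str
    predicate: str
    obj: str


class KnowledgeGraph:
    """Hypothetical in-memory graph; illustrative only, not HALD's design."""

    def __init__(self) -> None:
        self.entity_type: dict[str, str] = {}
        self.outgoing: defaultdict[str, list[Relation]] = defaultdict(list)

    def add_entity(self, name: str, etype: str) -> None:
        if etype not in ENTITY_TYPES:
            raise ValueError(f"unknown entity type: {etype}")
        self.entity_type[name] = etype

    def add_relation(self, subject: str, predicate: str, obj: str) -> None:
        # Both endpoints must already be registered as entities.
        for name in (subject, obj):
            if name not in self.entity_type:
                raise KeyError(name)
        self.outgoing[subject].append(Relation(subject, predicate, obj))

    def neighbors(self, name: str, etype: Optional[str] = None) -> list[Relation]:
        """Relations leaving `name`, optionally filtered by target entity type."""
        return [
            r for r in self.outgoing.get(name, [])
            if etype is None or self.entity_type[r.obj] == etype
        ]


def build_example() -> KnowledgeGraph:
    """Build a toy graph from placeholder names (not real HALD records)."""
    kg = KnowledgeGraph()
    kg.add_entity("GENE_A", "gene")
    kg.add_entity("PROTEIN_A", "protein")
    kg.add_entity("DISEASE_X", "disease")
    kg.add_relation("GENE_A", "encodes", "PROTEIN_A")
    kg.add_relation("GENE_A", "associated_with", "DISEASE_X")
    return kg


if __name__ == "__main__":
    graph = build_example()
    for rel in graph.neighbors("GENE_A", etype="disease"):
        print(rel.subject, rel.predicate, rel.obj)
```

Storing relations as subject, predicate, object triples keyed by subject keeps the "what is this gene connected to" question cheap to answer, which is the sort of literature exploration the tool is intended for. A real resource of this size would more plausibly sit in a graph database or in tabular files than in an in-memory dictionary.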
