Insciter Insights: Exploring the Human DNA Library

Carlos de Rojas

Freelance Life Sciences journalist supporting technological and scientific solutions through content creation and international expansion.

Two decades after the Human Genome Project

The Human Genome Project has profoundly impacted biological research. It spurred new tools for obtaining and analysing genomic data, fostered international collaboration and wider sharing of research data among scientists, and accelerated our understanding of human health and disease. On the other hand, these advances have not yet delivered dramatic improvements in treatments for common diseases, as complex regions of the genome are still poorly understood.

The human genome has three billion base pairs of DNA, but only around 2% of them encode proteins. Researchers have already sequenced the human genome and know where the genes are, but that was just the beginning. In recent years, they have made progress in mapping gene functions and in understanding noncoding DNA, such as introns or retrotransposons, and noncoding RNAs. There always seems to be another biological layer to work through. Nonetheless, more and more initiatives are filling these gaps across the genome.

Some of them are:

The Telomere-to-Telomere (T2T) Consortium

The Telomere-to-Telomere (T2T) Consortium has finished the first complete sequence of a human genome, 3.055 billion base pairs long, which represents the largest improvement to the human reference genome since its initial release. As its website says, “the current human reference genome (GRCh38) is the most accurate and complete vertebrate genome ever produced. However, no one chromosome has yet been finished end to end”. It goes on to explain that “these unresolved regions include segmental duplications, ribosomal rRNA gene arrays, and satellite arrays that harbour unexplored variation of unknown consequence.”

The UK 100,000 Genomes Project

The UK 100,000 Genomes Project is investigating the role of genome sequencing in patients with undiagnosed rare diseases, and how this research can be coordinated with health care delivery in the National Health Service. Other parts of the project focus on patients with cancer and infection.

The Human Pangenome Reference Consortium

The Human Pangenome Reference Consortium is a more inclusive genome project that aims to capture the full range of human diversity and so help address the racial and ethnic biases in genomic resources.

Famous genes

The World Health Organization estimates that there are about 10,000 different types of single-gene, or monogenic, diseases. However, many illnesses involve many different genes and are also influenced by lifestyle and environmental factors. Common health problems such as heart disease or obesity do not have a single genetic cause.

Although there are around 20,000 protein-coding genes in the human genome, only 100 account for over a quarter of all the publications tagged by the National Library of Medicine (NLM).
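To put that concentration in perspective, the short calculation below uses only the rounded figures quoted above: about 20,000 protein-coding genes, 100 heavily studied genes, and a quarter of publications as a lower bound. The variable names are illustrative, and the result is a rough estimate rather than a published statistic.

```python
# Rounded figures taken from the text above; treat the output as a rough estimate.
protein_coding_genes = 20_000
top_genes = 100
publication_share = 0.25  # the text says "over" this share, so it is a lower bound

# Fraction of all protein-coding genes represented by the 100 most studied ones
gene_share = top_genes / protein_coding_genes

# How many times more publications these genes receive than an even split would give
over_representation = publication_share / gene_share

print(f"Top genes as a share of all genes: {gene_share:.1%}")
print(f"Over-representation in publications: at least {over_representation:.0f}x")
```

In other words, about 0.5% of protein-coding genes attract at least 50 times the share of publications they would receive if attention were spread evenly.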

Commonly investigated genes:

TP53

TP53 is the most studied gene, known for its role as a tumour suppressor: it can initiate cell death, preventing a damaged cell from reproducing. Also known as the ‘guardian of the genome’, it is mutated in nearly 50% of all human cancers.

VEGFA

VEGFA encodes vascular endothelial growth factor A, a protein that promotes the growth of blood vessels.

TNF

TNF encodes tumour necrosis factor, an immune molecule that has been a major drug target for inflammatory disease.

APOE

APOE has an important role in cholesterol and lipoprotein metabolism.

Well-known genes by disease

– Mutations in PSEN1 and PSEN2 are found in early-onset Alzheimer’s disease.

– BRCA1 and BRCA2 are examples of genes that raise cancer risk if they become altered, especially for breast cancer and ovarian cancer.

– Mutations in the HTT gene cause Huntington’s disease.

Specific genes can be the first step towards understanding a disease

– The GJB2 gene, involved in the functioning of the cochlea, is associated with hearing loss.

– The HLA-DRB1 gene is the strongest known genetic risk factor for rheumatoid arthritis.

– The ABCA4 gene is being studied to improve the diagnosis of retinal dystrophies.

Why are some genes much less explored than others?

There are a variety of reasons why some genes have received more attention than others. In a large-scale investigation published in PLOS Biology, Thomas Stoeger and his team reached several conclusions on this issue:

– Genes that express proteins or are active in the body's tissues are easier to study because they offer more material to put through an assay.

– Many researchers are more interested in genes that are likely to have a big impact on the body when they are mutated or disabled.

– Past research priorities strongly shape current initiatives: young researchers need to justify future studies and be productive within the limited time they are given.

– Junior scientists who work on little-studied biology have a lower chance of becoming a principal investigator because it is harder for them to get funding.

Special genetic traits

We share more than 99 percent of our genetic information with any other human, and almost as much with chimps. But many possibilities are hidden in that remaining fraction of less than 1 percent. For example, the US company Variant Bio focuses on human genetic diversity to discover new therapeutics. It searches the globe for people who are medical or genomic outliers in order to identify the genes that make them unique, from the altitude-adaptation genes of Tibetans to the cholera-resistance genes of people in Bangladesh. With the mission of genotyping populations that are less represented in genetic studies, the company also engages with the communities that donate genetic samples, offering them a share of its revenue.

It is also worth noting that some specific genetic variations are the closest thing to real-life human “superpowers”, such as:

– A loss-of-function mutation in SCN9A leads to congenital insensitivity to pain.

– People with a mutation in the DEC2 gene need fewer hours of sleep than average.

– A mutation in the LRP5 gene is associated with higher bone density.

Where to find investment opportunities in Genomics?

As Kemal Malik, the member of the Bayer board of management responsible for innovation, commented in an interview with National Geographic, genomics “is revolutionizing people’s ideas not just of health care, but of illness itself.”

The pace of sequencing, analysing, testing and editing genes will keep rising in the coming years. Currently, there are hundreds of startups offering different methods and promises, competing for a piece of the market. Step by step, we are moving from the genome-reading era to the genome-writing era. Before we reach that promising and, at the same time, unsettling future, these are some of the areas and advances where experts have the highest expectations:

– Consumer genetic testing companies will face more demanding customers and will seek access to regions of the world that remain untapped.

– CRISPR/Cas9-based tools will progress faster thanks to applications in other fields such as agriculture or energy.

– Blockchain could be the best answer to genomic data security.

– Portable sequencing devices will become cheaper and more widely used across industries.

– Understanding the interconnections of thousands of genes expressed together will give more precise insights into the hardest questions in medicine, especially in cancer or Alzheimer’s.

– Integration of genomic data with information from other sources, such as imaging or proteomics, will create multidimensional landscapes of genotypes to better understand biological processes.

Cost per genome data (Source: National Human Genome Research Institute)

Combining genomic databases

Discover and explore different genes on the following platforms:

– MedlinePlus: includes information on more than 1,300 genetic conditions.

– GenBank: a comprehensive database that contains publicly available nucleotide sequences.

– ClinGen (Clinical Genome Resource): clinical relevance of genes and variants for precision medicine.

– HuVarBase: a human variant database with nearly 800,000 variant records.

– Ensembl: a genome browser for vertebrate genomes.

– IGSR: the International Genome Sample Resource, which maintains the resources from the 1000 Genomes Project and extends them with new data types and greater population diversity.

– COSMIC: the Catalogue of Somatic Mutations In Cancer.

– SFARI Gene: a database for the autism research community.

– LongevityMap: a database of human genetic variants associated with longevity.
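Several of these platforms can also be queried programmatically. As a minimal sketch, the Python snippet below looks up a gene by symbol using Ensembl's public REST service. The base URL and the lookup-by-symbol path are assumptions drawn from Ensembl's REST documentation rather than from this article, the function names are illustrative, and the details should be checked against the current documentation before relying on them.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

# Assumption: Ensembl's public REST service and its lookup-by-symbol endpoint.
ENSEMBL_REST = "https://rest.ensembl.org"


def build_lookup_url(symbol: str, species: str = "homo_sapiens") -> str:
    """Return the lookup URL for a gene symbol such as 'TP53'."""
    # quote() escapes any characters that are not safe inside a URL path segment.
    return f"{ENSEMBL_REST}/lookup/symbol/{quote(species, safe='')}/{quote(symbol, safe='')}"


def fetch_gene(symbol: str, species: str = "homo_sapiens", timeout: float = 10.0) -> dict:
    """Fetch the gene record as a dictionary (requires network access)."""
    request = Request(
        build_lookup_url(symbol, species),
        headers={"Content-Type": "application/json"},  # ask for a JSON response
    )
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling fetch_gene("TP53") should return a dictionary describing the gene. The exact fields in the response are not assumed here, so inspect the result or the Ensembl documentation before using specific keys.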