200,000 whole genomes made available for biomedical studies by U.K. effort

In the largest single release of whole genomes ever, the UK Biobank (UKBB) this week unveiled to scientists the entire genomes of 200,000 people who are part of a long-term British health study.

The trove of genomes, each linked to anonymized medical information, will allow biomedical scientists to scour the full 3 billion base pairs of human DNA for insights into the interplay of genes and health that could not be gleaned from partial sequences or scans of genome markers. “It is thrilling to see the release of this long-awaited resource,” says Stephen Glatt, a psychiatric geneticist at the State University of New York Upstate Medical University.

Other biobanks have also begun to compile vast numbers of whole genomes, 100,000 or more in some cases (see table, below). But UKBB stands out because it offers easy access to the genomic information, according to some of the more than 20,000 researchers in 90 countries who have signed up to use the data. “In terms of availability and data quality, [UKBB] surpasses all others,” says physician and statistician Omar Yaxmehen Bello-Chavolla of the National Institute for Geriatrics in Mexico City.

The whole truth

A number of efforts are releasing many thousands of whole genomes, with varying degrees of access, to accelerate biomedical research.

Biobank                               Completed whole genomes   Release information
UK Biobank                            200,000                   300,000 more in early 2023
Trans-Omics for Precision Medicine    161,000                   National Institutes of Health (NIH) requires project-specific consent
Million Veteran Program               125,000                   Non–Veterans Affairs researchers get access in 2022
Genomics England’s 100,000 Genomes    120,000                   Researchers must join collaboration
All of Us                             90,000                    NIH expects to release by early 2022

Having enrolled 500,000 middle-age and elderly participants of mostly European ancestry from 2006 to 2010, UKBB is one of the largest genetics research databases in the world. It proved its worth even before releasing whole genomes. Studies of specific DNA markers that vary among participants have revealed hundreds of new disease risk genes. Since 2019 researchers have also been probing participants’ exomes, the 2% of the whole genome sequence (WGS) that encodes proteins; the exomes from nearly all participants became available in the past 2 months. Exome studies are yielding risk genes that are very rare and can’t be found with genotyping data.

But whole genomes will make it possible to explore the influence of noncoding DNA, which controls when genes are turned off or on, and of gene rearrangements, as well as missing, repeated, or extra stretches of DNA in genes. Such changes play a role in diseases such as Huntington disease.

Iceland’s deCODE genetics, now a subsidiary of Amgen, sequenced most of the 200,000 UKBB whole genomes released this week as part of an industry consortium that is helping cover the £200 million costs of the sequencing. The companies, eager to find drug targets, got a 9-month head start on using the WGS data. DeCODE reported initial findings this week in a bioRxiv preprint, among them variants in noncoding regions that influence height and the onset of puberty in girls.

Although researchers tapping into UKBB once had to download massive data sets onto their own computers, they can now log into a secure cloud-based computing environment. That will make it easier to collaborate and integrate different types of clinical and genetic data, UKBB says. Bello-Chavolla’s team expects to probe the newly released genomes and related health data for clues to metabolic diseases and aging, then follow up with studies of those genes in Mexicans. “This type of data availability is crucial for researchers in low- and middle-income countries,” he says.

In the United States, the Million Veteran Program now has WGS data for about 125,000 of its nearly 850,000 participants; it hopes to begin opening the data to researchers outside the Department of Veterans Affairs (VA) next year, says J. Michael Gaziano of the VA and Brigham and Women’s Hospital, a principal investigator on the project. Another U.S. project, Trans-Omics for Precision Medicine (TOPMed), has 161,000-and-counting whole genomes pooled from smaller studies. But unlike UKBB, which has permission from participants to use the data for a wide range of studies, TOPMed requires researchers using the data to get new consent for specific projects.

The National Institutes of Health’s All of Us study, which aims to enroll 1 million volunteers and expects to release 90,000 whole genomes this winter, hopes to emulate the UKBB model by making data widely available to approved scientists through a cloud-based web interface. The U.S. biobanks also have a big advantage over UKBB: They are much more ethnically diverse. “There are strengths to both efforts,” says geneticist Laura Raffield of the University of North Carolina, Chapel Hill, who studies the genomics of blood cell traits using data from both TOPMed and UKBB.

source: sciencemag.org