Trailblazing Genomics

In November 2022, a review published by Nature ranked the most innovative countries worldwide, identifying the top academic innovators in each. The UK ranked third in Europe, with the Wellcome Sanger Institute at the head of its list.

Dr Emmanuelle Astoul, Head of Translation at the Sanger Institute, welcomed the news as a validation of the Institute’s mission: “This recognition shows how foundational our work is, enabling others to advance on our science and develop products or services including therapeutics stemming from our research. We work with partners, including investors, to apply our own science to solve real-world problems, but we also maximise our impact by enabling others to innovate by sharing the resources we create.”

Dr Julia Wilson, Associate Director at the Sanger Institute, says: “We do not have the skills or resources to progress our science into the clinic, that is not our strength. Hence, we pass the baton on to others who can create diagnoses and treatments based on the data we openly share. We welcome the fact that this is happening and that our ambitious science is underpinning these efforts.” 

The review takes into account patent scholarly citation - how many times a scientific research paper is cited in patents - as well as academia-industry collaborations, among other parameters. Given these data, it comes as no surprise that the papers most cited in patents include the results of the international collaboration of the Human Genome Project, widely considered one of the great biomedical research achievements of the 20th century.

Professor Sir John Sulston presenting The Sanger Centre (now Wellcome Sanger Institute) to the press. Credit: Unknown

Professor Sir John Sulston and Dr Michael Morgan celebrating the sequencing of the first human genome. Credit: Unknown

Professor Sir John Sulston, the Sanger Institute’s founding Director and a partner in the Human Genome Project, was present at the 1996 summit in Bermuda, where leaders of the scientific community agreed on a set of principles, known as the Bermuda principles, requiring all DNA sequence data to be released in publicly accessible databases. The principles were endorsed by funders and became a norm in genomics. As Dr Julia Wilson puts it: “Genomics was founded on open principles from the outset, and this has continued today.”

The top five papers also include a 2002 paper on the discovery of mutations in the BRAF gene and how these relate to skin cancer, one of the landmark papers of our previous director, Professor Sir Mike Stratton. His discovery was instrumental in the development of small-molecule inhibitors targeting mutated BRAF, which radically changed the clinical treatment of melanoma and improved patient outcomes.

Continuing through Nature’s Innovator index, most of the Sanger Institute’s most cited papers in patents describe new techniques for the advancement of genetic sequencing. First in paper-patent citations is a new sequencing technology for accurate, rapid and economical whole-genome re-sequencing. A collaboration between sequencing company Illumina and the Sanger Institute, the paper describes the technology underpinning Illumina’s sequencing machines. “Sanger scientists were involved in running an evaluation to demonstrate the utility of the technology for sequencing human genomes,” says Dr Michael Quail, Principal Scientist at the Scientific Operations Department at the Sanger Institute. “This is significant as this technology is now the main dominant high-throughput platform that nearly every sequencing lab in the world uses.”

Also among the top five most cited articles in patents is another novel technology: a bioinformatics program called the Burrows-Wheeler Alignment tool. “When Illumina sequencing first launched, it was very revolutionary but overwhelming at the same time due to the enormous amount of sequencing reads it generated,” says Dr Michael Quail. Analysing such a volume of data posed a major challenge. Sanger Institute researchers Dr Richard Durbin and Dr Heng Li developed the software to rapidly align sequencing reads to the human reference and make the data easier to analyse. “It was such a good tool that it is still used today,” adds Michael.

The Sanger Institute continues to provide technologies that make sequencing accessible and portable. Some of these projects won’t make it into patents but are still critical to the Sanger Institute’s mission of empowering and enabling others to develop their own genomic science. Dr Physilia Chua and her team at the Tree of Life programme, for example, have produced instructions for high-throughput genome sequencing on portable devices to aid the fight against malaria. This will enable portable sequencing in countries where infrastructure and capacity are limited but malaria is endemic, such as Ghana or Vietnam.

Scientist loading the sequencing machine in Sanger's sequencing facilities. Credit: Greg Moss/ Wellcome Sanger Institute

Sample management in the Tree of Life programme. Credit: David Levene / Wellcome Sanger Institute

Credit: Greg Moss / Wellcome Sanger Institute

Advancing healthcare & biodiversity with ambitious projects 

The Sanger Institute’s ability to undertake large-scale initiatives, thanks to its people, infrastructure and open-access culture, enables us to manage projects that cannot happen anywhere else. International efforts such as the Human Cell Atlas, HipSci and the Cell Model Passports platform are examples of the ambitious science that takes place at the Sanger Institute in collaboration with colleagues all around the world.

The Human Cell Atlas (HCA) is a global collaborative consortium spearheaded in 2016 by Sarah Teichmann (Sanger Institute) and Aviv Regev (then at the Broad Institute) with the aim of charting the cell types in the body, across time, from development until old age. The consortium launched with nearly 100 people and has now grown to 3,089 members from 97 different countries. This titanic undertaking aims to transform the understanding of the 37.2 trillion cells in the human body. 

Dr Sarah Teichmann, Head of Cellular Genetics at the Wellcome Sanger Institute, and co-founder and co-leader of the HCA, said: “A complete Human Cell Atlas, detailing all stages of life, will help explain many aspects of human health and disease, and the HCA is already providing an amazing resource for researchers. With many translational implications, the highly detailed cellular maps can serve as a healthy reference for understanding what goes wrong in disease, and as a blueprint for creating cells in the laboratory for research or therapeutics. The HCA is likely to lead to major advances in the way illnesses are diagnosed and treated and ultimately lead to a new era of precision medicine.” 

Some key statistics hint at its relevance: tens of millions of individual cells characterised, over 150 HCA publications, and more than 17,000 citations to date. All cell data are openly accessible within ethical limits. The project’s potential to inform the development of diagnostics, drug discovery and novel treatment avenues was highlighted by the consortium in Nature’s “HCA impact on medicine”.

Impact from the HCA includes a better understanding of SARS-CoV-2, kidney cancer and the development of the immune system, among many others. Early in the pandemic, the consortium identified specific cell types in the nose as likely initial infection points for the coronavirus, which helped explain high transmission. On kidney cancer, the HCA helped reveal its developmental source, finding that this type of cancer may arise from cells that have not fully developed, thus offering a new target for treatment. The HCA has also mapped the immune cell populations across multiple tissues in development and adulthood to provide new insights into how our immune system works.


Fluorescently labelled placenta and decidua cells. Credit: Kenny Roberts / Wellcome Sanger Institute

The Sanger Institute also played a major role during the COVID-19 pandemic, sequencing 20 per cent of all virus genomes worldwide - handling more than 27 million samples in total. During the peak of the pandemic, there were over ten million SARS-CoV-2 samples at the Institute, with hundreds of thousands arriving each week. Dr Sara Stott, Senior Scientific Manager at the Genomic Surveillance Laboratory Platforms at the Sanger Institute, states that her team was sequencing almost 70,000 viral genomes a week. The genomic data were provided in near-real time to the UK Health Security Agency, giving information on the spread of the virus. The data also helped other health authorities make informed decisions, such as improving infection control in dialysis units where vulnerable individuals were at greater risk.


Credit: Laura Olivares Boldu / Wellcome Connecting Science

Organoids in the lab

Robotic arm handling samples at the Sanger Institute. Credit: Greg Moss / Wellcome Sanger Institute

Another example comes from the Sanger Institute’s Translational Cancer Unit, where Dr Mathew Garnett’s lab has genetically annotated 2,000 cancer cell lines and generated over 300 tumour-derived organoids in over 15 years of research. These are used to run the lab’s large-scale experiments and have produced four open-access platforms on preclinical cancer models, genetic dependencies and targets, biomarkers and drug combinations. These platforms have been accessed by over 100,000 users in the past 12 months, positioning Garnett’s lab as a leading provider of therapeutic information resources. This research is now being leveraged by a Sanger spin-out, Mosaic Therapeutics, with the potential to discover new treatments for cancer patients.

The Sanger Institute is currently capable of sequencing one human genome every 2.7 minutes. Sequencing at this scale is possible in very few other places in the world. “There are three key factors that underpin this capacity: infrastructure, breadth of expertise and harmonisation of protocols,” says Dr Ian Johnston, Associate Director of Sequencing Operations at the Sanger Institute. The team has introduced numerous innovations. They have automated systems such as liquid handling, for example, and have developed a coherent set of protocols which can be deployed in different settings or types of sequencing.

Expertise in automation and processes has also been pivotal. Ian Johnston says: “Some of the team members, including myself, have prior experience in scientific service delivery industries with very tight turnaround times. We’ve taken that mindset and applied it to the science at Sanger. There is plenty of thinking behind it - process mapping, technology evaluation and workflow design. We’re continuously asking ourselves: how can we deploy this to make it even more reliable, and scalable whilst reducing the risk of error?”

Sharing the data

“A key factor in ensuring other scientists and organisations can innovate on our foundational work is to make sure the data we share are organised and standardised,” says Dr Emmanuelle Astoul, Head of Translation at the Sanger Institute. 

“Sharing data is not just putting all the raw data out there, we need to make it useful for the community. That means including rich metadata, which make the data we share findable and reusable,” adds Dr Mate Palfy, Open Science Lead at the Sanger Institute.

“We would not be able to do this without EMBL's European Bioinformatics Institute (EMBL-EBI), which sits right next to us in the Wellcome Genome Campus,” says Dr Julia Wilson, Associate Director at the Sanger Institute. “We’ve been partner organisations for almost 30 years now, and our data flows are well established. EBI has the capacity to take the data at scale that we produce, aggregate and improve it and serve it to the world.”

There are many examples of this collaboration, but the HipSci resource might be the most explicit. Producing one line of induced pluripotent stem cells isn’t too difficult - you need a lab with the right equipment, expertise and consent from the donor. However, one cell line cannot give researchers the necessary information on how humans of different genetic backgrounds react to a certain drug. To obtain robust data that will safely validate whether a new treatment works, researchers ideally need to run tests on a wide variety of genetic backgrounds, and that’s the tricky bit - it requires a panel of lines to be produced using the same methods so that they are comparable to each other.

The Sanger Institute and EMBL-EBI, as part of an international collaboration, did just that. The HipSci resource now contains over 700 induced pluripotent stem cell lines, which are considered valuable for three reasons. The first is the number itself. The second is that they include two cohorts of donors: a set of individuals with an inherited genetic disease, and another set who were healthy at the time. Both cohorts have open-access genomic data. The third is that all cell lines were created following a standardised process, so researchers can use these cells and compare experiments knowing they were developed in the same way. This normally leads to significantly more robust conclusions.

Now, cell lines derived from the HipSci project are being picked up by laboratories worldwide to understand disease mechanisms and advance therapeutic discovery for a wide range of conditions. The Jackson Laboratory, for example, is a non-profit institution based in Maine, USA, that specialises in genomic solutions for disease. It has recently launched a Human iPS cells portal with the aim of providing a complete resource for scientists to study the genes behind neurodegenerative diseases, such as Alzheimer’s.


The COSMIC team at the Wellcome Genome Campus. Credit: Marc Folland / Wellcome Sanger Institute

Skyline of the Data Centre. Credit: Phil Mynott/ Wellcome Sanger Institute

Another example is COSMIC - the Catalogue of Somatic Mutations in Cancer - which is the world's largest source of expert, manually curated somatic mutation information relating to human cancers. COSMIC is an example of how valuable organised and curated data can be. Born as a tool to help academic researchers keep on top of ever-growing information in the field of cancer mutations, it is now used daily across the globe not only by academics but also by drug discovery companies, big pharmaceutical companies, clinical researchers, and diagnostics providers. They use it to gain insight into the causes of cancer, find potential therapeutic targets, annotate clinical reports and identify critical mutations in cancer patient data.

The COSMIC repository contains targeted gene-screening data curated from over 27,000 peer-reviewed papers, together with their metadata. Its genome-wide screen data cover over 37,000 genomes, drawn from peer-reviewed large-scale genome screening studies, and can be used to discover novel driver genes.

Several international efforts address data standards. The Sanger Institute is one of four host institutions for the Global Alliance for Genomics and Health (GA4GH) and was a founding member in 2014. Working together, GA4GH and the Sanger Institute create tools and standards for the genomics community. The Sanger Institute provides staff who coordinate the coalition, along with technical and policy expertise, to support data use at an international scale.

Dr Sarion Bowers, Head of Policy at the Sanger Institute and Sanger representative for GA4GH, says: “Standardisation underpins data sharing. Internally, we are preparing a data management strategy in which we’re looking specifically at metadata, which is key to usability. There are also other areas such as diversity in data - we need to improve the diversity and representativeness of our data, which will make it much more valuable to the community.”

Into the Future

The Sanger Institute has recently embarked on a gargantuan project, the Darwin Tree of Life, which aims to sequence the genomes of all animals, plants, fungi and protists in Britain and Ireland - an estimated 70,000 species in total. The project is a collaboration of ten equal partners: the Sanger Institute is working alongside the Natural History Museum in London, the botanic gardens at Kew and Edinburgh, the Earlham Institute, the Marine Biological Association, EMBL-EBI, and the Universities of Oxford, Cambridge and Edinburgh. The project is playing a leading part in a global effort, known as the Earth BioGenome Project, to sequence and assemble genomes for every eukaryotic species on the planet.

It has been a distant aspiration since the discovery of the structure of DNA and the invention of DNA sequencing technology that there would ultimately be a reference genome sequence for every species on Earth. The Darwin Tree of Life is certainly a step in that direction.

Samples from more than 4,800 species have already been collected and sent to Sanger for whole-genome sequencing. Of those, Sanger’s Tree of Life programme has released over 1,000 genomes to public scientific databases, freely available to researchers worldwide, with a further 2,500 genomes currently being sequenced and assembled. The study of these data is expected to have a profound effect on agriculture, biodiversity and conservation, new medicines and even industrial materials and techniques.

Sample prepping in the Tree Of Life laboratories. Credit: David Levene / Wellcome Sanger Institute.

An insect in liquid. Credit: David Levene / Wellcome Sanger Institute

Sanger's Tree Of Life has released over 1,000 genomes to public scientific databases, freely available to researchers worldwide. Credit: David Levene / Wellcome Sanger Institute

“The aim of the Darwin Tree of Life project is hugely ambitious, and deliberately so. We want to ensure the next generation of scientists have the top-quality genomic data critical to carrying out the research needed to safeguard the future. As part of a global network of biodiversity genomics projects, known as the Earth BioGenome Project, we are on the cusp of changing biology forever,” says Dr Mark Blaxter, Head of the Tree of Life Programme at the Sanger Institute.

Another project, at an earlier stage, is the Atlas of Variant Effects. Sanger scientists have taken a leading role in this international collaborative initiative to generate all possible mutations in protein-coding genes and regulatory sequences of the genome and to create an Atlas of Variant Effects. During its early stages, approximately 11 million variants in total have been studied. These are already informing the interpretation of clinical genetic tests, although they so far cover less than one per cent of the relevant human genome.

“The underpinning philosophy for innovation at Sanger combines the mission of making an impact from our science with nurturing the next generation of scientific leaders. Looking into the future, we're exploring ways of unlocking the innovation and entrepreneurial potential of our scientists, especially those earlier in their careers to enable them to make a difference with their science in ways we've yet to imagine,” says Dr Joanna Mills, Head of Entrepreneurship at the Sanger Institute.

Nature’s recognition is a validation of Sanger’s mission - to apply and explore genomic technologies at scale to advance understanding of biology and improve health. Dr Emmanuelle Astoul says, “By undertaking grand scientific challenges and creating shared resources at a scale few could aim for, our ambition is to generate insights and technologies we and others can apply to benefit society.”
