“Any sufficiently advanced technology is indistinguishable from magic” – Arthur C. Clarke (1917-2008)
AI and genomics represent two of the most transformative technologies of the 21st century. How is their fortuitous convergence revolutionizing our insights into health and disease, and how are they expected to reshape biology, medicine, and society as a whole?
Inside This Article:
- What is the genomic revolution?
- How have AI/ML emerged in the last few decades, and why are they so well poised to tackle genomics?
- What are some AI applications to key genomic challenges?
- How does the future of AI in genomics look?
A 21st Century Genomic Revolution
Over the last two centuries, Darwin published On the Origin of Species (1859), Mendel developed his fundamental laws of inheritance (1865), Watson and Crick detailed the structure of DNA (1953), and Sanger sequencing and massively parallel DNA sequencing were invented (1977 and 2000, respectively) – culminating in the sequencing of the first human genome in 2004 (1). Since then, the price of whole genome sequencing has plummeted from nearly $3 billion to $300 in two decades, generating troves of genomic data, predicted to reach an extra 2 to 40 exabytes by 2032. How can we curate, organize, annotate, and extract salient information from this data?
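As a back-of-the-envelope check on the price figures quoted above, the short sketch below works out the fold reduction and the implied halving time. It assumes, purely for illustration, a smooth exponential decline over exactly 20 years, which the real cost curve did not follow; the variable names are illustrative and the two prices are simply those quoted in the text.

```python
import math

# Figures quoted in the text; the 20-year span stands in for "two decades".
start_cost_usd = 3e9    # approximate cost of a human genome at the outset
end_cost_usd = 300.0    # approximate cost of a whole genome today
years = 20.0            # assumption: exactly two decades

# How many times cheaper sequencing has become.
fold_reduction = start_cost_usd / end_cost_usd

# Number of successive halvings needed to produce that reduction.
halvings = math.log2(fold_reduction)

# Average time per halving, assuming a smooth exponential decline.
years_per_halving = years / halvings

print(f"fold reduction:    {fold_reduction:,.0f}x")
print(f"halvings:          {halvings:.1f}")
print(f"years per halving: {years_per_halving:.2f}")
```

Under that simplifying assumption, the quoted prices correspond to a ten-million-fold drop, or a halving of cost roughly every ten months.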
AI’s Rise To Stardom – And Its Timely Tango with Genomics
Artificial intelligence (AI) was born in the wake of cybernetics and the pioneering computational work of John von Neumann and Alan Turing in the 1950s (2), but was not defined as a concept until 1956, as “the construction of computer programs that engage in tasks that […] require high-level mental processes, such as perceptual learning, memory organization and critical reasoning”. Following its application to expert systems in the 1970s, AI reached new heights in 1997, when IBM’s Deep Blue astonishingly defeated reigning chess champion Garry Kasparov. By 2010, the generation of massive volumes of data, alongside the rise of efficient graphics processing, had propelled the field forward at an even faster pace, which continues to accelerate to this day.
Within AI, machine learning (ML) methods, whereby systems learn from data to predict outcomes, have seen particular progress – including deep learning models, based on neural network architectures loosely inspired by the brain’s own structure and function. Today, such ML-based in silico analyses are particularly well poised to organize, label, extract, and interpret the valuable and medically critical information hidden within our ever-growing vaults of biologically complex genomic data. While 2010 saw just several hundred peer-reviewed scientific publications on AI in genomics, 2020 and 2021 each saw over 5,000.
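To make "learning from data to predict outcomes" concrete, here is a deliberately tiny sketch: a logistic regression, trained by gradient descent, that learns to separate GC-rich from AT-rich DNA snippets using a single feature (GC fraction). The sequences, labels, function names, and hyperparameters are all invented for illustration; real genomic models use far richer features and vastly more data.

```python
import math

def gc_fraction(seq: str) -> float:
    """Fraction of bases in the sequence that are G or C."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, lr=1.0, epochs=2000):
    """Fit p(y=1 | x) = sigmoid(w * x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(w * x + b) - y   # gradient of the log-loss
            grad_w += error * x
            grad_b += error
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

def predict(seq: str, w: float, b: float) -> float:
    """Predicted probability that a sequence belongs to the GC-rich class."""
    return sigmoid(w * gc_fraction(seq) + b)

# Toy training set (invented): label 1 = GC-rich, label 0 = AT-rich.
training = [
    ("GCGCGGCC", 1), ("GGCCGCTA", 1), ("CCGGGACG", 1), ("GCCGTCGG", 1),
    ("ATATTAAT", 0), ("TTAAGATA", 0), ("AATTCTAT", 0), ("TATAAGTT", 0),
]
xs = [gc_fraction(seq) for seq, _ in training]
ys = [label for _, label in training]

w, b = train_logistic(xs, ys)

# Sequences the model has not seen during training.
for seq in ("GGGCCCGC", "ATTATAAA"):
    print(seq, round(predict(seq, w, b), 3))
```

The model is never told the rule "GC-rich means class 1"; it infers a decision boundary from the labeled examples, which is the same principle, at toy scale, that deep learning models apply to genomic data.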
AI In Genomics: Five Fantastic Applications
Well-trained ML models have helped make sense of gargantuan volumes of genomic data and parse out the culprits of very rare and otherwise undiagnosed genetic disorders – not unlike searching for needles in a haystack – while predicting the functional impact of mutations. Rare diseases affect 500 million people globally, and reaching a correct diagnosis takes 5 years on average. While up to 95% still lack an FDA-approved treatment, some of the disorders involve known molecular pathways for which existing FDA-approved treatments may be life-saving (3). The return on investment may be so high that the “$0 genome” is increasingly within sight.
Relatedly, advanced AI algorithms can be leveraged to diagnose cancers using cell-free DNA fragments, classify metastatic versus primary prostate cancers from patient molecular data, predict how a certain type of cancer will evolve based on a tumor’s spatiotemporal gene expression patterns, profile cardiovascular diseases, and classify inflammatory bowel disease – not only facilitating diagnoses but also improving the development and implementation of efficient, patient-centered treatments.
Integrated genetic-phenotypic diagnostics
In parallel, next-generation phenotyping (NGP) technologies have leveraged deep learning algorithms to collect, structure, and analyze physiological data, including facial features, to develop precise genetic diagnoses and surface actionable clinical insights. Such AI-facilitated, genotype- and phenotype-informed identification provides a comprehensive snapshot of unique clinical cases (4). With syndromic genetic conditions affecting 8% of the population, the benefits are invaluable.
Drug discovery and development
As the co-founder and CEO of ML diagnostics start-up Verge Genomics explains, “One of the reasons the cost of drug development has grown so much despite massive efficiency improvements is the reliance on reductionist models, especially for very complex diseases”. AI, by contrast, is unrivaled in its power to match the complexity of disease pathomechanisms, and can be leveraged to discover drugs specifically tailored to individuals’ genetic backgrounds – buoying the development of next-generation individualized therapies and ushering in the new era of precision medicine (5,6).
Meanwhile, an estimated 100 petabytes of data have been generated related specifically to RNA – the biological workflow of which is particularly well-understood (including RNA splicing and polyadenylation, and microRNA targeting). Capitalizing on this opportunity, Deep Genomics has implemented over 40 different ML predictors to design new RNA targets and therapies, predict their safety, and identify compounds that might alter the levels or function of key proteins in human health and disease. “Winning the race to develop RNA therapies requires mastering the enormous complexity of RNA biology, and the only way to do that is with AI. Computers are much better than humans at quantitatively prioritizing targets,” elucidates Deep Genomics founder and CEO.
Genome engineering
Leveraged to improve CRISPR functionalities, AI can decrease CRISPR’s still pesky off-target effects. While traditional methods have used rather basic heuristics to predict the off-target effects of potential single guide RNAs (sgRNAs), a novel CRISPR Target Assessment (CRISTA) algorithm has harnessed an ML framework to gauge the likelihood that a given sgRNA will cleave any specific genomic site – while, powerfully, providing inferences related to the patterns affecting CRISPR’s mechanism of action (7). Parallel advances have been made to minimize off-target effects through sgRNA sequence optimization or even the development of specific Cas9 variants (8–10).
Relatedly, in the context of de-extinction, AI is offering a lever to reconstruct degraded DNA from highly fragmented ancient remains.
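For a sense of what the "rather basic heuristics" for off-target prediction mentioned above can look like, the sketch below scans a sequence for sites that differ from a guide by only a few mismatches. This is an illustrative simplification, not CRISTA or any published tool: it ignores the PAM requirement, the reverse strand, and the position of each mismatch, and the guide, the toy "genome", and the function names are all invented.

```python
def count_mismatches(guide: str, site: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(guide) != len(site):
        raise ValueError("guide and site must have the same length")
    return sum(g != s for g, s in zip(guide.upper(), site.upper()))

def find_candidate_sites(guide: str, genome: str, max_mismatches: int = 3):
    """Slide the guide along the forward strand and report near-matches.

    Returns a list of (position, site, mismatches) tuples for every window
    that differs from the guide by at most `max_mismatches` bases.
    """
    hits = []
    k = len(guide)
    for pos in range(len(genome) - k + 1):
        site = genome[pos:pos + k]
        mismatches = count_mismatches(guide, site)
        if mismatches <= max_mismatches:
            hits.append((pos, site, mismatches))
    return hits

# Invented 20-nt guide and a toy "genome" containing the intended target
# plus one near-identical site that differs at two positions.
guide = "GACGTTACCGGATTCAGGCT"
off_site = "GACGTTACCGGTTTCAGGAT"
genome = "TTTT" + guide + "AAAA" + off_site + "CCCC"

hits = find_candidate_sites(guide, genome)
for pos, site, mismatches in hits:
    label = "on-target" if mismatches == 0 else "candidate off-target"
    print(pos, site, mismatches, label)
```

A mismatch count treats every position and every substitution as equally important. ML approaches such as those cited above instead learn from cleavage data which features of a site actually matter.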
Infectious disease profiling, epidemiology, and therapeutics
AI applications to infectious disease have further shed light on pathogen genome sequences, epidemiological dynamics, drug development strategies, and vaccine discovery. Of tangible benefit during the ongoing COVID-19 pandemic, thousands of manuscripts combining ML and sequencing data have been published, for example to uncover SARS-CoV-2’s evolutionary origins, design antibodies, or probe host cellular responses.
In parallel, AI methods for plant genomics and phenomics have galvanized the next-generation plant breeding revolution – from helping to pinpoint the genetic bases of key traits to identifying microRNAs associated with stress-related conditions.
Finally, the computational capacity of deep learning models has been critical to advancing a panoply of conservation efforts. First, ML is helping predict how poorly adapted a given genetic strain of a species would be to a certain set of environmental conditions (anthropogenic or not) by combining known genetic and environmental information. Second, ML is advancing species engineering and de-extinction efforts in particular, harnessing its computational prowess to fill gaps in genomes being reconstructed from the often highly degraded and fragmented remains of ancient DNA.
Future Directions: Promises and Pitfalls
AI-enabled solutions: nonlinear symbiosis
Many creative, advanced AI models continue to be developed, but data, “the new code”, is the lifeblood of AI – and the acquisition of high-quality, well-organized, and diverse data will be a critical differentiator of the most efficient models. A biologically informed, balanced annotation of data will also be critical – with few fixed paradigms, OpenAI, for example, allows for a high degree of adaptability and therefore applicability of its AI models. As with predicting a mutation’s impact on protein folding and then carrying out highly targeted CRISPR editing to correct that mutation, powerful breakthroughs have been and will continue to be reached as AI solves multiple key problems whose solutions feed into one another.
A window of opportunity
The implementation of AI technologies for genomics naturally raises profound questions about human agency, data privacy and bias, equity, and power structures. Streamlined AI-based strategies certainly have remarkable potential to bring clinically tractable solutions to resource-poor settings, which still suffer from sparse access to diagnostic tools and other basic health care services. To fulfill this promise, however, sizeable data sets will be crucial to increasing sensitivity while minimizing bias, and ensuring explainability will build trust and confidence across all stakeholders (11–13). Ethically grounded and well-explained, novel AI methods for genomics – symbiotic and scalable – are in an excellent position to continue to unlock node problems, now and into the future, revolutionizing biology, medicine, and society.
“Every task involves constraint, Solve the things without complaint; There are magic links and chains, Forged to loose our rigid brains. Strictures, structures, though they bind, Strangely liberate the mind.” – James Falen
1. Durmaz AA, Karaca E, Demkow U, Toruner G, Schoumans J, Cogulu O. Evolution of genetic techniques: Past, present, and beyond. BioMed Research International. 2015.
2. Turing AM. Computing machinery and intelligence. Mind. 1950.
3. JC J, W W, B S, J T, DR L, R H, et al. Compound Heterozygous Inheritance of Mutations in Coenzyme Q8A Results in Autosomal Recessive Cerebellar Ataxia and Coenzyme Q10 Deficiency in a Female Sib-Pair. JIMD Rep. 2018;42:31–6.
4. Frey LJ. Artificial intelligence and integrated genotype–phenotype identification. Genes (Basel). 2019.
5. Dias R, Torkamani A. Artificial intelligence in clinical and genomic diagnostics. Genome Med. 2019 Nov 19;11(1):1–12.
6. Uversky VN, Hassan M, Awan FM, Naz A, Deandrés-Galiana EJ, Alvarez O, et al. Innovations in Genomics and Big Data Analytics for Personalized Medicine and Health Care: A Review. Int J Mol Sci. 2022 Apr 22;23(9):4645.
7. Abadi S, Yan WX, Amar D, Mayrose I. A machine learning approach for predicting CRISPR-Cas9 cleavage efficiencies and patterns underlying its mechanism of action. PLoS Comput Biol. 2017.
8. Wang D, Zhang C, Wang B, Li B, Wang Q, Liu D, et al. Optimized CRISPR guide RNA design for two high-fidelity Cas9 variants by deep learning. Nat Commun. 2019.
9. Xiang X, Corsi GI, Anthon C, Qu K, Pan X, Liang X, et al. Enhancing CRISPR-Cas9 gRNA efficiency prediction by data integration and deep learning. Nat Commun. 2021.
10. Chuai G, Ma H, Yan J, Chen M, Hong N, Xue D, et al. DeepCRISPR: Optimized CRISPR guide RNA design by deep learning. Genome Biol. 2018.
11. Plagwitz L, Brenner A, Fujarski M, Varghese J. Supporting AI-Explainability by Analyzing Feature Subsets in a Machine Learning Model. Stud Health Technol Inform. 2022 May 25;294.
12. Dey S, Chakraborty P, Kwon BC, Dhurandhar A, Ghalwash M, Suarez Saiz FJ, et al. Human-centered explainability for life sciences, healthcare, and medical informatics. Patterns (New York, NY). 2022 May;3(5):100493.
13. Kiseleva A, Kotzinos D, De Hert P. Transparency of AI in Healthcare as a Multilayered System of Accountabilities: Between Legal Requirements and Technical Limitations. Front Artif Intell. 2022 May 30;5.