March 21, 2023

Medical Trend

Medical News and Medical Resources

Nomaly: A newly designed system for human gene sequencing


Nat Communications | Phenotype prediction directly from human gene sequencing: a new system designed from scratch


In recent years, the vigorous development of high-throughput gene sequencing technology has led to an explosion of personal gene sequencing data.

However, current methods of gene analysis are still based mainly on correlation analysis, and systematically predicting the impact of genotype on phenotype remains a major challenge. Interpreting phenotype from genotype urgently needs methodological breakthroughs.


On February 17, 2023, the Julian Gough research group (first author: Lu Chang) at the Medical Research Council Laboratory of Molecular Biology (MRC-LMB) in Cambridge, UK, published a research paper in Nature Communications, “Hypothesis-free phenotype prediction within a genetics-first framework” [1], describing Nomaly, a new analysis system designed from scratch.

Unlike commonly used analysis methods such as GWAS, which scan the whole genome for strongly correlated loci, Nomaly integrates research results and tools from molecular biology and bioinformatics to directly predict abnormal phenotypes (observable or detectable characteristics, including disease and tissue-level phenotypes) from an individual's gene sequencing results.

The analysis system includes two main modules: direct prediction of phenotype from human gene sequencing, and verification and evaluation of prediction results.





Directly predicting tissue-level phenotypes from DNA sequencing results is the ultimate goal of interpreting the human genetic code. However, the feasibility and practicality of such direct prediction have remained unknown.

This work took more than ten years. The researchers designed a prediction method from scratch that systematically analyzes changes in an individual's genome, follows the multi-level flow of gene information (DNA mutation -> amino acid mutation -> impact on protein structure and function), integrates annotations from multiple large-scale phenotype databases (on the order of tens of thousands of known relationships between phenotypes and genes), and systematically quantifies and scores phenotype abnormalities to obtain tissue-level phenotype predictions.
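The first step of that information flow, from a DNA mutation to an amino acid mutation, can be illustrated with a minimal Python sketch. It assumes a single-base substitution inside one codon and uses only a small subset of the standard genetic code; the function name `amino_acid_change` is hypothetical and is not taken from the Nomaly paper.

```python
# Illustrative subset of the standard genetic code (DNA codons -> amino acids).
CODON_TABLE = {
    "GAA": "E", "GAG": "E",  # glutamate
    "GTA": "V", "GTG": "V",  # valine
    "AAA": "K", "AAG": "K",  # lysine
    "TAA": "*", "TAG": "*",  # stop codons
}

def amino_acid_change(ref_codon, pos, alt_base):
    """Classify a single-base substitution at index `pos` (0-2) of a codon.

    Returns (reference amino acid, altered amino acid, kind of change).
    Hypothetical helper for illustration only.
    """
    alt_codon = ref_codon[:pos] + alt_base + ref_codon[pos + 1:]
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        kind = "synonymous"   # protein sequence unchanged
    elif alt_aa == "*":
        kind = "nonsense"     # premature stop codon
    else:
        kind = "missense"     # one amino acid replaced by another
    return ref_aa, alt_aa, kind

print(amino_acid_change("GAG", 1, "T"))  # GAG -> GTG
```

Only missense and nonsense changes alter the protein, so in a pipeline of this kind they would be the ones passed on to the structure and function scoring step.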

The Nomaly system was applied to three independent data sets, which verified that this ab initio phenotype prediction can reach statistical significance, yielding predictions of tissue-level or disease-level phenotypes together with potential genetic explanations. Nomaly’s dual ability of “prediction + explanation” is difficult to achieve with correlation analysis.


The primary data set in the article consists of volunteers from all over the world who already have their own gene sequencing results. Volunteers upload DNA sequencing results that they purchased themselves; predictions are made from these DNA data, and phenotype questionnaires are generated from the predictions for the volunteers to answer.

To establish an automatic quality control and processing pipeline for data uploaded from different sources and produced by different DNA sequencing methods, the researchers analyzed nearly 7,000 fully open personal genetic data sets (OpenSNP) and made the resulting preprocessing methods and results public [2].

In addition, the researchers tested the significance of the prediction success rate on the UK’s largest data set of children with developmental disorders (Deciphering Developmental Disorders, DDD) [3].
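As a rough illustration of what testing the significance of a prediction success rate can mean, the following Python sketch computes a one-sided binomial tail probability. This is an assumed, generic test chosen for illustration; the article does not state which statistical test the paper uses, and the function name `binomial_tail` and the example numbers are hypothetical.

```python
from math import comb

def binomial_tail(successes, trials, p_chance):
    """One-sided p-value: P(X >= successes) for X ~ Binomial(trials, p_chance).

    `p_chance` is the assumed probability that a single prediction is
    correct by chance alone.
    """
    return sum(
        comb(trials, k) * p_chance**k * (1 - p_chance) ** (trials - k)
        for k in range(successes, trials + 1)
    )

# Hypothetical example: 8 confirmed predictions out of 10, 50% chance level.
print(binomial_tail(8, 10, 0.5))
```

A small tail probability means that the observed number of correct predictions would be unlikely if the method performed no better than chance.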

Finally, for participants in the Human Induced Pluripotent Stem Cell Initiative (HipSci) [4], the method was used to predict phenotypes at the cell level, and one of the predictions, related to mitotic abnormalities, was compared against cell experiments.


The prediction method in this system is based on protein domains, the basic structural and functional units of proteins. It uses the SUPERFAMILY database [5], which has been developed over decades: hidden Markov models of protein domains score the amino acid mutations caused by changes at gene sites, and sites in the human genome are mapped to tens of thousands of phenotypes through dcGO [6], a database of semantic annotations (function + phenotype) for protein domains.
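The idea of scoring an amino acid mutation with a domain model and then passing the score on to annotated phenotypes can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the SUPERFAMILY or dcGO implementation: the emission probabilities, the domain name `toy_domain`, the phenotype terms and both function names are hypothetical, and the score is simply the log-ratio of the emission probabilities at one match state of a profile hidden Markov model.

```python
import math

# Hypothetical emission probabilities for one match state (domain, position).
EMISSIONS = {
    ("toy_domain", 12): {"E": 0.60, "D": 0.25, "K": 0.10, "V": 0.05},
}

# Hypothetical domain-to-phenotype annotations, standing in for a dcGO-like map.
DOMAIN_TERMS = {"toy_domain": ["abnormal mitosis", "growth delay"]}

def substitution_score(emissions, ref_aa, alt_aa):
    """Log-ratio of emission probabilities; larger means a less expected residue."""
    return math.log(emissions[ref_aa] / emissions[alt_aa])

def phenotype_scores(variants):
    """Sum substitution scores per phenotype term over one individual's variants.

    Each variant is (domain, position, reference residue, altered residue).
    """
    totals = {}
    for domain, position, ref_aa, alt_aa in variants:
        score = substitution_score(EMISSIONS[(domain, position)], ref_aa, alt_aa)
        for term in DOMAIN_TERMS.get(domain, []):
            totals[term] = totals.get(term, 0.0) + score
    return totals

print(phenotype_scores([("toy_domain", 12, "E", "V")]))
```

Replacing a residue the model strongly expects with one it rarely sees gives a high score, and every phenotype term annotated to that domain inherits it.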

The prediction system scores the degree of difference in function caused by gene changes. On the assumption that a large difference may indicate the corresponding abnormal phenotype, it then predicts abnormal phenotypes (outlier prediction).
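Outlier prediction of this kind can be sketched in Python as a comparison of one individual's per-phenotype scores with those of a background population. The z-score rule, the threshold of 3.0 and the function names below are assumptions made for illustration; the article does not specify how Nomaly defines an outlier.

```python
from statistics import mean, stdev

def outlier_z(score, background):
    """Z-score of one individual's score against background scores."""
    mu = mean(background)
    sigma = stdev(background)  # needs at least two background scores
    return (score - mu) / sigma if sigma > 0 else 0.0

def predict_abnormal(individual, population, threshold=3.0):
    """Return (term, z) pairs whose z-score reaches the assumed threshold.

    `individual` maps phenotype term -> score for one person;
    `population` maps phenotype term -> list of background scores.
    """
    flagged = []
    for term, score in individual.items():
        z = outlier_z(score, population[term])
        if z >= threshold:
            flagged.append((term, z))
    # Most extreme outliers first.
    return sorted(flagged, key=lambda pair: -pair[1])

# Hypothetical scores for two phenotype terms.
population = {
    "abnormal mitosis": [0, 1, 2, 1, 0, 1, 2, 1],
    "growth delay": [0, 1, 2, 1, 0, 1, 2, 1],
}
individual = {"abnormal mitosis": 10, "growth delay": 1}
print(predict_abnormal(individual, population))
```

Only terms for which the individual lies far outside the population distribution are reported, which is why the size and quality of the background data set matter for this style of prediction.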


Figure legend: Overview of the prediction method (Nomaly). Taking personal DNA sequencing results as input and comparing them with a database of thousands of people, it predicts abnormal phenotypes caused by DNA changes.

This system of “ab initio prediction + verification of prediction significance”, which yields biologically meaningful interpretations of pathogenic genes, is an important advance in the field of gene analysis.

Today, with DNA sequencing methods developing explosively and millions of people having whole-genome sequencing data, new analysis methods for sequencing results are urgently needed in order to extract information more effectively from these ever-growing data resources and to make discoveries of greater medical value. Further development of this system, as well as integration with other approaches, will also advance research in personalized precision medicine.






