May 19, 2024

Medical Trend

Medical News and Medical Resources

Big Data Promoting the Development of Innovative Drugs


Bringing an innovative drug from the laboratory to the market requires a great deal of time and financial investment. During development in particular, the key biological activities of candidate drugs, such as efficacy, pharmacokinetics (PK), and adverse reactions, must be studied systematically.

With the development of chemical synthesis and biological screening technology over the past decade, biological data for millions of small molecules have been generated in drug research and development and collected in various databases. If these accumulated data can be combined sensibly with newer machine learning (ML) methods such as deep learning, the combination could give drug research a strong impetus: it would support a deeper understanding of compound structure and help predict in vitro, in vivo, and clinical effects, thereby promoting drug discovery and development in the era of big data.

Candidate drugs undergo a series of tests of their efficacy, PK properties, and potential side effects early in development. In recent decades, innovations in combinatorial chemistry, robotics, and high-throughput screening (HTS) have greatly improved the efficiency of screening large compound collections against specific drug targets. The resulting volume of data on lead compounds and drug candidates has brought modern drug discovery into the era of big data. The data generated in drug discovery can be characterized by the “ten Vs” of big data: volume, velocity (update rate), variety (diversity), veracity (authenticity), validity, vocabulary (terminology), venue (generation platform), visualization, variability, and value.

Machine learning methods have proven valuable in the early stages of drug discovery and development. For example, models based on quantitative structure–activity relationship (QSAR) methods have been widely used to quickly predict properties of large numbers of new compounds, such as logP, solubility, biological activity, ligand binding activity, drug efficacy, and adverse reactions. Most of these QSAR models are built from molecular descriptors that describe chemical structure, combined with classic ML algorithms such as random forest (RF), support vector machines (SVM), and k-nearest neighbors. With growing data size and computing power, a new generation of artificial intelligence methods, such as deep learning algorithms, has also begun to be applied to modeling the biological activity of drugs. For example, Eli Lilly has used deep learning to model 24 of the company’s historical data sets, containing more than 1 million compounds, in order to prioritize candidate drugs with therapeutic effects and exclude compounds with potential adverse reactions. Removing unsuitable compounds before chemical synthesis can greatly reduce the cost of drug development.
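To make the descriptor-plus-classic-ML idea concrete, here is a minimal sketch of a k-nearest-neighbors QSAR classifier in plain Python. Everything in it is an assumption made for illustration: the three descriptors, the toy training values, and the names `TRAINING_SET`, `euclidean`, and `knn_predict` are hypothetical and do not come from any of the models or data sets mentioned above. Real QSAR work would use far more descriptors, scaled features, and proper validation.

```python
import math

# Hypothetical toy training set: (descriptor vector, label) pairs.
# Descriptors are invented values for [logP, molecular weight / 100, H-bond donors];
# label 1 = active, 0 = inactive.
TRAINING_SET = [
    ([1.2, 1.8, 1.0], 1),
    ([1.0, 2.0, 2.0], 1),
    ([1.5, 1.6, 1.0], 1),
    ([4.8, 4.5, 0.0], 0),
    ([5.1, 4.9, 1.0], 0),
    ([4.5, 5.2, 0.0], 0),
]


def euclidean(a, b):
    """Euclidean distance between two descriptor vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(query, training_set, k=3):
    """Predict activity by majority vote among the k nearest training compounds."""
    neighbors = sorted(training_set, key=lambda item: euclidean(query, item[0]))[:k]
    votes = sum(label for _, label in neighbors)
    # Active only if strictly more than half of the neighbors are active.
    return 1 if votes * 2 > k else 0


if __name__ == "__main__":
    print(knn_predict([1.1, 1.9, 1.0], TRAINING_SET))
    print(knn_predict([5.0, 5.0, 0.0], TRAINING_SET))
```

The prediction for a new compound depends only on which known compounds sit closest to it in descriptor space, which is why the choice and scaling of descriptors matters as much as the algorithm.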

Big data on drug development

Compared with IT applications such as social network analysis, the data sets used for drug discovery research are relatively small. Publicly available data related to drug discovery and development can currently be divided into six categories according to their application and relevance at different stages of the process:

  • (1) Comprehensive compound databases (for example, Enamine REAL, PubChem, and ChEMBL);
  • (2) Chemical databases designed specifically for drugs and drug-like compounds (for example, DrugBank, AICD, and e-Drug3D);
  • (3) Collections of drug targets, including genomics and proteomics data (for example, ASD, BindingDB, SuperTarget, and Ligand Expo);
  • (4) Databases of biological data obtained through screening, metabolism, and efficacy studies (for example, HMDB, TTD, WOMBAT, and PKPB_DB);
  • (5) Drug safety and toxicity databases (for example, DrugMatrix, SIDER, and the LTKB benchmark data set);
  • (6) Clinical databases (for example, PharmGKB and the EORTC clinical trial database).

Although the number and scale of these databases have grown greatly in recent years, a large part of the data they hold is not directly related to drug discovery and development.


Characteristics of drug big data

Drug research driven by big data still faces long-standing challenges. The large amounts of data accumulated over long periods of development come from different sources and reflect diverse biological conditions. Although such data are informative, the following issues deserve special attention:

The first issue is data quality. With the development of new testing technologies, drug discovery data have grown faster than our ability to use them, and a lack of quality control has long been a common problem in public databases. Algorithmic modeling follows the basic principle of “garbage in, garbage out”, which underlines the importance of quality control, especially the authenticity and authority of data. For example, different reports often test the same compound under different conditions, producing many different and sometimes contradictory values for the same property of that compound. It is therefore essential to first extract and curate the meaningful data from big data.
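One simple way to act on this is to merge replicate measurements per compound and set aside those that disagree too strongly. The sketch below is an illustration under stated assumptions, not an established curation protocol: the function name `curate`, the `(compound_id, value)` record layout, and the `max_spread` threshold of 1.0 (for example, one log unit of a pIC50-style value) are all hypothetical choices.

```python
from collections import defaultdict
from statistics import median


def curate(records, max_spread=1.0):
    """Merge replicate measurements and flag contradictory ones.

    records: iterable of (compound_id, value) pairs from different reports.
    Returns (consensus, conflicting):
      consensus   maps compound_id -> median of its agreeing values
      conflicting maps compound_id -> sorted values whose range exceeds max_spread
    """
    by_compound = defaultdict(list)
    for compound_id, value in records:
        by_compound[compound_id].append(value)

    consensus, conflicting = {}, {}
    for compound_id, values in by_compound.items():
        if max(values) - min(values) > max_spread:
            # Reports disagree too much; keep them out of the modeling set.
            conflicting[compound_id] = sorted(values)
        else:
            consensus[compound_id] = median(values)
    return consensus, conflicting


if __name__ == "__main__":
    # Invented example values for three hypothetical compounds.
    reports = [("A", 6.0), ("A", 6.2), ("A", 6.4), ("B", 5.0), ("B", 8.0), ("C", 7.1)]
    consensus, conflicting = curate(reports)
    print(consensus)
    print(conflicting)
```

Flagged compounds are kept rather than silently dropped, so that an expert can check whether the disagreement comes from different assay conditions or from an actual error.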

The second issue is missing key data. When big data and ML are used to build models that support drug development, several types of data are often insufficient or absent altogether. How to handle this reasonably is a major problem that must be solved while data remain mixed and incomplete. For example, a QSAR model can be built for a single target and its predictions used to fill in the missing data; alternatively, the “read-across” method can be used, in which data for tested molecules that are similar to the target molecule are selected to fill the gap. Whichever method is used, however, prediction errors are introduced, and differences in data sources, standardization procedures, quality control, and expert annotation make these errors more pronounced.
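The read-across idea can be sketched as a similarity-weighted average over the most similar tested molecules. The code below is a minimal illustration under several assumptions: molecules are represented as sets of fingerprint bit indices, similarity is the Tanimoto coefficient, and the names `tanimoto` and `read_across` as well as the defaults `k=2` and `min_similarity=0.3` are hypothetical. It is not the specific procedure used by any source cited here.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def read_across(target_bits, measured, k=2, min_similarity=0.3):
    """Estimate a missing value from the k most similar measured molecules.

    target_bits: set of fingerprint bits for the molecule lacking data.
    measured:    list of (bits, value) pairs for molecules with data.
    Returns the similarity-weighted mean, or None if no analogue is
    at least min_similarity similar to the target.
    """
    scored = [(tanimoto(target_bits, bits), value) for bits, value in measured]
    analogues = sorted(
        (pair for pair in scored if pair[0] >= min_similarity), reverse=True
    )[:k]
    if not analogues:
        return None  # Better to leave the gap than to extrapolate blindly.
    total = sum(sim for sim, _ in analogues)
    return sum(sim * value for sim, value in analogues) / total


if __name__ == "__main__":
    # Invented fingerprints and values for three hypothetical tested molecules.
    tested = [({1, 2, 3, 4}, 6.0), ({1, 2, 3, 5}, 8.0), ({7, 8, 9}, 2.0)]
    print(read_across({1, 2, 3}, tested))
    print(read_across({20, 21}, tested))
```

Returning `None` when no sufficiently similar analogue exists reflects the caveat in the text: a filled-in value is itself a prediction, and its error grows as the analogues become less similar to the target.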

In summary, big data and artificial intelligence methods such as deep learning have demonstrated advantages in innovation and acceleration at multiple stages of drug development. As data quality improves and knowledge-based artificial intelligence methods for drugs develop, there is reason for optimism that artificial intelligence will open up a new track for drug development and change the existing traditional drug development model.


(source: internet, reference only)
