Overview of Next Generation Sequencing Technology
Over the past several decades, researchers have developed methods and techniques to determine the sequence of nucleic acids in biological samples.
Our ability to accurately sequence DNA and RNA has had a huge impact on many research fields. This article discusses what next-generation sequencing (NGS) technology is and how it is applied.
What is next-generation sequencing?
The structure of DNA was determined by Watson and Crick in 1953, based on Rosalind Franklin's foundational DNA crystallography and X-ray diffraction work.
However, the first molecule to be sequenced was actually an RNA: a transfer RNA (alanine tRNA), sequenced by Robert Holley and colleagues in 1965. Following Holley's work and the sequencing of bacteriophage MS2 RNA, various research groups began adapting these methods to sequence DNA. A major breakthrough came in 1977 with the chain termination method developed by Frederick Sanger and his colleagues.
By 1986, the first automated DNA sequencing method had been developed. This marked the beginning of a golden age of developing and refining sequencing platforms, including key capillary DNA sequencers.
The chain termination method, also known as Sanger sequencing, uses the target DNA sequence as the template for a PCR-like extension reaction in which modified nucleotides called dideoxyribonucleotides (ddNTPs) are added to the growing DNA strand. When DNA polymerase incorporates a ddNTP, extension stops. The result is many copies of the sequence that terminate at different positions and together span the entire length of the amplified fragment.
These chain-terminated oligonucleotides are then separated by size, using gel electrophoresis in early methods or capillary electrophoresis in later automated capillary sequencers, and the DNA sequence is read from the order of their terminal bases.
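To make the chain termination logic concrete, here is a minimal Python sketch that simulates it. The function names, the 10% ddNTP incorporation rate and the toy template are assumptions for illustration, and strand orientation is deliberately ignored. Each simulated copy stops at the first ddNTP it incorporates, and sorting the fragments by length (as electrophoresis does) reads the sequence back from their terminal bases.

```python
import random

def new_strand_from_template(template: str) -> str:
    """Complement each template base (simplification: ignores 5'/3' orientation)."""
    return template.translate(str.maketrans("ACGT", "TGCA"))

def terminated_fragments(new_strand: str, n_molecules: int = 5000,
                         dd_fraction: float = 0.1, seed: int = 0) -> set[tuple[int, str]]:
    """Simulate chain termination: each growing copy stops at the first ddNTP it incorporates.

    Returns (fragment length, terminal ddNTP) pairs; the terminal base stands in for
    the dye colour detected by an automated sequencer.
    """
    rng = random.Random(seed)
    fragments = set()
    for _ in range(n_molecules):
        for position, base in enumerate(new_strand):
            if rng.random() < dd_fraction:  # a ddNTP was incorporated at this position
                fragments.add((position + 1, base))
                break  # a ddNTP has no 3'-OH, so extension stops here
    return fragments

def read_sequence(fragments: set[tuple[int, str]]) -> str:
    """Size separation orders fragments shortest first; read off their terminal labels."""
    return "".join(base for _, base in sorted(fragments))

template = "CTAATGTCCGAT"  # toy template
strand = new_strand_from_template(template)
print(strand, read_sequence(terminated_fragments(strand)))
```

Because, across thousands of copies, some copy terminates at every position, the length-sorted terminal bases spell out the newly synthesized strand.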
With these major technological advances, the Human Genome Project was completed in 2003.
In 2005, the first commercially available NGS platform was introduced; this class of technology is now known as second-generation (2G) sequencing. Unlike Sanger sequencing, 2G platforms can amplify and sequence millions of copies of specific DNA fragments in a massively parallel manner.
The key principles behind Sanger sequencing and 2G NGS have some similarities. In 2G NGS, the genetic material (DNA or RNA) is fragmented, and oligonucleotides of known sequence are attached in a step called adaptor ligation, which allows the fragments to interact with the chosen sequencing system.
The bases of each fragment are then identified from the signals they emit. The main difference between Sanger sequencing and 2G NGS is sequencing volume: NGS processes millions of reactions in parallel, giving higher throughput, greater sensitivity, faster runs and lower cost.
Genome sequencing projects that would require an enormous number of Sanger sequencing reactions can now be completed with NGS within a few hours.
There are two main approaches in NGS technology: short-read and long-read sequencing. Each has its own advantages and limitations (Table 1 below). Much of the investment in NGS development is driven by its wide applicability in clinical and research settings. In the clinic, NGS is used to diagnose various diseases by identifying germline or somatic mutations.
The adoption of NGS in clinical practice has been driven by the power of the technology combined with its steadily falling cost. NGS is also a valuable tool in metagenomics research and can be used for the diagnosis, monitoring and management of infectious diseases.
In 2020, NGS played a key role in characterizing the SARS-CoV-2 genome, and it continues to play a role in monitoring the COVID-19 pandemic.
Figure 1: The evolution of sequencing methods.
Next-generation sequencing methods
The term NGS generally refers to 2G technology, but third-generation (3G) and fourth-generation (4G) technologies, based on different underlying principles, have since been developed.
Sequencing platform/sequencing technology
Second-generation sequencing methods share many common features, but they can be subdivided according to their underlying detection chemistry: sequencing by ligation (including DNA nanosphere sequencing) and sequencing by synthesis (SBS), which is further divided into proton detection, pyrosequencing and reversible terminator sequencing (Figure 2 below).
Figure 2: 2G sequencing platform and chemical principle diagram.
Proton detection sequencing relies on detecting the hydrogen ions released during DNA polymerization. Unlike other technologies, it uses no fluorescence, modified nucleotides or optical elements. Instead, a semiconductor sensor chip detects the resulting change in pH and converts it into digital information.
Pyrosequencing determines whether a specific base has been incorporated into the DNA strand by detecting the pyrophosphate released on incorporation, which is converted into a light signal.
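Pyrosequencing and proton detection both flow one nucleotide type at a time over the DNA, so the signal in each flow scales with how many bases were incorporated. The sketch below assumes an idealized signal equal to the homopolymer length and a hypothetical flow order `TACG`; it is an illustration of the principle, not any instrument's actual base caller.

```python
FLOW_ORDER = "TACG"  # hypothetical nucleotide flow order, repeated cyclically

def simulate_flowgram(seq: str, flow_order: str = FLOW_ORDER) -> list[float]:
    """Ideal signal per flow = number of identical bases incorporated in that flow."""
    if not set(seq) <= set(flow_order):
        raise ValueError("sequence contains bases not in the flow order")
    signals, i, flow = [], 0, 0
    while i < len(seq):
        nucleotide = flow_order[flow % len(flow_order)]
        run = 0
        while i < len(seq) and seq[i] == nucleotide:  # a whole homopolymer is added in one flow
            run += 1
            i += 1
        signals.append(float(run))
        flow += 1
    return signals

def call_bases(signals: list[float], flow_order: str = FLOW_ORDER) -> str:
    """Round each flow's signal to a homopolymer length and emit that many bases."""
    return "".join(flow_order[f % len(flow_order)] * round(s) for f, s in enumerate(signals))

print(simulate_flowgram("TTAGGGC"))
print(call_bases([2.1, 0.9, 0.1, 2.6, 0.0, 0.2, 1.1]))  # noisy but correctly called
print(call_bases([2.0, 1.0, 0.0, 2.4, 0.0, 0.0, 1.0]))  # weak G signal: homopolymer undercall
```

The last call shows why homopolymers are hard for these chemistries: a `GGG` run whose signal reads as 2.4 is called as `GG`.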
So far, the most popular SBS method is reversible terminator sequencing with “bridge amplification”. Library fragments bind to oligonucleotides on the flow cell, form a bridge from one side of the sequence (the P5 oligonucleotide on the flow cell) to the other (P7), and are then amplified. During sequencing, the fluorescently labeled nucleotides added in each cycle are detected by direct imaging.
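In reversible terminator sequencing, exactly one labeled base is added per cycle, so base calling reduces to picking the brightest of four dye channels in each image. The sketch below assumes a channel order of A, C, G, T and uses a simple purity ratio (brightest over brightest plus second brightest) as a rough confidence measure; both are illustrative choices, not a vendor's algorithm.

```python
CHANNELS = "ACGT"  # assumed order of the four dye intensities in each cycle

def call_cycle(intensities: tuple[float, float, float, float]) -> tuple[str, float]:
    """Call the brightest channel; purity = brightest / (brightest + second brightest)."""
    ranked = sorted(range(4), key=lambda k: intensities[k], reverse=True)
    best, second = intensities[ranked[0]], intensities[ranked[1]]
    purity = best / (best + second) if best + second > 0 else 0.0
    return CHANNELS[ranked[0]], purity

def call_read(cycles: list[tuple[float, float, float, float]]) -> tuple[str, list[float]]:
    """One base per cycle: image, call, then (on a real instrument) cleave dye and terminator."""
    calls = [call_cycle(c) for c in cycles]
    return "".join(base for base, _ in calls), [purity for _, purity in calls]

bases, purities = call_read([(900, 50, 40, 30), (20, 30, 800, 60), (100, 700, 90, 80)])
print(bases, [round(p, 2) for p in purities])
print(call_cycle((500, 480, 10, 10)))  # two similar channels: low purity, unreliable call
```

A low purity flags cycles where two dyes are similarly bright, for example when a cluster contains molecules that have fallen out of step.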
Unlike SBS, sequencing by ligation does not use DNA polymerase to generate the second strand. Instead, it exploits the sensitivity of DNA ligase to base-pairing mismatches to generate fluorescence signals that reveal the target sequence. The digital images taken after each reaction are then analyzed.
DNA nanosphere sequencing is a form of sequencing by ligation that uses rolling circle replication. The resulting tandem DNA copies condense into DNA nanospheres, which are arrayed on a sequencing slide in a dense grid of spots and used for ligation-based sequencing reactions.
Although nanosphere technology reduces operating costs, the short reads it produces can be problematic for downstream analysis.
2G NGS technology generally has several advantages over alternative sequencing technologies, including the ability to generate sequencing reads quickly, sensitively and cost-effectively. However, it also has shortcomings, including poor resolution of homopolymers (runs of the same base) and incorporation of incorrect dNTPs by the polymerase, both of which cause sequencing errors.
Shorter read lengths also require deeper sequencing coverage to achieve accurate contigs and final genome assembly. The main disadvantage of all 2G NGS technologies is the need for PCR amplification before sequencing.
This introduces PCR bias both during library preparation (related to sequence GC content, fragment length and false diversity) and in analysis (base errors and over-representation of some sequences relative to others).
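The link between read length and coverage can be estimated with the classic Lander-Waterman model: mean coverage is C = N × L / G for N reads of length L on a genome of size G, and the expected fraction of bases with no reads is about e^-C. This minimal sketch uses an approximate human genome size of 3.1 Gb as an illustrative value. The model assumes reads land uniformly at random, an assumption that PCR and GC biases violate in practice.

```python
import math

def mean_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Lander-Waterman mean coverage C = N * L / G."""
    return n_reads * read_length / genome_size

def reads_needed(target_coverage: float, read_length: int, genome_size: int) -> int:
    """Reads required to reach a target mean coverage; shorter reads need more of them."""
    return math.ceil(target_coverage * genome_size / read_length)

def fraction_uncovered(coverage: float) -> float:
    """Expected fraction of bases with zero reads under the Poisson approximation, e^-C."""
    return math.exp(-coverage)

GENOME = 3_100_000_000  # approximate human genome size in bp, illustrative value
for read_length in (50, 150):
    print(read_length, reads_needed(30, read_length, GENOME))
```

At the same 30x target, 50 bp reads need three times as many reads as 150 bp reads, which is the extra sequencing depth that short-read platforms must supply.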
The introduction of 3G sequencing removed the need for PCR: single molecules can be sequenced without a prior amplification step. The earliest single-molecule sequencing (SMS) technology was developed by Stephen Quake and colleagues.
Here, DNA polymerase is used to obtain sequence information by monitoring the incorporation of fluorescently labeled nucleotides into DNA strands at single-base resolution.
Depending on the method and instrument used, some of the advantages of 3G NGS include:
1. Real-time monitoring of nucleotide incorporation
2. Unbiased sequencing
3. Longer read lengths
However, high cost, high error rate, large amount of sequencing data and low read depth may become problems.
In the 4G system, 3G single-molecule sequencing is combined with nanopore technology.
Like 3G, nanopore technology requires no amplification and builds on the concept of single-molecule sequencing, but it adds nanometer-scale biological pores (nanopores) through which individual molecules pass and can be identified.
The 4G system currently provides the fastest whole-genome scan, but it is still expensive, more error-prone than 2G technology and relatively new.
As a result, extensive data on this technology are still lacking.
2G sequencing methods and the main steps of next-generation sequencing library preparation
No matter which 2G NGS method is chosen, several main steps must be tailored to the selected target (RNA or DNA) and sequencing system.
(1) Sample preparation (pretreatment)
Extract nucleic acid (DNA or RNA) from selected samples (blood, sputum, bone marrow, etc.).
Use standard methods (spectrophotometry, fluorescence or gel electrophoresis) to perform quality control (QC) checks on the extracted samples.
If RNA is used, it must be reverse transcribed into cDNA, but some library preparation kits may include this step.
(2) Prepare and optimize the NGS library
Random fragmentation of cDNA or DNA is usually carried out enzymatically or by sonication.
The optimal fragment length depends on the platform used. When optimizing this step, it may be necessary to run a small amount of fragmented sample on an electrophoresis gel.
These fragments are then end-repaired and ligated to short universal DNA sequences called adaptors.
Adaptors are oligonucleotides of limited length and known sequence that make the fragments compatible with the sequencing platform and allow them to be identified during multiplexed sequencing.
In multiplexed sequencing, each sample carries its own adaptor (index) sequence, so a large number of libraries can be pooled and sequenced in a single run. The pool of adaptor-ligated DNA fragments is called a sequencing library.
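After a multiplexed run, reads are separated again (demultiplexed) by their index sequences. This sketch assumes hypothetical 6 bp indexes located at the start of each read and tolerates one mismatch; on many real platforms the index is read separately, and the index sets are defined by the library kit.

```python
from collections import defaultdict

# Hypothetical 6 bp sample indexes; real kits define their own index sets.
SAMPLE_INDEXES = {"ACGTAC": "sample_1", "TGCATG": "sample_2"}

def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def demultiplex(reads, sample_indexes=SAMPLE_INDEXES, index_len=6, max_mismatches=1):
    """Assign each read to a sample by its index, tolerating a few mismatches.

    Simplification: the index is taken from the first `index_len` bases of the read.
    """
    bins = defaultdict(list)
    for read in reads:
        index, insert = read[:index_len], read[index_len:]
        hits = [sample for idx, sample in sample_indexes.items()
                if hamming(index, idx) <= max_mismatches]
        # Unmatched or ambiguous indexes are set aside rather than guessed
        bins[hits[0] if len(hits) == 1 else "undetermined"].append(insert)
    return dict(bins)

print(demultiplex(["ACGTACGGGA", "TGCATGTTTC", "ACGTTCAAAA", "GGGGGGCCCC"]))
```

Indexes in a well-designed set differ at several positions, which is what makes tolerating a single sequencing error safe.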
Size selection, by gel electrophoresis or with magnetic beads, can then remove fragments that are too short or too long for optimal performance on the chosen sequencing platform and protocol.
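Fragmentation followed by size selection can be sketched as random cutting plus a length filter. The exponential spacing of breakpoints, the 300 bp mean and the 200 to 500 bp window below are assumptions for illustration, not recommended values for any platform.

```python
import random

def fragment(sequence: str, mean_size: int = 300, seed: int = 0) -> list[str]:
    """Cut at random breakpoints; exponential spacing is an assumed, simplified model."""
    rng = random.Random(seed)
    pieces, start = [], 0
    while start < len(sequence):
        size = max(1, int(rng.expovariate(1 / mean_size)))
        pieces.append(sequence[start:start + size])
        start += size
    return pieces

def size_select(fragments: list[str], min_len: int = 200, max_len: int = 500) -> list[str]:
    """Keep only fragments inside a hypothetical window suited to the platform."""
    return [f for f in fragments if min_len <= len(f) <= max_len]

genome = "".join(random.Random(1).choice("ACGT") for _ in range(100_000))  # toy sequence
pieces = fragment(genome)
kept = size_select(pieces)
print(len(pieces), len(kept))
```

Only a fraction of the fragments survive the window, which is why fragmentation conditions are tuned so that most fragments already fall near the target length.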
PCR is then used for library enrichment/amplification. In techniques involving emulsion PCR, each fragment is bound to a single bead in an emulsion, and these beads go on to form the basis of sequencing clusters.
After amplification, a “clean-up” step (for example, using magnetic beads) is usually performed to remove unwanted fragments and improve sequencing efficiency.
The final library can be quality-checked by qPCR to confirm the quality and quantity of DNA. This also allows samples to be prepared at the correct concentration for sequencing.
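Library pooling and loading usually work in molar units. When a mass concentration (ng/µl) is measured instead, it is commonly converted to molarity using about 660 g/mol per base pair of double-stranded DNA, and then diluted using C1V1 = C2V2. The 2 ng/µl, 400 bp and 4 nM figures below are hypothetical.

```python
def library_molarity_nM(conc_ng_per_ul: float, mean_fragment_bp: float,
                        g_per_mol_per_bp: float = 660.0) -> float:
    """Convert a dsDNA mass concentration to nM using ~660 g/mol per base pair."""
    return conc_ng_per_ul * 1e6 / (g_per_mol_per_bp * mean_fragment_bp)

def dilution_volume_ul(stock_nM: float, target_nM: float, final_volume_ul: float) -> float:
    """C1 * V1 = C2 * V2: volume of stock needed to reach the target concentration."""
    return target_nM * final_volume_ul / stock_nM

stock = library_molarity_nM(2.0, 400)  # hypothetical: 2 ng/ul library, 400 bp mean size
print(round(stock, 2), round(dilution_volume_ul(stock, 4.0, 20.0), 2))
```

The mean fragment length matters: the same mass of a longer library contains fewer molecules, so it has a lower molarity.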
(3) Sequencing
Depending on the selected platform and chemistry, clonal amplification of library fragments can be performed before loading onto the sequencer (emulsion PCR) or on the sequencer itself (bridge PCR).
The sequence is then detected and reported according to the selected platform.
(4) Data analysis
Analyze the generated data files according to the workflow used. The analysis method is highly dependent on the research purpose.
Although they can reduce the number of samples that can be analyzed in a given run, paired-end and mate-pair sequencing offer advantages in downstream data analysis (especially de novo assembly).
These approaches link pairs of sequencing reads, either read from the two ends of a fragment (paired-end) or separated by a longer intervening DNA region (mate-pair).
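For paired-end reads, read 1 comes from one end of the fragment and read 2 from the other end on the opposite strand, so read 2 is the reverse complement of the fragment's far end. This minimal sketch assumes an error-free fragment and a fixed read length; the function names are hypothetical, and mate-pair libraries are not modeled.

```python
def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse, to read the opposite strand 5' to 3'."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def paired_end_reads(fragment: str, read_len: int) -> tuple[str, str]:
    """Read 1 from the start of the fragment; read 2 from the other end, opposite strand."""
    read1 = fragment[:read_len]
    read2 = reverse_complement(fragment[-read_len:])
    return read1, read2

fragment = "ACGTTGCAAGGC"  # toy 12 bp fragment
r1, r2 = paired_end_reads(fragment, 4)
print(r1, r2)
```

Once both reads are mapped, the distance between their outer ends recovers the fragment (insert) length, which is the extra information that helps assemblers and mappers place reads in repetitive regions.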
There are many options when choosing a sequencing strategy. Key considerations when selecting a suitable library preparation method and sequencing platform include:
(A) The research question
(B) Sample type
(C) Short-read or long-read sequencing
(D) DNA or RNA sequencing: do you need to examine the genome or the transcriptome?
(E) Do you need the entire genome or only specific regions?
(F) Required read depth (coverage), which is specific to the experiment
(G) Extraction method
(H) Sample concentration
(I) Single-end, paired-end or mate-pair reads
(J) Required read length
(K) Can samples be multiplexed?
(L) Bioinformatics tools, depending on the experiment. The entire sequence analysis process can be adapted to the samples and the biological question.
Whole Exome and Whole Genome Sequencing
Whole genome sequencing (WGS) is the most widely used form of NGS, which refers to the analysis of the entire nucleotide sequence of the genome.
Whole exome sequencing (WES) is a form of targeted sequencing that only targets protein-coding exons. In humans, this accounts for only about 2% of the genome, so it provides an opportunity for more in-depth research in these regions.
Due to the reduced sequencing burden, WES can also provide a more cost-effective option than WGS, and reduce the amount and complexity of sequencing data obtained.
However, sequencing only part of the genome may miss important information, reducing the chance of new discoveries.
Although WGS remains more expensive despite rapidly falling costs, and comes with its own data analysis challenges, it still provides more powerful analysis that can reveal a more complete picture.
Next-generation sequencing data analysis
Every NGS technology produces a large amount of output data. Sequence analysis generally follows a common workflow: quality control of the raw reads, preprocessing and mapping, followed by post-alignment processing, variant calling, variant annotation and visualization.
The raw sequencing data must be evaluated to determine their quality, which lays the groundwork for all downstream analysis. This evaluation gives general information about the number and length of reads and any contaminating sequences or low-coverage reads. FastQC is one of the most complete applications for calculating quality control statistics for sequencing reads. However, further preprocessing (such as read filtering and trimming) requires other tools.
Trimming bases at the ends of reads and removing residual adaptor sequence can often improve data quality. Recently, ultra-fast tools such as fastp have been introduced; fastp performs quality control, read filtering and data correction, combining most of the functions of traditional applications while running around 5 times faster than individual applications.
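The quality values used for trimming are Phred scores, defined by Q = -10 log10(P_error) and stored in FASTQ files as ASCII characters (Phred+33 encoding is assumed here). This sketch shows simple 3' adaptor removal followed by trimming of a low-quality tail. The adapter sequence, thresholds and function names are illustrative assumptions and do not reproduce the algorithms of fastp or any other tool.

```python
def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string (Phred+33 encoding assumed)."""
    return [ord(ch) - offset for ch in quality]

def error_probability(q: int) -> float:
    """Invert the Phred definition Q = -10 * log10(P_error)."""
    return 10 ** (-q / 10)

def trim_adapter(seq: str, qual: str, adapter: str, min_overlap: int = 3) -> tuple[str, str]:
    """Remove a full adapter occurrence, or a partial adapter hanging off the 3' end."""
    idx = seq.find(adapter)
    if idx != -1:
        return seq[:idx], qual[:idx]
    for k in range(min(len(adapter) - 1, len(seq)), min_overlap - 1, -1):
        if seq.endswith(adapter[:k]):  # read ends with the first k adapter bases
            return seq[:-k], qual[:-k]
    return seq, qual

def trim_low_quality_tail(seq: str, qual: str, min_q: int = 20) -> tuple[str, str]:
    """Drop 3' bases whose Phred quality is below min_q."""
    scores = phred_scores(qual)
    end = len(seq)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], qual[:end]

ADAPTER = "AGATCGGAAG"  # example adapter prefix, assumed for illustration
seq, qual = trim_adapter("ACGTACGTAGATCGG", "IIIII#+#IIIIIII", ADAPTER)
print(seq, qual, trim_low_quality_tail(seq, qual))
```

Adapter removal runs first here so that adapter bases, which can have high quality scores, are not mistaken for genuine sequence by the quality filter.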
After checking read quality and performing preprocessing, the next step depends on whether a reference genome is available. In de novo genome assembly, the generated sequences are assembled into contigs using their overlapping regions.
This is usually done with the help of processing pipelines, which can include scaffolding steps to help sort and orient contigs and remove repetitive regions, thereby improving assembly continuity [40, 41]. If the generated sequences are mapped (aligned to a reference genome or transcriptome), variants relative to the reference sequence can be identified.
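The overlap idea behind de novo assembly can be illustrated with a greedy merger that repeatedly joins the two sequences with the longest suffix-prefix overlap. This is a deliberately simplified stand-in: practical assemblers use overlap or de Bruijn graphs and must cope with sequencing errors, repeats and reads contained within other reads, none of which this sketch handles. The reads are toy data.

```python
from itertools import permutations

def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that matches a prefix of b (at least min_len), else 0."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1

def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair with the largest overlap; return the resulting contigs."""
    contigs = list(reads)
    while len(contigs) > 1:
        best_len, best_pair = 0, None
        for a, b in permutations(contigs, 2):
            olen = overlap(a, b, min_len)
            if olen > best_len:
                best_len, best_pair = olen, (a, b)
        if best_pair is None:
            break  # nothing overlaps any more: remaining sequences are separate contigs
        a, b = best_pair
        contigs.remove(a)
        contigs.remove(b)
        contigs.append(a + b[best_len:])
    return contigs

reads = ["ATTAGACCTG", "AGACCTGCCG", "CCTGCCGGAA", "GCCGGAATAC"]
print(greedy_assemble(reads))
```

When no pair overlaps by at least `min_len` bases, the function stops and returns several contigs, mirroring the gaps that scaffolding later tries to order and orient.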
Today, a large number of mapping tools (more than 60) have been adapted to handle the growing volume of NGS data, take advantage of technological advances and respond to evolving protocols [42].
With so many mappers available, one difficulty is finding the most suitable one.
Information about them is usually scattered across publications, source code (where available), manuals and other documentation. Some tools also provide the necessary post-mapping quality checks, because certain biases only become apparent after the mapping step.
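Mapping itself can be pictured as finding where a read fits the reference best. The brute-force, ungapped scan below is only a sketch with a toy reference and hypothetical function names; production mappers rely on index structures (for example, FM-index-based designs) and gapped alignment to handle billions of reads.

```python
from typing import Optional

def map_read(read: str, reference: str, max_mismatches: int = 2) -> Optional[tuple[int, int]]:
    """Brute-force ungapped alignment: return (position, mismatches) of the best hit, or None."""
    best = None
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        mm = sum(r != g for r, g in zip(read, window))
        if mm <= max_mismatches and (best is None or mm < best[1]):
            best = (pos, mm)
    return best

def differences(read: str, reference: str, pos: int) -> list[tuple[int, str, str]]:
    """List (reference position, reference base, read base) for each mismatch."""
    return [(pos + i, reference[pos + i], base)
            for i, base in enumerate(read) if base != reference[pos + i]]

REFERENCE = "ATTAGACCTGCCGGAATAC"  # toy reference sequence
hit = map_read("GACCAGCC", REFERENCE)
print(hit, differences("GACCAGCC", REFERENCE, hit[0]))
```

The mismatch reported at reference position 8 is the kind of difference that later variant analysis evaluates across many reads.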
As with quality control before mapping, correct processing of mapped reads is a crucial step. In this step, duplicate mapped reads (including, but not limited to, PCR artifacts) are removed. This is a standardized procedure, and most tools offer similar functionality.
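Duplicate removal can be sketched by treating alignments that start at the same position on the same strand as copies of one original molecule and keeping only the highest-quality one. The `Alignment` type and its fields are hypothetical, and this single-end rule is a simplification; real duplicate-marking tools also consider both mates of a pair and clipped bases.

```python
from typing import NamedTuple

class Alignment(NamedTuple):
    read_id: str
    chrom: str
    start: int          # leftmost mapped position
    strand: str         # "+" or "-"
    mean_quality: float

def remove_duplicates(alignments: list[Alignment]) -> list[Alignment]:
    """Keep one alignment per (chrom, start, strand), preferring the highest mean quality."""
    best: dict[tuple[str, int, str], Alignment] = {}
    for aln in alignments:
        key = (aln.chrom, aln.start, aln.strand)
        if key not in best or aln.mean_quality > best[key].mean_quality:
            best[key] = aln
    return sorted(best.values(), key=lambda a: (a.chrom, a.start))

alns = [
    Alignment("r1", "chr1", 100, "+", 30.0),
    Alignment("r2", "chr1", 100, "+", 35.0),  # same position and strand as r1: duplicate
    Alignment("r3", "chr1", 100, "-", 30.0),  # opposite strand: treated as a distinct molecule
    Alignment("r4", "chr1", 250, "+", 20.0),
]
print([a.read_id for a in remove_duplicates(alns)])
```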
Once the reads have been mapped and processed, they must be analyzed in an experiment-specific way, a step known as variant analysis.
This step can identify single nucleotide polymorphisms (SNPs), RNA sequences and more. Although there are many tools for genome assembly, alignment and analysis, new and improved versions are constantly needed so that their sensitivity, accuracy and resolution keep pace with rapidly evolving NGS technology.
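A very small single-nucleotide variant caller shows the core of variant identification: stack the aligned reads into a pileup and report positions where a non-reference base is frequent enough. The depth and allele-fraction thresholds and the toy reads are hypothetical, and real variant callers use statistical models that weigh base and mapping quality.

```python
from collections import Counter

def pileup(aligned_reads: list[tuple[int, str]], ref_len: int) -> list[Counter]:
    """Count the bases observed at each reference position (ungapped alignments assumed)."""
    columns = [Counter() for _ in range(ref_len)]
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            columns[start + offset][base] += 1
    return columns

def call_snvs(reference, aligned_reads, min_depth=3, min_alt_fraction=0.3):
    """Report (position, ref, alt, alt_count, depth) where a non-reference base is frequent."""
    calls = []
    for pos, counts in enumerate(pileup(aligned_reads, len(reference))):
        depth = sum(counts.values())
        if depth < min_depth:
            continue
        alts = [(base, n) for base, n in counts.items() if base != reference[pos]]
        if not alts:
            continue
        alt, alt_count = max(alts, key=lambda item: item[1])
        if alt_count / depth >= min_alt_fraction:
            calls.append((pos, reference[pos], alt, alt_count, depth))
    return calls

reference = "ATTAGACCTGCCGGAATAC"  # toy reference
reads = [(0, "ATTAGACCTG"), (2, "TAGACCAG"), (4, "GACCAGCC"), (6, "CCAGCCGG")]
print(call_snvs(reference, reads))
```

Requiring several supporting reads is what separates a likely variant from a one-off sequencing error such as a single miscalled base.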
The last step is visualization, which can be a major challenge given the complexity of the data. Depending on the experiment and the research question, a variety of tools can be used. If a reference genome is available, the Integrative Genomics Viewer (IGV) and Genome Browser are both popular choices.
If the experiment includes WGS or WES, Variant Explorer is a particularly useful tool because it can screen thousands of variants and allows users to focus on their most important findings.
Visualization tools like VISTA allow comparisons between different genome sequences.
Options for visualizing de novo genome assemblies are more limited; however, tools such as Bandage and Icarus have been used to explore and analyze assembled genomes.
Next-generation sequencing bottlenecks
NGS enables us to discover and study the genome in an unprecedented way. However, the complexity of NGS sample processing exposes bottlenecks in the management, analysis, and storage of data sets. One of the main challenges is the computational resources required to assemble, annotate, and analyze sequencing data.
The large amount of data generated by NGS analysis is another key challenge. Data centers are reaching high storage capacity levels and have been struggling to cope with increasing demand, risking permanent data loss. More strategies are constantly being proposed to improve efficiency, reduce sequencing errors, maximize reproducibility and ensure correct data management.
Next-generation sequencing applications
Since the early 2000s, NGS has become an invaluable tool in research and clinical/diagnostic settings.
The use of methods including WGS, WES, targeted sequencing, and transcriptome, epigenome and metagenome sequencing has increased greatly.
Figure 3 below summarizes the workflow and options for different data sets.
Figure 3: A flowchart showing possible sequencing strategies for different sample types.
Through WGS, researchers can not only study genes and their involvement in human and animal diseases, but also study the characteristics of microbes and agricultural populations, thereby providing important epidemiological and evolutionary data.
So far, a large number of studies have used WGS to identify mutations, rearrangements and fusion events. Currently, WGS is used to monitor antimicrobial resistance, which is one of the major global health challenges.
As costs continue to fall, WGS is increasingly used to resequence whole human genomes from clinical samples and could soon become routine clinical practice.
Ultimately, WGS will be needed to assign functions to most of the remaining genome and to decipher its role in disease.
Because they are more targeted, WES and targeted sequencing are attractive options for population and clinical research. Despite being more restricted, as the name suggests, WES is an important clinical tool in the field of personalized medicine.
Compared with WGS, this method can achieve genetic diagnosis of certain diseases (such as cancer) and genetic characterization of other diseases in a more cost-effective manner.
In addition to its many applications in DNA sequencing, NGS can also be used for RNA analysis. For example, it can determine the genomes of RNA viruses such as SARS and influenza.
Importantly, RNA-seq is often used in quantitative studies, not only to help identify which genes in the genome are transcribed but also to measure their transcription levels from the relative abundance of RNA transcripts.
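One widely used way to turn RNA-seq read counts into comparable expression levels is transcripts per million (TPM): divide each count by transcript length in kilobases, then scale so that the values sum to one million. The gene names, counts and lengths below are hypothetical.

```python
def tpm(counts: dict[str, int], lengths_bp: dict[str, int]) -> dict[str, float]:
    """Transcripts per million: normalize by transcript length, then by library size."""
    rpk = {g: counts[g] / (lengths_bp[g] / 1000) for g in counts}  # reads per kilobase
    total = sum(rpk.values())
    return {g: value / total * 1e6 for g, value in rpk.items()}

counts = {"geneA": 100, "geneB": 200, "geneC": 300}     # hypothetical read counts
lengths = {"geneA": 1000, "geneB": 2000, "geneC": 500}  # hypothetical transcript lengths (bp)
print(tpm(counts, lengths))
```

Here geneB has twice as many reads as geneA but the same TPM, because it is twice as long and therefore yields more reads per transcript.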
Potential rearrangements of DNA sequences can also be identified by detecting novel transcripts.
Epigenome sequencing can study the changes caused by histone modification and DNA methylation.
There are many methods for studying epigenetic mechanisms, including whole genome bisulfite sequencing (WGBS), and chromatin immunoprecipitation (ChIP-seq) or methylated DNA immunoprecipitation (MeDIP-seq) followed by sequencing. Depending on the method selected, complete DNA methylation and histone modification profiles can be mapped and studied to gain insight into the regulatory mechanisms of the genome.
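Bisulfite sequencing works because bisulfite treatment converts unmethylated cytosine to uracil (read as T after PCR), while methylated cytosine stays C. Comparing aligned reads with the reference at each C therefore gives a methylation level. This sketch uses toy reads on the top strand only and ignores genuine C-to-T variants and strand-specific alignment; the function names are hypothetical.

```python
def bisulfite_convert(seq: str, methylated: set[int]) -> str:
    """Unmethylated C reads as T after bisulfite treatment and PCR; methylated C stays C."""
    return "".join("T" if base == "C" and i not in methylated else base
                   for i, base in enumerate(seq))

def methylation_levels(reference: str, aligned_reads: list[tuple[int, str]]) -> dict[int, float]:
    """Fraction of reads reporting C (methylated) at each reference C position."""
    methylated, informative = {}, {}
    for start, read in aligned_reads:
        for offset, base in enumerate(read):
            pos = start + offset
            if reference[pos] != "C" or base not in "CT":
                continue  # only C/T calls at reference C positions are informative
            informative[pos] = informative.get(pos, 0) + 1
            if base == "C":
                methylated[pos] = methylated.get(pos, 0) + 1
    return {pos: methylated.get(pos, 0) / n for pos, n in sorted(informative.items())}

reference = "ACGTTCGA"  # toy sequence with CpG sites at positions 1 and 5
reads = [(0, bisulfite_convert(reference, {1})),
         (0, bisulfite_convert(reference, {1, 5})),
         (0, bisulfite_convert(reference, set()))]
print(methylation_levels(reference, reads))
```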
Metagenome sequencing can provide information for samples collected in a specific environment. It can compare the differences and interactions between mixed microbial populations, as well as host responses. Some potential applications of metagenomic sequencing include, but are not limited to, infectious disease diagnosis and infection monitoring, antimicrobial resistance monitoring, microbiome research, and pathogen discovery.
Key terms and abbreviations for next-generation sequencing:
(source:internet, reference only)