Gene sequencing is the only technique that reads nucleic acid sequence information directly, and it is an important branch of molecular diagnostics. Molecular hybridization and quantitative PCR have advanced considerably in recent years, but both identify nucleic acids only by indirect inference. For molecular diagnosis based on detecting specific gene sequences, nucleic acid sequencing therefore remains the technical reference standard. Sequencing technologies are currently grouped into first-generation, next-generation and third-generation sequencing. Next-generation sequencing still holds the largest share of the global sequencing market, while third-generation sequencing continues to be updated and optimized.
Figure 1. Timeline of the introduction of DNA sequencing technologies and platforms. (Gupta, A. K.; et al., 2020)
Next generation sequencing (NGS), also known as massively parallel sequencing (MPS) and high-throughput sequencing (HTS), is a DNA sequencing technology based on PCR and gene chips. It can rapidly sequence hundreds of thousands to millions of DNA molecules in parallel, across hundreds or thousands of samples in a single run, at low cost and with over 99% accuracy. This parallelism provides an in-depth, detailed and comprehensive analysis of a species' genome and transcriptome. Compared with first-generation sequencing technologies, NGS has thousands of times higher throughput but shorter read lengths, up to 250-300 bp on the Illumina platform.
Figure 2. Basic scheme of a next generation sequencing experiment. (Gupta, A. K.; et al., 2020)
1. High throughput - Next-generation sequencing can sequence hundreds of thousands to millions of DNA molecules in parallel at a time.
2. Short read length - As the read length increases, the copies within each amplified cluster fall out of step with one another, which lowers sequencing quality. As a result, the read length of next-generation sequencing does not exceed 500 bp.
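The link between read length and quality in point 2 can be illustrated with a toy model. The sketch assumes, purely for illustration, that in every cycle each strand in a cluster independently falls out of step with a fixed probability `p_desync`, so the fraction still in phase after `n` cycles is `(1 - p_desync) ** n`. The function name and the value 0.005 are hypothetical and are not specifications of any platform.

```python
def in_phase_fraction(cycles: int, p_desync: float) -> float:
    """Fraction of strands in a cluster still synchronized after `cycles` cycles.

    Assumes each strand independently loses synchrony with probability
    `p_desync` per cycle (an illustrative simplification).
    """
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    if not 0.0 <= p_desync <= 1.0:
        raise ValueError("p_desync must be between 0 and 1")
    return (1.0 - p_desync) ** cycles


# Hypothetical per-cycle desynchronization rate, chosen only for the example.
P_DESYNC = 0.005

for read_length in (50, 150, 300, 500):
    fraction = in_phase_fraction(read_length, P_DESYNC)
    print(f"{read_length:>3} cycles: {fraction:.1%} of strands still in phase")
```

Under this assumption the usable signal shrinks geometrically with each added cycle, which is one way to see why longer reads come at the cost of lower quality toward the end of the read.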
NGS is a revolutionary tool for rapidly sequencing large amounts of DNA. It works by breaking the DNA into millions of small fragments, amplifying these shorter fragments by PCR, reading the base sequence of every amplified fragment with the platform's own chemistry, and finally using bioinformatic tools to assemble the reads and determine the status of the target gene, such as whether a mutation is present and at which locus it occurs. Different NGS platforms use different sequencing chemistries, but the basic principle of parallel sequencing is the same. Next-generation sequencing methods include sequencing by ligation (SBL) and sequencing by synthesis (SBS).
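The fragment, read and reassemble logic described above can be sketched in a few lines of Python. This is a simplified illustration, not a real pipeline: reads are error-free, the PCR amplification step is omitted, each read is assumed to arrive already aligned (it carries its own start position), and the names `shotgun_reads` and `call_variants` are hypothetical.

```python
import random
from collections import Counter


def shotgun_reads(genome, read_len, n_reads, rng):
    """Sample short reads at random positions; return (start, sequence) pairs."""
    last_start = len(genome) - read_len
    starts = [rng.randrange(last_start + 1) for _ in range(n_reads)]
    return [(s, genome[s:s + read_len]) for s in starts]


def call_variants(reference, reads, min_depth=3):
    """Pile reads up on the reference and report positions where the
    majority base differs from the reference base."""
    piles = [Counter() for _ in reference]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            piles[start + offset][base] += 1

    variants = []
    for pos, pile in enumerate(piles):
        if sum(pile.values()) < min_depth:
            continue  # too few reads cover this position to make a call
        consensus = pile.most_common(1)[0][0]
        if consensus != reference[pos]:
            variants.append((pos, reference[pos], consensus))
    return variants


rng = random.Random(0)
reference = "".join(rng.choice("ACGT") for _ in range(200))

# The simulated sample differs from the reference by one substitution.
alt = "A" if reference[100] != "A" else "C"
sample = reference[:100] + alt + reference[101:]

reads = shotgun_reads(sample, read_len=30, n_reads=200, rng=rng)
print(call_variants(reference, reads))
```

Because every position is covered by many independent short reads, the mutation and its locus are recovered from the pile-up even though no single read spans the whole sequence.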
In SBL, a probe carrying a fluorescent group hybridizes to a DNA fragment and is ligated to an adjacent oligonucleotide, after which it is imaged. The emission wavelength of the fluorescent group identifies a base or its complement. Essentially, the SBL method involves hybridization and ligation of a labeled probe. The probes contain one or two base-specific positions and a series of universal positions, which enable complementary pairing between the probe and the template. The anchor fragment contains a known sequence that is complementary to the adapter and provides the ligation site. After ligation, the system images the template to read the signal. Once the anchor-probe complex or the fluorescent group has been completely removed, or the ligation site has been regenerated, a new cycle begins.
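As a rough illustration of how an emission color maps back to a base, the sketch below models probes with a single base-specific position. The dye assignment in `PROBE_DYE` and the function names are hypothetical, and real SBL chemistries (including two-base encoding) are more involved than this.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Hypothetical assignment of one dye color per probe base.
PROBE_DYE = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}
DYE_PROBE = {dye: base for base, dye in PROBE_DYE.items()}


def ligation_cycle(template_base):
    """Return the dye of the probe that pairs with, and is ligated at,
    the given template base."""
    probe_base = COMPLEMENT[template_base]
    return PROBE_DYE[probe_base]


def decode(colors):
    """Translate observed dye colors back into the template sequence."""
    return "".join(COMPLEMENT[DYE_PROBE[color]] for color in colors)


template = "GATTACA"
observed = [ligation_cycle(base) for base in template]
print(observed)
print(decode(observed))
```

Each color identifies the probe base that was ligated, and the template base is its complement, which is why the signal can be read as either the base or its complement.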
SBS relies on DNA polymerase for sequencing. The four dNTPs are each labeled with a different fluorescent color. As the polymerase synthesizes the complementary strand, each incorporated dNTP emits its characteristic fluorescence. These signals are captured and processed by dedicated software to obtain the sequence of the DNA being tested.
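The step from fluorescence signals to sequence can be sketched as picking the brightest of four channels in each cycle. The channel order, the intensity values and the function names below are assumptions made for the example; production base callers also correct for crosstalk and phasing, which this sketch ignores.

```python
CHANNELS = "ACGT"  # assumed channel order of the four dye colors
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def call_bases(intensities):
    """Call one incorporated nucleotide per cycle from four channel intensities."""
    called = []
    for cycle in intensities:
        if len(cycle) != len(CHANNELS):
            raise ValueError("each cycle needs one intensity per channel")
        brightest = max(range(len(CHANNELS)), key=lambda i: cycle[i])
        called.append(CHANNELS[brightest])
    return "".join(called)


def template_from_synthesized(synthesized):
    """The template is the reverse complement of the newly synthesized strand."""
    return "".join(COMPLEMENT[base] for base in reversed(synthesized))


# Made-up intensities for three cycles, in the order A, C, G, T.
signal = [
    (0.9, 0.1, 0.0, 0.1),
    (0.1, 0.0, 0.8, 0.2),
    (0.0, 0.1, 0.1, 0.7),
]

synthesized = call_bases(signal)
print(synthesized)                             # AGT
print(template_from_synthesized(synthesized))  # ACT
```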
Nucleic acids (DNA or RNA) are extracted from the selected samples (peripheral blood, fresh frozen tissue, formalin-fixed paraffin-embedded (FFPE) tissue, etc.) and then purified and quantified. The higher the quality of the DNA and RNA samples and the more intact the fragments, the better. If RNA is used, it must first be reverse transcribed into cDNA.
Library construction is the process of DNA fragmentation and adapter ligation. It is an extremely important part of the whole next generation sequencing workflow, because library quality directly affects the quality of the subsequent sequencing data. Usually, the cDNA or DNA is randomly fragmented by enzymatic treatment or sonication, and the optimal fragment length depends on the platform being used. Adapters must be ligated to both ends of the fragments before they can be loaded on the sequencer. The library is then enriched and amplified by PCR. The final library can be checked by qPCR to confirm the quality and quantity of the DNA.
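Library construction can be mimicked in silico as random fragmentation followed by the addition of adapters at both ends. In the sketch below the adapter sequences are placeholders rather than real adapters, the fragment lengths are drawn from a normal distribution for convenience, and the size-selection window is an assumption standing in for the platform-dependent optimal fragment length.

```python
import random

# Placeholder adapter sequences, not those of any real platform.
ADAPTER_5 = "AAAAGGGG"
ADAPTER_3 = "CCCCTTTT"


def fragment(dna, mean_len, rng):
    """Cut `dna` into consecutive fragments of random length around `mean_len`."""
    fragments = []
    pos = 0
    while pos < len(dna):
        size = max(1, int(rng.gauss(mean_len, mean_len * 0.2)))
        fragments.append(dna[pos:pos + size])
        pos += size
    return fragments


def build_library(dna, mean_len, min_len, max_len, rng):
    """Fragment the input, keep fragments inside the size window,
    and attach an adapter to each end."""
    selected = [
        piece
        for piece in fragment(dna, mean_len, rng)
        if min_len <= len(piece) <= max_len
    ]
    return [ADAPTER_5 + piece + ADAPTER_3 for piece in selected]


rng = random.Random(1)
dna = "ACGT" * 250  # 1000 bp toy input
library = build_library(dna, mean_len=100, min_len=80, max_len=120, rng=rng)
print(len(library), "library molecules")
print(library[0][:20], "...")
```

The known adapter sequences at both ends are what later allow every fragment, whatever its insert, to be amplified and sequenced with the same primers.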
Sequencing instruments are the key tools of gene sequencing and can be selected to suit different throughputs, read lengths and study requirements. Depending on the platform and chemistry chosen, clonal amplification of library fragments may occur prior to sequencer loading (emulsion PCR) or on the sequencer itself (bridge PCR).
There are three methods for generating clones of templates.
In emulsion PCR (emPCR), an adapter and an oligo complementary to the DNA fragment are immobilized on beads, and the DNA template is amplified within a water-in-oil emulsion. Millions of copies of a DNA fragment can be produced on a single bead. The beads can then be arrayed on a glass surface or in a PicoTiterPlate.
In bridge PCR, amplification is performed directly on a solid surface rather than in a water-in-oil emulsion. Forward and reverse primers are bound to the surface of the chip, and these primers provide the complementary sequences to which the ends of single-stranded DNA (ssDNA) bind.
DNA nanoball generation is a template enrichment technique carried out in solution. The DNA is ligated, circularized and cleaved several times to produce a circular template containing four different adapters. Rolling circle amplification (RCA) can then generate up to 20 billion DNA nanoballs. The nanoball mixture is dispensed onto the surface of the chip so that each nanoball occupies one site.
NGS can be applied to whole genome sequencing (WGS) and whole exome sequencing (WES) to obtain information on point mutations, small insertions or deletions, copy number variations and structural variations. Whole transcriptome sequencing (RNA-Seq) can detect not only gene expression profiles but also alternative splicing, RNA editing and fusion transcripts. In addition, methylation sequencing can be used to study epigenetic variation. NGS is currently applied in many preclinical research areas, such as non-invasive prenatal genetic testing, genetic disease screening, detection of infectious disease pathogens, and early tumor diagnosis.
Reference
Gupta, A. K.; et al. Next generation sequencing and its applications. Animal Biotechnology. 2020: 395-421.