Single nucleotide polymorphisms (SNPs) are variations at a single nucleotide position in the genome. They arise through transitions, transversions, and single-base deletions or insertions, and they form an abundant, highly polymorphic class of genetic markers. In theory, each SNP site could carry any of the four bases, but in practice most sites have only two alleles. The underlying substitutions are either transitions or transversions, occurring at a ratio of about 2:1. SNPs occur most frequently in CG sequences and most often involve the conversion of C to T, because cytosine in CG sequences is frequently methylated and then undergoes spontaneous deamination to thymine. The term SNP generally refers to single nucleotide variations with a frequency greater than 1%. The human genome contains approximately one SNP per 1000 base pairs, or around 3 × 10^6 SNPs in total. SNPs have therefore become the third generation of genetic markers, and many phenotypic differences, including susceptibility to drugs or diseases, may be related to them.
SNPs can be classified by location as lying in coding regions, non-coding regions, or intergenic regions (regions between genes). Because the genetic code is redundant, a SNP in a coding sequence does not necessarily alter the amino acid sequence of the protein. Coding-region SNPs are therefore of two types: synonymous SNPs leave the protein sequence unchanged, while non-synonymous SNPs alter it. SNPs outside protein-coding regions may still affect gene splicing, transcription factor binding, mRNA degradation, or non-coding RNA sequences. A SNP that affects gene expression in this way is referred to as an expression SNP (eSNP) and may lie upstream or downstream of the gene.
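The two classifications described above (transition versus transversion, and synonymous versus non-synonymous) can be illustrated with a minimal Python sketch. The function names `substitution_class` and `coding_effect` are hypothetical, and the sketch assumes the standard genetic code and a codon given on the coding strand; under this simplification a change to or from a stop codon is also reported as non-synonymous.

```python
from itertools import product

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def substitution_class(ref: str, alt: str) -> str:
    """Classify a single-base substitution as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref == alt or ref not in valid or alt not in valid:
        raise ValueError("ref and alt must be two different DNA bases")
    # A transition stays within one chemical class (purine<->purine or
    # pyrimidine<->pyrimidine); a transversion crosses between the classes.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def coding_effect(codon: str, position: int, alt: str) -> str:
    """Report whether changing one base of a codon changes the encoded residue.

    position is 0-based within the codon (0, 1, or 2).
    """
    codon = codon.upper()
    mutated = codon[:position] + alt.upper() + codon[position + 1:]
    same_residue = CODON_TABLE[codon] == CODON_TABLE[mutated]
    return "synonymous" if same_residue else "non-synonymous"


if __name__ == "__main__":
    print(substitution_class("C", "T"))   # the common CG -> TG change
    print(coding_effect("GAA", 2, "G"))   # GAA and GAG both encode Glu
    print(coding_effect("GAG", 1, "T"))   # GAG (Glu) -> GTG (Val)
```

The third codon position is where redundancy of the genetic code most often hides a change, which is why the first `coding_effect` example leaves the protein unchanged.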
Description of single nucleotide polymorphisms (SNPs). (Kim, Y. H.; et al, 2015)
Common SNPs occur in both coding and non-coding regions of genes. SNPs are less likely to fall in coding regions, but those that do can affect gene function and change biological traits, so they are of particular importance in the study of genetic diseases. As third-generation genetic markers, SNPs are densely distributed throughout animal and human genomes, are highly correlated with functional genes, have a low mutation rate, and show strong genetic stability, which makes them suitable for high-throughput, automated analysis. Some SNP sites are not directly related to the expression of disease genes but are still important genetic markers because they lie close to certain pathogenic genes. They can be used for the following research purposes:
Direct sequencing is a commonly used approach for studying SNP sites. It is based on Sanger sequencing, the dideoxy chain termination method, in which a polymerase copies the template strand base by base. During capillary electrophoresis, the fluorescence signal corresponding to each base is collected in sequence, and SNPs are identified from the peak pattern in the resulting trace: a heterozygous site appears as two overlapping peaks at the same position.
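Reading a SNP from the peak pattern can be sketched as follows. This is a simplified illustration, not how any particular base-calling software works: the function name `call_base`, the four-channel peak-height input, and the 0.3 secondary-to-primary peak ratio are all assumptions chosen for the example.

```python
def call_base(peak_heights: dict, het_ratio: float = 0.3) -> str:
    """Call the base(s) at one trace position from four channel peak heights.

    peak_heights maps each of "A", "C", "G", "T" to its signal height.
    het_ratio is an assumed cutoff: if the second-highest peak reaches this
    fraction of the highest one, the position is called heterozygous.
    """
    if set(peak_heights) != {"A", "C", "G", "T"}:
        raise ValueError("peak_heights must contain exactly A, C, G and T")
    ranked = sorted(peak_heights.items(), key=lambda item: item[1], reverse=True)
    (top_base, top_height), (second_base, second_height) = ranked[0], ranked[1]
    if top_height <= 0:
        return "N"  # no usable signal at this position
    if second_height / top_height >= het_ratio:
        # Two overlapping peaks: report both alleles in alphabetical order.
        return "/".join(sorted((top_base, second_base)))
    return top_base


if __name__ == "__main__":
    print(call_base({"A": 1000, "C": 20, "G": 15, "T": 10}))  # single clean peak
    print(call_base({"A": 520, "C": 10, "G": 480, "T": 5}))   # overlapping peaks
```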
In the MGB probe method, a pair of allele-specific MGB probes carrying different fluorescent labels is added to the PCR reaction to distinguish the two alleles. Each probe carries a reporter fluorophore (FAM or VIC) at its 5' end and an MGB quencher group at its 3' end, and anneals specifically to its complementary sequence between the forward and reverse primers during PCR. While a probe is intact, its fluorescence is weak because the quencher suppresses the reporter through fluorescence resonance energy transfer. When a probe is bound to its matching allele, the 5' to 3' exonuclease activity of the DNA polymerase cleaves off the reporter, freeing it from the quencher so that it fluoresces. The genotype of the sample at the SNP is then determined from which of the two fluorescence signals is detected.
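The two-color readout described above can be sketched as a simple decision rule. This is a minimal illustration under stated assumptions: the function name `call_genotype`, the single fixed threshold, and the assignment of the FAM probe to allele A and the VIC probe to allele G are all hypothetical. Real instruments typically call genotypes by clustering many samples rather than by a fixed cutoff.

```python
def call_genotype(
    fam_signal: float,
    vic_signal: float,
    threshold: float = 1.0,
    fam_allele: str = "A",
    vic_allele: str = "G",
) -> str:
    """Call a genotype from end-point FAM and VIC fluorescence.

    A signal at or above the (assumed) threshold means the corresponding
    probe was cleaved, i.e. its allele is present in the sample.
    """
    fam_positive = fam_signal >= threshold
    vic_positive = vic_signal >= threshold
    if fam_positive and vic_positive:
        return f"{fam_allele}/{vic_allele}"   # both probes cleaved: heterozygote
    if fam_positive:
        return f"{fam_allele}/{fam_allele}"   # only the FAM probe cleaved
    if vic_positive:
        return f"{vic_allele}/{vic_allele}"   # only the VIC probe cleaved
    return "no call"                          # neither signal rose above background


if __name__ == "__main__":
    for fam, vic in [(3.2, 0.1), (0.2, 2.8), (2.1, 2.4), (0.1, 0.2)]:
        print(fam, vic, call_genotype(fam, vic))
```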
Restriction fragment length polymorphism (RFLP) analysis is one of the earlier techniques used for SNP genotyping. Put simply, it relies on a restriction endonuclease cleavage site that lies within the sequence containing the SNP locus: one allele preserves the site, while the other destroys it so that the enzyme no longer cuts. The pattern of PCR fragment lengths obtained after enzyme digestion therefore reveals the genotype.
If no suitable cleavage site can be found, or if the required endonuclease is expensive, a mismatch can be introduced in a PCR primer to create a usable cleavage site. The accuracy of the method can be improved in two ways. One is to select an internal reference cleavage site in the PCR product (identical to the target cleavage site) to check whether digestion is complete. The other is fluorescent restriction digestion, in which the PCR products are fluorescently labeled and their fragment lengths are read from the fluorescence signal collected on a sequencer. This variant benefits from the higher precision and resolution of capillary electrophoresis and allows samples to be mixed, which reduces the cost of genotyping experiments and increases efficiency.
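The basic RFLP readout can be sketched by listing the fragment lengths expected for each genotype. This is an idealized illustration that assumes complete digestion and a single cleavage site in the amplicon; the function name `rflp_bands` and the example amplicon of 300 bp cut at position 100 are hypothetical.

```python
def rflp_bands(amplicon_len: int, cut_pos: int, genotype: tuple, cut_allele: str) -> list:
    """Return the fragment lengths expected after digesting a PCR product.

    genotype is a pair of alleles, e.g. ("C", "T"). Copies carrying
    cut_allele keep the restriction site and are cleaved at cut_pos;
    copies carrying the other allele stay full length.
    """
    if not 0 < cut_pos < amplicon_len:
        raise ValueError("cut_pos must lie inside the amplicon")
    bands = set()
    for allele in genotype:
        if allele == cut_allele:
            bands.update({cut_pos, amplicon_len - cut_pos})  # cleaved copy
        else:
            bands.add(amplicon_len)                          # uncut copy
    return sorted(bands, reverse=True)


if __name__ == "__main__":
    for genotype in [("C", "C"), ("C", "T"), ("T", "T")]:
        print(genotype, rflp_bands(300, 100, genotype, cut_allele="C"))
```

A heterozygote shows all three bands, which is also the pattern produced by incomplete digestion of a homozygous cut sample. That ambiguity is the reason the internal reference cleavage site is useful.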
Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) can also be used for SNP genotyping. The target sequence is first amplified by PCR, and a SNP-specific extension primer is then added and extended by a single base at the SNP locus. The prepared analyte is co-crystallized with the matrix on the chip, placed in the vacuum tube of the mass spectrometer, and excited by an intense nanosecond laser pulse. The matrix molecules absorb the radiation, and the resulting energy accumulation and rapid heating sublimate the matrix crystals, so that the nucleic acid molecules desorb and become metastable ions. Most of these ions are singly charged. They acquire the same kinetic energy in the accelerating field and are then separated by mass-to-charge ratio in a field-free drift region, flying through the vacuum tube to a time-of-flight (TOF) detector: the smaller the ion mass, the sooner it arrives. Because mass spectrometry is highly sensitive to mass, two sequences that differ by a single base are easily distinguished, and the SNP genotype is deduced from them.
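The relationship between ion mass and arrival time can be sketched from the energy balance in the accelerating field, z·e·U = m·v²/2, which gives a flight time proportional to the square root of the mass-to-charge ratio. The sketch below is an idealized calculation: the 20 kV accelerating voltage, the 1 m drift length, and the two product masses are assumed values for illustration, not parameters of any specific instrument or assay.

```python
import math

ELEMENTARY_CHARGE_C = 1.602176634e-19   # coulombs
DALTON_KG = 1.66053906660e-27           # kilograms per dalton


def flight_time(
    mass_da: float,
    charge: int = 1,
    accel_voltage_v: float = 20_000.0,   # assumed accelerating voltage
    drift_length_m: float = 1.0,         # assumed drift tube length
) -> float:
    """Time in seconds for an ion to cross a field-free drift region.

    Every ion of the same charge gains the same kinetic energy z*e*U, so
    v = sqrt(2*z*e*U / m) and t = L / v: lighter ions arrive first.
    """
    if mass_da <= 0 or charge <= 0:
        raise ValueError("mass and charge must be positive")
    mass_kg = mass_da * DALTON_KG
    velocity = math.sqrt(2 * charge * ELEMENTARY_CHARGE_C * accel_voltage_v / mass_kg)
    return drift_length_m / velocity


if __name__ == "__main__":
    # Hypothetical masses of two extension products that differ by one base.
    for label, mass in [("allele 1 product", 5000.0), ("allele 2 product", 5016.0)]:
        print(f"{label}: {flight_time(mass) * 1e6:.3f} microseconds")
```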
The SNaPshot technology, also known as mini-sequencing, is a genotyping technique aimed primarily at medium-throughput (<20) SNP genotyping projects. As the name mini-sequencing suggests, its principle closely resembles first-generation sequencing. The reaction contains a sequencing enzyme, four fluorescently labeled ddNTPs (only ddNTPs are used, without the dNTPs of a standard sequencing reaction), extension primers that anneal immediately 5' of the polymorphic site, and the PCR product as template. Each primer is extended by a single base and then terminates. After the products are run on a sequencer, the incorporated base is read from the color of the peak, which gives the genotype of the sample. Extension primers of different lengths are designed for different SNP sites, so that multiple SNPs can be genotyped in one reaction.
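The multiplex readout can be sketched as a lookup in two dimensions: peak size identifies the SNP site (through the primer length), and peak color identifies the incorporated base. Everything specific in this sketch is an assumption for illustration: the function name `genotype_from_peaks`, the SNP labels, the primer lengths, the size tolerance, and the dye-to-base mapping, which should be taken from the kit documentation in practice. It also assumes that observed peak size equals primer length plus one, whereas real electrophoretic mobility deviates from this.

```python
def genotype_from_peaks(
    peaks: list,
    primer_lengths: dict,
    dye_to_base: dict,
    tolerance: float = 1.0,
) -> dict:
    """Assign single-base-extension peaks to SNP sites and call genotypes.

    peaks is a list of (size, color) tuples from the electropherogram.
    primer_lengths maps each SNP name to its extension primer length;
    the extended product is assumed to be one base longer.
    """
    calls = {}
    for snp, primer_len in primer_lengths.items():
        expected_size = primer_len + 1
        bases = sorted(
            {
                dye_to_base[color]
                for size, color in peaks
                if abs(size - expected_size) <= tolerance
            }
        )
        if not bases:
            calls[snp] = "no call"
        elif len(bases) == 1:
            calls[snp] = f"{bases[0]}/{bases[0]}"   # one color: homozygous
        else:
            calls[snp] = "/".join(bases)            # two colors: heterozygous
    return calls


if __name__ == "__main__":
    # Illustrative dye assignment; check the kit manual for the real one.
    dye_to_base = {"green": "A", "black": "C", "blue": "G", "red": "T"}
    primer_lengths = {"snp_1": 20, "snp_2": 28}     # hypothetical sites
    peaks = [(21.2, "green"), (28.8, "blue"), (29.1, "red")]
    print(genotype_from_peaks(peaks, primer_lengths, dye_to_base))
```

The primer lengths must differ by more than twice the tolerance, otherwise peaks from neighboring sites cannot be told apart.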
The ligation detection reaction (LDR) method is based on the principle of specific nucleic acid hybridization. Two discriminating primers that differ in their 3' terminal base are designed to recognize the two alleles of a SNP site, together with a universal primer on the other side of the site. In the presence of a thermostable ligase, ligation occurs only when a discriminating primer and the universal primer both hybridize perfectly to the target DNA with no gap between them. This specific ligation is repeated by temperature cycling, which amplifies the product linearly. Finally, the SNP is detected by fluorescence scanning of the fragment lengths (the universal primer carries a fluorescent modification at one end, added during synthesis).
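The logic of allele-specific ligation can be sketched as follows. The sketch assumes that the two discriminating primers are given different lengths so that the two alleles yield ligation products of different sizes; the function name `ldr_products` and all lengths are hypothetical. Alleles are written on the strand the primers are designed against, so a primer "matches" when its 3' terminal base corresponds to the allele present.

```python
def ldr_products(
    sample_alleles: str,
    allele_primer_lengths: dict,
    universal_primer_length: int,
) -> list:
    """Return the ligation product lengths expected for a sample.

    sample_alleles holds the two alleles of the sample, e.g. "CT".
    allele_primer_lengths maps the allele recognized by each discriminating
    primer to that primer's length. Ligation to the universal primer is
    modeled as happening only for a perfectly matched discriminating primer.
    """
    products = set()
    for allele in sample_alleles:
        if allele in allele_primer_lengths:
            # Matched 3' base: the discriminating primer is joined to the
            # universal primer, giving one fragment of their combined length.
            products.add(allele_primer_lengths[allele] + universal_primer_length)
    return sorted(products)


if __name__ == "__main__":
    primers = {"C": 20, "T": 23}   # hypothetical discriminating primer lengths
    for sample in ["CC", "CT", "TT"]:
        print(sample, ldr_products(sample, primers, universal_primer_length=25))
```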
The iMLDR technique is a multiplex SNP genotyping method that improves on the traditional ligation detection reaction (LDR). Compared with conventional LDR, iMLDR increases both the accuracy and the success rate of genotyping. Its distinguishing feature is a dual ligation reaction, in which the fluorescent labels that distinguish genotypes are themselves attached to the ligation products by ligation. This makes it easy to increase the throughput of the method.
DNA chip technology is a relatively recent tool for detecting DNA sequence variation. Target DNA hybridizes specifically with a dense array of oligonucleotide probes fixed on a support material, and the presence and intensity of the resulting signals are used to determine the alleles at SNP sites. As research on complex diseases has deepened and more genomic data has become available, SNP chips based on a variety of principles have been developed to meet genotyping needs of different purposes, scales, and conditions.
The SNPSCAN typing method uses a highly specific ligase reaction to identify the alleles at SNP sites. Non-specific sequences of different lengths are added to the ends of the ligation probes, so that ligation yields products whose lengths identify each site. The ligation products are then PCR-amplified with a fluorescently labeled universal primer, and the amplified products are separated by fluorescence capillary electrophoresis. Finally, specialized software analyzes the results to obtain the genotype at each SNP site.
Reference