Introduction to mRNA Sequence Elements: Cap, CDS, UTR & PolyA

mRNA Sequence Design for In Vitro Transcription

mRNA synthesized by in vitro transcription (IVT) is carefully designed to mimic the function of natural single-stranded mRNA transcripts in protein translation. The sequence consists of five key structural elements: the 5' cap structure (Cap), the 5' untranslated region (5' UTR), the coding sequence (CDS), the 3' untranslated region (3' UTR), and the poly(A) tail. Because mRNA is inherently unstable and prone to degradation, rational sequence design is an effective strategy for enhancing its stability.
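The five-element layout described above can be sketched as a small data structure. This is only an illustrative sketch: the class name `MRNAConstruct`, its field names, and the toy sequences are assumptions introduced here, not part of any published design. The cap is a chemical structure rather than a templated sequence, so it is stored only as a label.

```python
from dataclasses import dataclass
from typing import List

STOP_CODONS = {"UAA", "UAG", "UGA"}


@dataclass(frozen=True)
class MRNAConstruct:
    """Hypothetical container for the sequence-level parts of an IVT mRNA."""
    cap: str            # label only (e.g. "Cap1"); the cap is not part of the sequence
    utr5: str           # 5' UTR, RNA alphabet
    cds: str            # coding sequence, start codon through stop codon
    utr3: str           # 3' UTR
    poly_a_length: int  # number of adenosines in the tail

    def sequence(self) -> str:
        """Concatenate 5' UTR, CDS, 3' UTR and poly(A) tail in 5'->3' order."""
        return self.utr5 + self.cds + self.utr3 + "A" * self.poly_a_length

    def problems(self) -> List[str]:
        """Return basic design problems; an empty list means none were found."""
        found = []
        if set(self.utr5 + self.cds + self.utr3) - set("ACGU"):
            found.append("non-ACGU character in UTR or CDS")
        if not self.cds.startswith("AUG"):
            found.append("CDS does not start with AUG")
        if len(self.cds) % 3 != 0:
            found.append("CDS length is not a multiple of 3")
        if self.cds[-3:] not in STOP_CODONS:
            found.append("CDS does not end with a stop codon")
        if self.poly_a_length <= 0:
            found.append("poly(A) tail is missing")
        return found


# Toy sequences chosen only for illustration.
toy = MRNAConstruct(cap="Cap1", utr5="GGGAAAUAAGAGAGCCACC",
                    cds="AUGGCUUAA", utr3="GCUCGCUUUC", poly_a_length=20)
print(toy.sequence())
print(toy.problems())
```

Keeping the elements as separate fields makes it straightforward to swap one UTR or tail length at a time when screening designs.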

Structural elements of in vitro transcribed (IVT) mRNA. (Schoenmaker, L.; et al., 2021)


Basic Elements of mRNA Sequence

5' Cap mRNA

The 5' cap is a modified nucleotide added to the 5' end of some primary transcripts, such as precursor messenger RNA (pre-mRNA). The addition process, known as mRNA capping, is highly regulated and crucial for generating stable, mature messenger RNA for protein synthesis. The 5' m7G cap structure of eukaryotic mRNA is an essential component that mediates various biological functions, such as nuclear export, splicing, RNA stability, and efficient translation. In the cytoplasm, the 5' m7G cap serves as an anchoring site for the eukaryotic translation initiation factor eIF4E, which is crucial for initiating protein translation. The cap also protects mRNA from degradation by 5'-3' exonucleases.

When preparing mRNA in vitro, there are generally two capping methods: enzymatic capping and co-transcriptional capping. Enzymatic capping uses vaccinia capping enzyme and 2'-O-methyltransferase, while co-transcriptional capping uses cap analogs such as ARCA (anti-reverse cap analog) or GAG to incorporate the cap structure into mRNA in a one-step transcription process. The vaccinia virus capping enzyme has three enzymatic activities (RNA triphosphatase, guanylyltransferase, and guanine-N7 methyltransferase), which together add Cap0 to the 5' end of IVT-synthesized mRNA. The 2'-O-methyltransferase uses S-adenosylmethionine (SAM) as a methyl donor and adds a methyl group to the 2'-O position of the first nucleotide of Cap0-mRNA, forming Cap1-mRNA. Because enzymatic capping involves two sequential reactions, the resulting mRNA may carry a mixture of cap structures. Co-transcriptional capping yields mRNA with a single cap structure: adding a dinucleotide cap analog (ARCA) produces Cap0 mRNA, while adding a trinucleotide cap analog (GAG) produces Cap1 mRNA.

5' UTR

The 5' untranslated region (UTR) is a non-coding region located upstream of the mRNA coding sequence (CDS) and serves as the site where ribosomes bind to initiate translation. The 5' UTR is typically 100-220 nucleotides long. In vertebrates, transcripts encoding certain proteins often have longer 5' UTRs, for example those for transcription factors, proto-oncogenes, growth factors and their receptors, and proteins that are translated inefficiently under normal conditions.

Additionally, a high GC content is a conserved feature of 5' UTR sequences. Increasing the GC content of the 5' UTR can decrease translation efficiency, even when the thermal stability of a hairpin and its distance from the cap are held constant. Eukaryotic translation initiation requires the recruitment of ribosomal subunits at the 5' m7G cap structure; the start codon typically lies some distance downstream, so the ribosome must move along the transcript to reach it. Most eukaryotic mRNA 5' UTR sequences contain a Kozak sequence (GCCACCAUGG), which includes the start codon and is involved in translation initiation. Various strategies can be used to optimize the 5' UTR sequence to enhance mRNA translation efficiency and stability, and different 5' UTR sequences may suit different mRNA structures and target cells.
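As a rough illustration of the two 5' UTR properties discussed above, the following sketch computes GC content and inspects the Kozak context around a start codon. The function names and the toy sequence are assumptions introduced here. The strong/adequate/weak labels follow a common simplification that looks only at position -3 (a purine) and position +4 (a G), counting the A of AUG as +1; real initiation efficiency depends on more than these two positions.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C nucleotides in an RNA sequence (0.0 for empty input)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def kozak_strength(mrna: str, start: int) -> str:
    """Classify the Kozak context of the AUG beginning at 0-based index `start`.

    Simplified rule (an assumption of this sketch): purine at -3 and G at +4
    is 'strong', either one alone is 'adequate', neither is 'weak'.
    """
    mrna = mrna.upper()
    if mrna[start:start + 3] != "AUG":
        raise ValueError("no AUG at the given start index")
    minus3 = mrna[start - 3] if start >= 3 else ""
    plus4 = mrna[start + 3] if start + 3 < len(mrna) else ""
    has_purine = minus3 in ("A", "G")
    has_g4 = plus4 == "G"
    if has_purine and has_g4:
        return "strong"
    if has_purine or has_g4:
        return "adequate"
    return "weak"


# Toy example: a short 5' UTR ending in GCCACC, followed by a CDS starting AUGG.
utr5 = "GGGAGACGCCACC"
cds = "AUGGCUUAA"
print(round(gc_content(utr5), 3))
print(kozak_strength(utr5 + cds, start=len(utr5)))  # strong
```

Screening candidate 5' UTRs this way only flags obvious issues; it does not replace experimental comparison in the target cell type.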

Coding Sequence

The coding sequence (CDS), also known as the open reading frame (ORF), is crucial for mRNA vaccines encoding antigens and mRNA therapeutics encoding other proteins. The CDS makes up the largest share of the mRNA sequence length. Optimization strategies for the CDS focus primarily on codon choice to enhance protein translation efficiency.

Current codon optimization relies primarily on the Codon Adaptation Index (CAI) as its fundamental parameter. The strategy replaces low-frequency codons in the exogenous mRNA sequence with synonymous codons that are frequently used in the host cell, aligning the codon usage bias of the exogenous mRNA with that of the host and avoiding rare codons. However, many factors affect the translation efficiency of exogenous mRNA in host cells. Apart from codon usage, factors such as GC content and mRNA secondary structure also influence translation.
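The CAI-based strategy can be sketched as follows. The CAI of a sequence is the geometric mean of the relative adaptiveness of its codons, where a codon's relative adaptiveness is its usage frequency divided by that of the most frequent synonymous codon. The usage table below is made up purely for illustration and covers only three amino acids; a real calculation would use a full host-specific codon usage table. The function names are likewise assumptions of this sketch.

```python
import math
from typing import Dict

# HYPOTHETICAL codon usage frequencies for a toy host (not real data).
TOY_USAGE: Dict[str, Dict[str, float]] = {
    "Ala": {"GCU": 18.0, "GCC": 28.0, "GCA": 16.0, "GCG": 7.0},
    "Lys": {"AAA": 24.0, "AAG": 32.0},
    "Leu": {"CUG": 40.0, "CUC": 20.0, "CUU": 13.0,
            "CUA": 7.0, "UUA": 8.0, "UUG": 13.0},
}


def relative_adaptiveness(usage: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """w(codon) = frequency / frequency of the most-used synonymous codon."""
    weights = {}
    for codons in usage.values():
        top = max(codons.values())
        for codon, freq in codons.items():
            weights[codon] = freq / top
    return weights


def split_codons(cds: str) -> list:
    """Split a CDS into complete codons, ignoring any trailing partial codon."""
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]


def cai(cds: str, weights: Dict[str, float]) -> float:
    """Geometric mean of w over codons present in the table.

    Codons absent from the table (here AUG, stop codons, and all other
    amino acids) are skipped.
    """
    logs = [math.log(weights[c]) for c in split_codons(cds) if c in weights]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))


def optimize(cds: str, usage: Dict[str, Dict[str, float]]) -> str:
    """Replace each codon with the most frequent synonymous codon in the table."""
    best = {}
    for codons in usage.values():
        top_codon = max(codons, key=codons.get)
        for codon in codons:
            best[codon] = top_codon
    return "".join(best.get(c, c) for c in split_codons(cds))


weights = relative_adaptiveness(TOY_USAGE)
original = "AUGGCGAAACUAUAA"   # Met-Ala-Lys-Leu-stop with rare codons
optimized = optimize(original, TOY_USAGE)
print(optimized)               # AUGGCCAAGCUGUAA
print(cai(original, weights), cai(optimized, weights))
```

Maximizing CAI alone always picks the single most frequent codon; as the text notes, GC content and secondary structure also matter, so practical pipelines usually balance several objectives.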

3' UTR

Like the 5' UTR, the 3' UTR primarily regulates mRNA translation: it interacts with protein complexes and mediates mRNA transport, stability, and translation. The 3' UTR sequence is crucial for targeting transcripts to specific cellular compartments, particularly in highly polarized or differentiated cells. It is also a vital regulatory element, because microRNAs (miRNAs) silence mRNA expression by binding to this region. Studies have shown that mRNAs in rapidly proliferating cells tend to have shorter 3' UTR sequences, which contain fewer miRNA binding sites and therefore support higher protein expression. Moreover, reducing non-structured sequences within the 3' UTR can promote the interaction of the poly(A) tail with translation elements, thereby enhancing protein translation efficiency. Finally, because the effects of UTRs may vary with cell type, 3' UTR sequences can be screened and optimized for specific target cells.
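The link between 3' UTR length and miRNA binding sites can be illustrated with a simple seed-match scan. This sketch assumes the common convention that the miRNA seed spans nucleotides 2-8 and that a candidate target site is the reverse complement of that seed; it ignores site context and non-canonical pairing. The miRNA and UTR sequences are placeholders chosen for illustration, and the function names are assumptions.

```python
from typing import List

_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence."""
    return rna.upper().translate(_COMPLEMENT)[::-1]


def seed_site(mirna: str) -> str:
    """7-mer in the target mRNA that pairs with miRNA nucleotides 2-8."""
    return reverse_complement(mirna[1:8])


def find_seed_sites(utr3: str, mirna: str) -> List[int]:
    """0-based start positions of perfect seed matches in the 3' UTR."""
    utr3 = utr3.upper()
    site = seed_site(mirna)
    return [i for i in range(len(utr3) - len(site) + 1)
            if utr3[i:i + len(site)] == site]


# Placeholder sequences for illustration only.
mirna = "UACGGAUCCAGUUCGAUAGCUA"
long_utr = "GCUAGAUCCGUAAUUGCGAUCCGUCC"
short_utr = long_utr[:12]

print(seed_site(mirna))                   # GAUCCGU
print(find_seed_sites(long_utr, mirna))   # [4, 17]
print(find_seed_sites(short_utr, mirna))  # [4]
```

In this toy case, truncating the UTR removes one of the two seed matches, mirroring the observation that shorter 3' UTRs carry fewer miRNA binding sites.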

PolyA Tail

The poly(A) tail is a long stretch of repeated adenosine (A) nucleotides. Generally, longer poly(A) tails protect the mRNA coding region from deadenylation and degradation enzymes, resulting in more stable mRNA. However, shorter poly(A) tails also exist and can maintain mRNA stability. Complex interactions between poly(A)-binding protein (PABP) and other translation initiation proteins collectively regulate mRNA stability and translation efficiency. Full-length mRNA sequencing has shown that nucleotides other than adenosine are also incorporated into the tail. The CCR4-NOT (CNOT) deadenylase complex removes cytidine inefficiently, so incorporating cytidine into the tail can protect exogenously synthesized mRNA from CCR4-NOT-mediated deadenylation, thereby enhancing its stability and improving protein translation efficiency.

There are two strategies for synthesizing the poly(A) tail in vitro. The first is enzymatic tailing, which uses recombinant poly(A) polymerase to extend mRNA synthesized in IVT reactions. This method cannot produce a poly(A) sequence of fixed length, which is a disadvantage for quality control during manufacturing. The second strategy encodes a long stretch of A residues, optionally interspersed with other nucleotides, in the plasmid DNA template, so that the tail is added co-transcriptionally in a single step. One-step tailing gives tightly controlled tail lengths. Its challenge is that part of the tail may be lost during plasmid amplification. Therefore, the integrity of the plasmid template and its poly(A) region should be treated as important quality attributes in fermentation process development.
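Because template-encoded tails can shorten during plasmid amplification, a simple sequence-level check of tail length is a natural quality control step. The sketch below measures the run of A residues at the 3' end of a sequence (the sense strand of a sequenced template, or the transcript itself) and compares it with the designed length. The function names, the designed length of 100, and the tolerance of 5 are assumptions chosen for illustration. The check only handles pure A tails; tails containing interspersed non-A nucleotides would need a different approach.

```python
def trailing_a_length(seq: str) -> int:
    """Length of the uninterrupted run of A at the 3' end of the sequence."""
    seq = seq.upper()
    return len(seq) - len(seq.rstrip("A"))


def tail_within_spec(seq: str, designed_length: int, tolerance: int) -> bool:
    """True if the measured tail is within +/- tolerance of the designed length."""
    return abs(trailing_a_length(seq) - designed_length) <= tolerance


# Toy sequences: the same body with an intact and a truncated tail.
body = "GGGAGACCAUGGCUUAAGCUCGC"
intact = body + "A" * 100
truncated = body + "A" * 62

for name, seq in (("intact", intact), ("truncated", truncated)):
    print(name, trailing_a_length(seq),
          tail_within_spec(seq, designed_length=100, tolerance=5))
```

Running this check on each sequenced plasmid clone would flag clones whose tail has shortened before they are used as IVT templates.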

Effect of Nucleotide Type on mRNA Sequence and Function

Changing the nucleotide types in mRNA is another strategy for improving its metabolic characteristics, for example by incorporating modified nucleotides or adjusting nucleotide composition. Incorporating modified nucleotides can reduce the innate immune stimulation caused by exogenously synthesized mRNA; substituting pseudouridine (Ψ) or N1-methylpseudouridine (m1Ψ) for uridine, for instance, significantly enhances protein translation levels. Toll-like receptors TLR7/8 can recognize uridine and trigger innate immune responses, and increasing the GC content lowers the uridine content of the mRNA sequence, thereby reducing this stimulation. The 3' UTR sequences of mRNAs encoding proto-oncogenes or cytokines often contain adenine/uridine-rich elements (AREs). When cells are under stress, ARE-binding proteins rapidly degrade mRNA carrying AREs. A 3' UTR with high GC content therefore helps protect exogenously synthesized mRNA from ARE-mediated degradation. AU-rich reporter gene mRNA has been found to yield much lower protein levels in cells than GC-rich reporter gene mRNA. Furthermore, high GC content can also help maintain mRNA stability by protecting it from endogenous ribonuclease degradation, thereby enhancing protein translation efficiency.
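The composition metrics mentioned above (uridine content, GC content, and AU-rich elements) are easy to compute for a candidate sequence. The sketch below is illustrative only: the function names and toy 3' UTR sequences are assumptions, and AREs are approximated by counting the core pentamer AUUUA, which is a simplification of how AREs are defined in practice.

```python
def base_fraction(seq: str, bases: str) -> float:
    """Fraction of positions in `seq` occupied by any nucleotide in `bases`."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(seq.count(b) for b in bases) / len(seq)


def count_motif(seq: str, motif: str = "AUUUA") -> int:
    """Count occurrences of `motif`, including overlapping ones."""
    seq = seq.upper()
    count = 0
    pos = seq.find(motif)
    while pos != -1:
        count += 1
        pos = seq.find(motif, pos + 1)
    return count


# Toy 3' UTRs: one AU-rich, one GC-rich (illustrative sequences only).
au_rich_utr = "UAUUUAUUUAUUUAAGU"
gc_rich_utr = "GCUCGCCGGGCCUCGCC"

for name, utr in (("AU-rich", au_rich_utr), ("GC-rich", gc_rich_utr)):
    print(name,
          "U fraction:", round(base_fraction(utr, "U"), 2),
          "GC fraction:", round(base_fraction(utr, "GC"), 2),
          "AUUUA count:", count_motif(utr))
```

Metrics like these can serve as quick filters when comparing candidate designs, although they do not capture the effect of modified nucleosides such as m1Ψ, which are introduced during transcription rather than encoded in the sequence.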

In summary, the choice of nucleotide types and a higher GC content can enhance mRNA protein expression levels and reduce mRNA immunogenicity.

The Importance of mRNA Sequence Design in mRNA Manufacturing

The rational design of mRNA sequences is crucial for enhancing mRNA production efficiency, stability, and therapeutic effectiveness, making it an indispensable key step in the development of RNA drugs and vaccines.

Enhanced Stability

Designed mRNA sequences can increase stability and prolong intracellular half-life by avoiding elements prone to nucleolytic degradation, such as AU-rich sequences, and by incorporating stabilizing features such as a 5' cap, stable 3' end sequences, and nuclease-resistant structures at the poly(A) tail.

Translation Efficiency Enhancement

By optimizing codon usage in the coding sequence (CDS), adjusting the 5' UTR and 3' UTR sequences, and accounting for mRNA secondary structure, designed mRNA sequences can enhance translation efficiency, so that more mRNA is translated and protein expression levels increase.

Reduced Immunogenicity

By avoiding sequences recognized by pattern recognition receptors that activate the immune system, such as uridines recognized by TLR7/8, designed mRNA sequences can reduce immunogenicity and thus avoid unnecessary immune reactions and side effects.

Improved Targeting

Adjusting mRNA sequences, for example by including specific structural elements in the 3' UTR, can confer targeted expression or stability in specific cell types, enhancing therapeutic effectiveness while reducing adverse reactions.


  1. Schoenmaker, L.; et al. mRNA-Lipid Nanoparticle COVID-19 Vaccines: Structure and Stability. International Journal of Pharmaceutics. 2021, 601: 120586.
* Only for research. Not suitable for any diagnostic or therapeutic use.