Key Bases and Base Pairs in Bioinformatics

Definition

The following five bases are referred to as the Canonical Bases:

  1. Purine bases: Adenine (A), Guanine (G)
  2. Pyrimidine bases: Cytosine (C), Thymine (T), Uracil (U)

Description

Thymine is used only in DNA, while uracil takes its place in RNA. Therefore, checking whether T or U appears in the data tells you whether it is a DNA or an RNA base sequence.
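
The T-versus-U check can be sketched in a few lines of Python. This is only an illustration: the function name `classify_sequence` and the labels "mixed" and "ambiguous" are assumptions made here, not part of any standard library.

```python
def classify_sequence(seq: str) -> str:
    """Guess whether a base sequence is DNA or RNA from the presence of T or U."""
    bases = set(seq.upper())
    unknown = bases - set("ACGTU")
    if unknown:
        raise ValueError(f"unexpected symbols: {sorted(unknown)}")
    has_t, has_u = "T" in bases, "U" in bases
    if has_t and has_u:
        return "mixed"      # both T and U present: likely a data problem
    if has_t:
        return "DNA"
    if has_u:
        return "RNA"
    return "ambiguous"      # only A, C, G present, so the rule cannot decide


print(classify_sequence("ACCGTTAC"))  # DNA
print(classify_sequence("ACCGUUAC"))  # RNA
```

Note that a sequence containing neither T nor U cannot be classified by this rule alone, which is why the sketch returns "ambiguous" in that case.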

A Base Pair is formed by two bases capable of hydrogen bonding with each other, one taken from the purine bases and one from the pyrimidine bases. The possible pairs are A-T, A-U, and G-C.
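
As a rough sketch, the pairing rule can be written as a lookup in Python. The names `PURINES`, `PYRIMIDINES`, `CANONICAL_PAIRS`, `is_base_pair`, and `one_purine_one_pyrimidine` are chosen here for illustration only.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T", "U"}

# The three pairs listed in the text; frozenset makes the order irrelevant.
CANONICAL_PAIRS = {frozenset("AT"), frozenset("AU"), frozenset("GC")}


def one_purine_one_pyrimidine(x: str, y: str) -> bool:
    """True if exactly one base is a purine and the other is a pyrimidine."""
    pair = {x.upper(), y.upper()}
    return len(pair & PURINES) == 1 and len(pair & PYRIMIDINES) == 1


def is_base_pair(x: str, y: str) -> bool:
    """True if (x, y) is one of A-T, A-U, G-C, in either order."""
    return frozenset((x.upper(), y.upper())) in CANONICAL_PAIRS


print(is_base_pair("T", "A"))               # True
print(is_base_pair("A", "C"))               # False
print(one_purine_one_pyrimidine("A", "C"))  # True
```

The last two lines show that combining one purine with one pyrimidine is necessary but not sufficient: A and C satisfy that condition, yet A-C is not among the pairs listed above.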

ATA-T and AUA-U are connected by 2 hydrogen bonds, and GCG-C by 3 hydrogen bonds. DNA has a double helical structure due to base pairing. Therefore, if AA is on one strand, TT must be on the opposite strand. ATCGCGGCTATAATCG A-T \\ C-G \\ C-G \\ G-C \\ T-A \\ T-A \\ A-T \\ C-G For example, with a DNA sample like the one above, knowing one strand is enough. Thus, when acquiring data, one can simply read the left side and record it like this: ACCGTTAC. The significance of the double helical structure is essentially a ‘backup’. Indeed, RNA, which is made of a single strand and has an unstable structure, often causes problems. However, DNA can stably pass on genetic information to future generations as the opposite strand can serve as a reference when problems arise on one strand.