Key Bases and Base Pairs in Bioinformatics

Definition

The following five bases are called the canonical bases:

Purine bases: Adenine $A$, Guanine $G$
Pyrimidine bases: Cytosine $C$, Thymine $T$, Uracil $U$

Description

Thymine occurs only in DNA, while uracil takes its place in RNA. Therefore, checking whether a sequence contains $T$ or $U$ tells you whether it is a DNA or an RNA sequence.
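The $T$ versus $U$ test can be sketched as a small Python function. This is a minimal illustration, not a standard tool: the function name `classify_sequence` and the labels `"ambiguous"` (only $A$, $C$, $G$ present, so either type is possible) and `"invalid"` (both $T$ and $U$, or a non-canonical letter) are hypothetical choices made here.

```python
CANONICAL = set("ACGTU")  # the five canonical bases


def classify_sequence(seq: str) -> str:
    """Guess DNA vs RNA from the presence of T or U (hypothetical helper)."""
    s = seq.upper()
    if not set(s) <= CANONICAL:
        return "invalid"      # contains a letter that is not a canonical base
    has_t, has_u = "T" in s, "U" in s
    if has_t and has_u:
        return "invalid"      # T and U together fit neither DNA nor RNA
    if has_t:
        return "DNA"
    if has_u:
        return "RNA"
    return "ambiguous"        # only A, C, G: could be either


print(classify_sequence("ACCGTTAC"))  # DNA
print(classify_sequence("ACCGUUAC"))  # RNA
```

A sequence with no $T$ or $U$ at all cannot be classified by this rule, which is why the sketch reports it separately instead of guessing.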

A base pair consists of two bases joined by hydrogen bonds, one a purine and the other a pyrimidine. The possible pairs are $A-T$, $A-U$, and $G-C$.
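The pairing rule above can be encoded as a lookup table. In this sketch (the names `CANONICAL_PAIRS` and `is_base_pair` are hypothetical), each pair is stored as a `frozenset` so that the order of the two bases does not matter. The final assertion checks the property stated in the text: every canonical pair has exactly one purine and one pyrimidine.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T", "U"}

# Canonical pairs from the text; frozensets make A-T and T-A the same pair.
CANONICAL_PAIRS = {frozenset("AT"), frozenset("AU"), frozenset("GC")}


def is_base_pair(x: str, y: str) -> bool:
    """Return True if bases x and y form one of the canonical pairs."""
    return frozenset((x.upper(), y.upper())) in CANONICAL_PAIRS


# Each pair combines one purine with one pyrimidine.
assert all(len(p & PURINES) == 1 and len(p & PYRIMIDINES) == 1
           for p in CANONICAL_PAIRS)

print(is_base_pair("T", "A"))  # True
print(is_base_pair("A", "G"))  # False: two purines
```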

$A-T$ and $A-U$ pairs are held together by 2 hydrogen bonds, and $G-C$ pairs by 3. Base pairing is what gives DNA its double helix structure: wherever one strand has $A$, the opposite strand must have $T$, and likewise $C$ sits opposite $G$.

$$
\begin{array}{c}
A-T \\ C-G \\ C-G \\ G-C \\ T-A \\ T-A \\ A-T \\ C-G
\end{array}
$$

For a DNA sample like the one above, knowing one strand is enough to determine the other. So when acquiring data, one can simply read the left column and record it as ACCGTTAC.

In essence, the double helix serves as a backup. RNA, which is single-stranded and structurally less stable, lacks this safeguard and is more prone to problems. DNA, by contrast, can pass genetic information on stably to future generations, because when one strand is damaged, the opposite strand serves as a reference for restoring it.
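The following Python sketch works through the example above: it rebuilds the opposite strand from the recorded sequence ACCGTTAC, counts the hydrogen bonds using the 2 and 3 bonds per pair given in the text, and illustrates the "backup" idea by restoring missing bases from the partner strand. All function names are hypothetical. Two points are added here as labeled assumptions: the two strands of a double helix run antiparallel, so the opposite strand is conventionally written reversed (the reverse complement); and marking a damaged position with the letter `N` (the usual "any base" code) is a convention chosen for this sketch.

```python
# Partner of each base on the opposite DNA strand (A-T, G-C).
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
# Hydrogen bonds contributed by the pair containing each base.
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def opposite_strand(strand: str) -> str:
    """Partner base at each position, in the same left-to-right order."""
    return "".join(DNA_COMPLEMENT[b] for b in strand.upper())


def reverse_complement(strand: str) -> str:
    """Opposite strand read in its own direction (strands are antiparallel)."""
    return opposite_strand(strand)[::-1]


def hydrogen_bonds(strand: str) -> int:
    """Total hydrogen bonds holding the double-stranded region together."""
    return sum(H_BONDS[b] for b in strand.upper())


def repair(damaged: str, partner: str) -> str:
    """Fill positions marked 'N' (assumed damage marker) from the partner strand."""
    return "".join(DNA_COMPLEMENT[p] if b == "N" else b
                   for b, p in zip(damaged, partner))


strand = "ACCGTTAC"
print(opposite_strand(strand))     # TGGCAATG
print(reverse_complement(strand))  # GTAACGGT
print(hydrogen_bonds(strand))      # 20 (four A-T pairs x 2 + four G-C pairs x 3)
print(repair("ACNGTNAC", opposite_strand(strand)))  # ACCGTTAC
```

The `opposite_strand` output matches the right column of the diagram, while `reverse_complement` gives the form in which the opposite strand would usually be recorded.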