Substitution Matrix in Sequence Alignment

Definition

A substitution matrix is a matrix that assigns a score to every pair of symbols (nucleotides or amino acids) and serves as the criterion for scoring matches and mismatches in sequence alignment.
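As a minimal formalization, with notation introduced here only for illustration: suppose two sequences $x$ and $y$ are aligned without gaps over $n$ positions, and let $S(a, b)$ denote the entry of the substitution matrix for the pair of symbols $a$ and $b$. The alignment score is then the sum of the matrix entries over the aligned positions.

$$
\text{score}(x, y) = \sum_{i=1}^{n} S(x_i, y_i)
$$

Gap penalties, when gaps are allowed, are added separately and are not part of the substitution matrix itself.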

Example

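using BioAlignments

# Each name below is a predefined substitution matrix;
# evaluating it in the REPL prints the matrix.
EDNAFULL   # scoring matrix for DNA (nucleotides)
BLOSUM45   # scoring matrix for proteins, BLOSUM family
PAM30      # scoring matrix for proteins, PAM family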

Let’s dive straight into an example. Julia has a package called BioAlignments that makes it easy to load the desired substitution matrix. Loading EDNAFULL, which is frequently used for DNA analysis, together with a BLOSUM (BLOcks SUbstitution Matrix) matrix and a PAM (Point Accepted Mutation) matrix, both of which are used for protein sequences, gives the following.

[Figures: printed output of EDNAFULL, BLOSUM45, and PAM30]
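Individual entries of these matrices can also be looked up directly, and a matrix can be passed to a scoring model for pairwise alignment. The following is a minimal sketch, assuming that BioSequences is installed alongside BioAlignments (it supplies the symbols DNA_A, AA_W and the aa"..." literals); the two protein sequences and the gap penalties are arbitrary values chosen only for illustration.

```julia
using BioAlignments
using BioSequences  # assumed installed: provides DNA_A, AA_A, ... and aa"..." literals

# matrix[x, y] is the score for aligning symbol x with symbol y.
dna_match    = EDNAFULL[DNA_A, DNA_A]   # A aligned with A
dna_mismatch = EDNAFULL[DNA_A, DNA_C]   # A aligned with C

# Matches are not all scored equally: compare tryptophan (W) with alanine (A).
trp_match = BLOSUM45[AA_W, AA_W]
ala_match = BLOSUM45[AA_A, AA_A]

# Use a substitution matrix inside an affine-gap scoring model.
# The gap penalties below are illustrative assumptions, not recommended values.
model  = AffineGapScoreModel(BLOSUM45, gap_open=-10, gap_extend=-1)
result = pairalign(GlobalAlignment(), aa"HEAGAWGHEE", aa"PAWHEAE", model)
alignment_score = score(result)
```

Swapping BLOSUM45 for another matrix such as PAM30 changes only the scores, not the calling code, which makes it straightforward to compare how the choice of matrix affects an alignment.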

Not all matches are scored equally, nor are all mismatches. The scores reflect frequency-based insights, such as how often a given substitution is actually observed. An analogy from information retrieval is typo correction based on keyboard layout. Take the nonsensical word “juloa”: since i and o are next to each other on a QWERTY keyboard, it is reasonable to assign a low mismatch penalty between i and o, which suggests that juloa is a typo for julia. Similarly, julea could also be a typo for julia, but since i and e are farther apart on the keyboard, typing e in place of i is less likely to be a slip of the finger, so julea is less strongly suspected of being a typo than juloa.
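The keyboard analogy can be sketched as a toy substitution score. This is only an illustration under simplifying assumptions: keys are placed on an unstaggered grid, a match scores +1, and a mismatch is penalized by the straight-line distance between the two keys. The names ROWS, KEYPOS, keyscore, and wordscore are hypothetical helpers, not part of any package.

```julia
# QWERTY letter rows; a key's position is (row, column), ignoring row stagger.
const ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]

# Map each letter to its (row, column) position on the grid.
const KEYPOS = Dict{Char,Tuple{Int,Int}}()
for (r, row) in enumerate(ROWS), (k, c) in enumerate(row)
    KEYPOS[c] = (r, k)
end

# Toy substitution score: +1 for a match, minus the key distance for a mismatch.
function keyscore(a::Char, b::Char)
    a == b && return 1.0
    (ra, ca), (rb, cb) = KEYPOS[a], KEYPOS[b]
    return -hypot(ra - rb, ca - cb)
end

# Score of a gap-free, position-by-position alignment of two equal-length words.
wordscore(x, y) = sum(keyscore(a, b) for (a, b) in zip(x, y))

juloa_score = wordscore("juloa", "julia")  # i vs o: adjacent keys
julea_score = wordscore("julea", "julia")  # i vs e: five keys apart
```

Under these assumptions juloa scores higher than julea against julia, mirroring the idea that a substitution matrix penalizes unlikely substitutions more heavily than likely ones.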