Detailed explanation of the different measures of sequence conservation used in this database


The oligonucleotides were ranked using three main measures of sequence conservation and a score that combines them:

- Percentage of identical sites (PIS)
The PIS is calculated by dividing the number of identical sites (alignment positions covered by the oligonucleotide at which the aligned sequences agree) by the length of the oligonucleotide, and multiplying by 100.

- Percentage of identical sites in the last five nucleotides at the 3' end of the oligonucleotide (3'PIS)
The 3'PIS is calculated in the same way as the PIS, but considering only the last five nucleotides of the oligonucleotide.

- Percentage of pairwise identity (PPI)
The PPI is calculated by counting the pairwise matches between sequences at the alignment positions covered by the oligonucleotide. This count is then divided by the total number of pairwise comparisons and multiplied by 100.

- Score
The ranking score ('score') is the mean of the three measures (PIS, 3'PIS and PPI).
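
The definitions above can be sketched in Python as follows. This is only an illustration, not the code used to build the database. It assumes that an identical site is an alignment column in which every sequence carries the same character, that gaps are compared like any other character, and that the oligonucleotide position is given as a 0-based start index. The function names and the toy alignment are hypothetical.

```python
from itertools import combinations


def oligo_columns(alignment, start, length):
    """Return the alignment columns covered by the oligonucleotide (0-based start)."""
    return ["".join(seq[start + i] for seq in alignment) for i in range(length)]


def pis(columns):
    """Percentage of columns in which all sequences share the same character."""
    identical = sum(1 for col in columns if len(set(col)) == 1)
    return 100.0 * identical / len(columns)


def three_prime_pis(columns, tail=5):
    """PIS restricted to the last `tail` columns (the 3' end of the oligonucleotide)."""
    return pis(columns[-tail:])


def ppi(columns):
    """Pairwise matches divided by pairwise comparisons, over all columns."""
    matches = 0
    total = 0
    for col in columns:
        for a, b in combinations(col, 2):
            total += 1
            matches += a == b
    return 100.0 * matches / total


def score(columns):
    """Mean of PIS, 3'PIS and PPI."""
    return (pis(columns) + three_prime_pis(columns) + ppi(columns)) / 3


# Toy alignment (4 sequences, 6 positions), made up for illustration only.
toy_alignment = ["ACGTAC", "ACGTAC", "ACCTAC", "ACGTAT"]
toy_columns = oligo_columns(toy_alignment, start=0, length=6)
print(round(pis(toy_columns), 2), round(three_prime_pis(toy_columns), 2),
      round(ppi(toy_columns), 2), round(score(toy_columns), 2))
# prints: 66.67 60.0 83.33 70.0
```

In the toy alignment, four of the six columns are identical, three of the last five are identical, and 30 of the 36 pairwise comparisons match.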



Example

The three measures and the score described above are calculated as follows for the HIV2ID0061 oligonucleotide (23 nucleotides), using the HIV2 alignment (95 sequences):


PIS: (16 identical sites/23 sites)*100 = 69.57%


3'PIS: (5 identical sites/5 sites)*100 = 100.00%


PPI: (97423 pairwise matches/104880 total pairwise comparisons)*100 = 92.89%

Score: (69.57 + 100.00 + 92.89)/3 = 87.49
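
The arithmetic of this example can be reproduced with a few lines of Python. The counts are those quoted above for HIV2ID0061. The helper `percentage` and the variable names are hypothetical, and rounding each measure to two decimals before averaging is an assumption made to match the displayed values.

```python
def percentage(part, whole):
    """Express part/whole as a percentage."""
    return 100.0 * part / whole


# Counts quoted in the example for HIV2ID0061.
example_pis = round(percentage(16, 23), 2)
example_3pis = round(percentage(5, 5), 2)
example_ppi = round(percentage(97423, 104880), 2)

# The score is the mean of the three rounded measures.
example_score = round((example_pis + example_3pis + example_ppi) / 3, 2)

print(example_pis, example_3pis, example_ppi, example_score)
# prints: 69.57 100.0 92.89 87.49
```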