Score | Calculation |
---|---|
Specificity | The score assumes that specificity is determined by sequence homology of the 20 nucleotides preceding the protospacer adjacent motif (PAM). Because the first bases at the 5’ end of the protospacer may contribute ambiguously to specificity, the user can exclude them from the specificity calculation. The remaining protospacer is mapped against the target genome using bowtie in one of several adjustable modes (high or low sensitivity). In the highest-sensitivity mode, up to three mismatches in the protospacer are allowed in the mapping. Furthermore, each mapped protospacer is required to be followed by a specific PAM (NAG/NGG). Once all on- and off-targets of a single sgRNA are mapped, the specificity score is calculated. The score starts at its maximum of 100 and remains at 100 if no off-targets exist. For each off-target, the number of homologous nucleotides of that off-target divided by the running off-target count is subtracted from the score. Thus a perfectly matching first off-target subtracts 20 and a perfectly matching second off-target subtracts 10. The penalty is therefore high as soon as any off-target exists, but decreases for each additional off-target. |
Annotation | This score is based on general assumptions about where an sgRNA should bind with respect to a gene's transcript model in order to properly alter that gene's function. Preferred positions lie in common transcripts and in coding exons, ideally early exons such as the first, second or third. Therefore the score is calculated as follows. First it is set to 0. Then all annotations overlapping the region where the sgRNA under investigation would bind are parsed. For each coding sequence and each exon hit, 5 divided by the number of the respective exon is added. For every gene hit, 1 is added. For every start or stop codon hit, 1 is added. For every computationally predicted CpG island hit, 1 is subtracted from the score. Overall, this score should enable cld to sort sgRNA designs by how preferable their target region within a gene is. In cld's current version it is possible to assign a weight to each component of this score, e.g. to give much more weight to targeting coding exons than UTRs. |
Extra | For this score, users of cld can apply any custom Perl function that operates on the 30-mer around the protospacer target site, as introduced by Doench et al. The scoring function must take a 30-character string as input and return a numeric value as output. As with the other scores, a higher score should indicate a better target. |
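
The arithmetic of the Specificity and Annotation scores described in the table can be illustrated with a minimal Python sketch. This is not cld's own code (cld is written in Perl); the function names, the input representation and the feature-type labels are hypothetical and chosen only for illustration. The sketch assumes that off-targets are penalised in the order in which they are supplied, that every coding-sequence or exon annotation carries its exon number, and that the default weights equal the values given in the table.

```python
from typing import Dict, Iterable, Optional, Sequence, Tuple

# Default per-feature contributions taken from the table text.
# The feature-type labels themselves are hypothetical.
DEFAULT_WEIGHTS: Dict[str, float] = {
    "CDS": 5.0,          # divided by the exon number
    "exon": 5.0,         # divided by the exon number
    "gene": 1.0,
    "start_codon": 1.0,
    "stop_codon": 1.0,
    "CpG": -1.0,         # predicted CpG islands lower the score
}


def specificity_score(offtarget_matches: Sequence[int]) -> float:
    """Start at 100 and subtract, for the k-th off-target,
    (number of homologous nucleotides) / k.

    Assumption: off-targets are processed in the order given.
    The source does not state whether the score is floored at 0,
    so no clamping is applied here.
    """
    score = 100.0
    for rank, matches in enumerate(offtarget_matches, start=1):
        score -= matches / rank
    return score


def annotation_score(
    annotations: Iterable[Tuple[str, Optional[int]]],
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Sum the contributions of all annotations overlapping the sgRNA site.

    Each annotation is a (feature_type, exon_number) pair; exon_number is
    only used for "CDS" and "exon" features and may be None otherwise.
    """
    w = dict(DEFAULT_WEIGHTS)
    if weights:
        w.update(weights)  # user-supplied weights override the defaults
    score = 0.0
    for feature_type, exon_number in annotations:
        if feature_type in ("CDS", "exon"):
            if not exon_number or exon_number < 1:
                raise ValueError("CDS/exon annotations need an exon number >= 1")
            score += w[feature_type] / exon_number
        elif feature_type in w:
            score += w[feature_type]
        # Unknown feature types are ignored in this sketch.
    return score


if __name__ == "__main__":
    # Two perfectly matching off-targets: 100 - 20/1 - 20/2
    print(specificity_score([20, 20]))  # 70.0
    # sgRNA hitting a gene, its first exon and the CDS in that exon: 1 + 5 + 5
    print(annotation_score([("gene", None), ("exon", 1), ("CDS", 1)]))  # 11.0
```

Passing a `weights` dictionary, for example `{"CDS": 10.0}`, mimics the per-component weighting mentioned for the Annotation score; how cld exposes these weights to the user is not shown here.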