Poster (PDF): https://bgrssb.icgbio.ru/wp-content/uploads/2020/07/268.pdf
Artem Pronozin¹, Mikhail Genaev², Dmitry Afonnikov³
¹Institute of Cytology and Genetics SB RAS, pronozinartem95@gmail.com
²Institute of Cytology and Genetics SB RAS, mag@bionet.nsc.ru
³Institute of Cytology and Genetics SB RAS, ada@bionet.nsc.ru
Annotating protein sequences by homology search and transferring GO terms from highly homologous sequences is an important task in current genome and transcriptome sequencing projects. However, the large size of sequence databases makes it difficult to complete homology searches in reasonable time. Several tools use fast and ultrafast database search algorithms to find sequence homologs, and they typically rely on various heuristics to quickly identify candidate matches. As a result, these programs can return different sets of homologous sequences and rank them differently. Such differences may in turn produce different sets of transferred GO terms and lead to errors in the functional annotation of query sequences. We compare how well several fast search tools (BLASTP fast, Diamond, Usearch ublast, Usearch local, MMseqs2) detect highly homologous sequences for A. thaliana protein sequences represented in the OrthoDB database. We compare their results with the sequence ranking obtained with ClustalW for various numbers k of returned best hits.
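The abstract does not state which agreement measure is used or how GO terms are transferred. As one possible reading, the sketch below assumes that agreement at a given k is the fraction of the ClustalW-ranked top-k hits that a tool also returns in its own top-k, and that GO terms are transferred as the union of the annotations of the top-k hits. Both rules, the function names `top_k_overlap` and `transferred_go_terms`, and all sequence identifiers and GO labels are hypothetical illustrations, not the authors' method.

```python
from typing import Dict, Sequence, Set


def top_k_overlap(tool_hits: Sequence[str], reference_hits: Sequence[str], k: int) -> float:
    """Fraction of the reference top-k hits that also appear among the tool's top-k hits.

    Hypothetical agreement measure; both lists are ordered best hit first.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    reference_top = set(reference_hits[:k])
    tool_top = set(tool_hits[:k])
    if not reference_top:
        return 0.0
    return len(reference_top & tool_top) / len(reference_top)


def transferred_go_terms(
    hits: Sequence[str], go_annotations: Dict[str, Set[str]], k: int
) -> Set[str]:
    """Union of GO terms annotated to the top-k hits (assumed transfer rule)."""
    terms: Set[str] = set()
    for hit in hits[:k]:
        terms |= go_annotations.get(hit, set())
    return terms


if __name__ == "__main__":
    # Hypothetical hit lists for one query protein, best hit first.
    reference = ["P1", "P2", "P3", "P4", "P5"]  # e.g. a ranking derived from ClustalW
    tool = ["P2", "P1", "P6", "P3", "P7"]       # e.g. a ranking returned by a fast search tool
    # Hypothetical GO annotations of the database sequences.
    go = {"P1": {"GO_A"}, "P2": {"GO_A", "GO_B"}, "P3": {"GO_C"}, "P6": {"GO_D"}}

    for k in (1, 3, 5):
        print(k, round(top_k_overlap(tool, reference, k), 3))
    # 1 0.0
    # 3 0.667
    # 5 0.6

    k = 3
    ref_terms = transferred_go_terms(reference, go, k)
    tool_terms = transferred_go_terms(tool, go, k)
    # GO terms gained only via the reference ranking vs. only via the tool ranking.
    print(sorted(ref_terms - tool_terms), sorted(tool_terms - ref_terms))
    # ['GO_C'] ['GO_D']
```

In this toy example the tool swaps the two best hits, so the overlap is 0 at k = 1; at k = 3 the missing P3 and the extra P6 change the transferred GO terms, which is the kind of annotation discrepancy the comparison is meant to expose.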