Using fast homology search tools for protein sequence functional annotation: a comparison

Poster (download): https://bgrssb.icgbio.ru/wp-content/uploads/2020/07/268.pdf

Artem Pronozin¹, Mikhail Genaev², Dmitry Afonnikov³
¹Institute of Cytology and Genetics SB RAS, pronozinartem95@gmail.com
²Institute of Cytology and Genetics SB RAS, mag@bionet.nsc.ru
³Institute of Cytology and Genetics SB RAS, ada@bionet.nsc.ru

Annotating protein sequences by homology search and transferring GO terms from highly homologous sequences is an important task in current genome and transcriptome sequencing projects. However, the large size of sequence databases makes it difficult to complete a homology search in reasonable time. Several tools apply fast and ultrafast database search algorithms to find sequence homologs. These tools usually rely on various heuristics to identify candidate sequence matches quickly. As a result, the programs differ in the sets of homologous sequences they return and in how they rank them. These differences may in turn lead to different sets of GO terms and to errors in the functional annotation of the query sequence. We compare the performance of several fast search tools (BLASTP fast, DIAMOND, USEARCH ublast, USEARCH local, MMseqs2) in detecting highly homologous sequences for A. thaliana protein sequences represented in the OrthoDB database. We compared their results with the sequence ranking obtained by the ClustalW program for various numbers k of returned best hits.
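
The comparison of a tool's best hits with a reference ranking at different k can be illustrated with a minimal Python sketch. The abstract does not state which agreement measure was used, so the overlap fraction below is an assumption, and the function name `top_k_overlap` and the sequence identifiers are hypothetical illustration data, not results from the study.

```python
from typing import Sequence


def top_k_overlap(tool_hits: Sequence[str],
                  reference_hits: Sequence[str],
                  k: int) -> float:
    """Fraction of the reference top-k hits that the tool also places in its top-k.

    Assumed measure for illustration only; the study's actual metric is not
    given in the abstract.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    ref_top = set(reference_hits[:k])
    if not ref_top:
        return 0.0
    tool_top = set(tool_hits[:k])
    return len(ref_top & tool_top) / len(ref_top)


# Hypothetical rankings for one query sequence (best hit first).
reference = ["P1", "P2", "P3", "P4", "P5"]   # e.g. ordered by a reference alignment score
fast_tool = ["P1", "P3", "P6", "P2", "P7"]   # e.g. ordered by a fast search tool

# Agreement for several values of k.
overlaps = {k: top_k_overlap(fast_tool, reference, k) for k in (1, 3, 5)}
for k, value in overlaps.items():
    print(f"k={k}: overlap={value:.2f}")
```

With real data, each list would hold the hit identifiers reported for one query, and the overlap would typically be averaged over all queries for each tool and each k.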