by Pavel Vychyk | Duvalov E. | Digris A. | Skakun V. | Nikolaichik Y. | Department of Molecular Biology,
Belarusian State University, Minsk, Belarus | Department of Systems Analysis and Computer Modelling,
Belarusian State University, Minsk, Belarus
Motivation and Aim:
Most available bacterial genome annotations provide information only about ORFs and
their products. Adding regulatory elements to an annotation shows which transcription
factor (TF) controls each gene and under what conditions the gene might be expressed. This
helps answer basic biological questions and facilitates constructing strains with desired properties.
Our aim in this work was to develop a new TF binding site database permitting reliable
automated transfer of regulatory information between bacterial genome sequences.
Methods and Algorithms: We have previously proposed a strict formal criterion, based on 3D
structure, for applying regulatory information to any bacterial genome: the CR-tag, the set of
amino acid residues of a TF that specifically contact the nitrogenous bases of the regulatory
element in genomic DNA.
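The CR-tag idea can be illustrated with a minimal sketch. It assumes that TF sequences are already aligned and that the base-contacting alignment columns are known from a reference 3D structure. The function names, the toy sequences and the column indices below are hypothetical and are not taken from the pipeline itself.

```python
def cr_tag(aligned_seq: str, contact_columns: list[int]) -> str:
    """Concatenate the residues found at the base-contacting alignment columns.

    contact_columns holds 0-based alignment columns (hypothetical values here).
    """
    return "".join(aligned_seq[i] for i in contact_columns)


def same_cr_tag(seq_a: str, seq_b: str, contact_columns: list[int]) -> bool:
    """True when both TFs carry identical, gap-free CR-tags."""
    tag_a = cr_tag(seq_a, contact_columns)
    tag_b = cr_tag(seq_b, contact_columns)
    # A gap at a contact column means the tag cannot be determined reliably.
    return "-" not in tag_a and tag_a == tag_b


# Toy aligned sequences; they differ only outside the contact columns.
reference_tf = "MKRSTQAL"
candidate_tf = "MKRSTQGL"
columns = [1, 3, 5]

print(cr_tag(reference_tf, columns))
print(same_cr_tag(reference_tf, candidate_tf, columns))
```

In this sketch two TFs are treated as equivalent only when every contact residue matches, which reflects the strictness of the criterion described above.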
For each TF present in the RegulonDB, CollecTF and RegPrecise databases, our automated de novo
TFBS inference pipeline was run to collect TF gene-linked operators from all genomes
encoding TFs with the same CR-tag. Hidden Markov models (HMMs) were built for each motif, with
threshold cutoff scores set manually to find known operators with minimal or no false positives.
Results: The database currently covers only TFBS and has two divisions: the core and the
extended collection. The core includes 237 regulatory elements and has undergone manual
curation, including verification of experimental evidence and determination of threshold
scores for the hidden Markov models. Each core record has manually assigned experimental evidence
codes and is linked to the corresponding literature. The extended collection includes
information on over 3000 regulatory elements exported from various databases, with the
majority coming from RegPrecise.
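Manual determination of a threshold score can be supported by a small table of candidate cutoffs. The sketch below is an illustration only: it assumes that HMM scores are available for known operators and for a set of presumed non-site hits, and the function name and all score values are hypothetical.

```python
def cutoff_table(known_scores, background_scores):
    """List (cutoff, known operators found, false hits) for each candidate cutoff.

    Candidate cutoffs are the scores of the known operators, highest first.
    A sequence counts as a hit when its score is at or above the cutoff.
    """
    rows = []
    for cutoff in sorted(set(known_scores), reverse=True):
        found = sum(score >= cutoff for score in known_scores)
        false_hits = sum(score >= cutoff for score in background_scores)
        rows.append((cutoff, found, false_hits))
    return rows


# Hypothetical scores for three known operators and three presumed non-sites.
known = [12.0, 9.5, 15.2]
background = [3.1, 10.0, 4.4]

for cutoff, found, false_hits in cutoff_table(known, background):
    print(f"cutoff={cutoff}: known found={found}, false hits={false_hits}")
```

A curator could then pick the lowest cutoff that still gives few or no false hits, trading off sensitivity against specificity for each motif.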
Conclusion: The advantage of BacRegBD is the CR-tag concept: a fingerprint that uniquely
matches transcription factors with their operators. All regulatory motif records in the
database are associated with a CR-tag and can therefore be used to correctly annotate
similar elements in any genome encoding a TF with an identical CR-tag.
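The transfer of motif records to a new genome can be pictured as a lookup keyed by CR-tag. This is a minimal sketch under stated assumptions: the record layout, the motif identifiers, the locus tags and the CR-tag strings are all hypothetical and do not describe the actual database schema.

```python
from collections import defaultdict


def index_by_cr_tag(records):
    """Group motif identifiers by the CR-tag they are associated with.

    records is an iterable of (motif_id, cr_tag) pairs (hypothetical layout).
    """
    index = defaultdict(list)
    for motif_id, tag in records:
        index[tag].append(motif_id)
    return index


def motifs_for_genome(index, genome_tfs):
    """Map each TF locus of a genome to the motifs sharing its exact CR-tag.

    genome_tfs maps locus tag -> CR-tag; TFs without a matching tag are skipped.
    """
    return {
        locus: list(index[tag])
        for locus, tag in genome_tfs.items()
        if tag in index
    }


# Hypothetical motif records and TFs of a newly sequenced genome.
records = [("motif_A", "KSQ"), ("motif_B", "RHE"), ("motif_C", "KSQ")]
genome_tfs = {"locus_0001": "KSQ", "locus_0002": "TTT"}

index = index_by_cr_tag(records)
print(motifs_for_genome(index, genome_tfs))
```

Only exact CR-tag matches are used here, so a TF with an unseen tag receives no motif rather than an approximate one.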