High-throughput sequencing has revealed numerous patterns and variations in the genomic architecture of diverse species of bacteria. By observing the natural variation of specific families of transcription factors in divergent species, we can identify specificity-determining amino acids and their cognate binding sequences. We have tested this approach using the cyclic AMP receptor protein (CRP) and fumarate and nitrate reduction regulatory protein (FNR) family of winged helix-turn-helix proteins. For this family of proteins, we computationally identified eight sets of amino acid combinations that are predicted to yield novel DNA-binding specificity. These computational predictions were tested experimentally using the CRP protein from Escherichia coli.
Of the eight computational predictions, four were shown experimentally to have novel DNA-binding specificity in vivo. We tested these designs using both the native CRP promoter and a tetracycline-inducible promoter. In the process, we were able to identify how expression levels determine the activity of these engineered proteins. Collectively, these results demonstrate the utility of comparative genomics for computationally directed protein engineering.