
Detection of HTH Motifs via Data Mining

Yuan Gao (1), Mu Yang (1), Xingqiang Wang (1), Kalai Mathee (2) and Giri Narasimhan (1)

(1) Mathematical Sciences Dept., University of Memphis

(2) Dept. of Molecular Microbiology and Immunology, University of Tennessee-Memphis

It is known that many repressor and activator proteins regulate gene expression by binding in a sequence-specific manner to operator DNA. It is also known that such proteins share a common substructure known as the helix-turn-helix (HTH) motif. Although the basic mechanism is not well understood, recognition is thought to involve a set of non-covalent interactions between some amino acids within the motif and the DNA. The HTH motif is a stretch of 20 amino acids in which the second helix is presumed to be especially important for recognition. The ability to detect or predict the HTH motif may shed light on molecular structure, function, and evolution. As a first step, we have implemented the matrix scoring method proposed by Dodd and Egan (1990) and have made it available for use over the World Wide Web (WWW) at the URL "http://www.msci.memphis.edu/~giri/hth". Dodd and Egan's scheme assigns a weight to each amino acid at each position in the motif. These weights are summed to compute a score for every 20 amino acid subsequence of the input sequence. The highest score over all such subsequences is used as a measure of the probability that the sequence contains an HTH motif. The main drawback of this scheme is that it is not context-sensitive: no weight is assigned to particular amino acids appearing together (whether contiguous or non-contiguous) in a motif. However, evolution has taught us that a combination of several amino acids within the motif is largely responsible for the stability of the structure, while the other amino acids are free to change by random mutation.
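The scoring scheme described above can be illustrated with a minimal sketch. It assumes a position weight matrix stored as one dictionary per motif position; the function name `best_window_score`, the `default` weight for unlisted amino acids, and the toy three-position matrix in the usage note are all hypothetical and are not the Dodd and Egan weights.

```python
from typing import Dict, List, Tuple


def best_window_score(
    sequence: str,
    weights: List[Dict[str, float]],
    default: float = 0.0,
) -> Tuple[float, int]:
    """Slide a window over `sequence` and return (best score, start index).

    `weights[i]` maps an amino acid letter to its weight at motif position i.
    The window width is the number of positions in the matrix (20 for HTH).
    Amino acids missing from a position's table get `default` (an assumption).
    """
    width = len(weights)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the motif width")

    best_score, best_start = float("-inf"), -1
    for start in range(len(sequence) - width + 1):
        # Sum the position-specific weights; each position is scored
        # independently, which is why the scheme is not context-sensitive.
        score = sum(
            weights[i].get(sequence[start + i], default) for i in range(width)
        )
        if score > best_score:
            best_score, best_start = score, start
    return best_score, best_start
```

With a toy matrix such as `[{"A": 2.0}, {"G": 1.0}, {"L": 3.0}]`, calling `best_window_score("MAGLKAGL", toy)` scores every window of width three and reports the first window that attains the maximum. A real run would use a 20-position matrix and a score threshold, neither of which is reproduced here.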

In an attempt to improve the sensitivity and selectivity of HTH recognition, we develop a technique based on methods from the new field of Data Mining, where they are used to generate association rules. We use the method developed by Agrawal, Imielinski, and Swami (1993) to find frequently occurring patterns in a set of over a hundred HTH motifs, and use these patterns as the basis for HTH detection and prediction. We incorporate our findings into a new and improved version of the HTH detection program, which will be made available soon over the WWW. The strength of our new method is that it gives the molecular biologist a design tool for mutagenesis experiments, since the patterns we generate indicate which amino acids may be structurally or functionally important. It also provides a natural way of categorizing HTH motifs (and the corresponding proteins) based on the set of patterns they contain. Motifs in the same group are likely to have similar binding affinities, and the corresponding proteins are likely to have similar functions. Finally, the method can be used in comparative evolution studies, which may indicate the evolutionary relationships between proteins containing different HTH motifs. While giving us an interesting way to look at HTH motifs, it also raises many intriguing questions to pursue from the standpoint of computer-aided search as well as molecular biology. Our pattern generation algorithm is generic and may be applicable to detecting other motifs that display similarities in structure and sequence.
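To make the idea of frequently occurring patterns concrete, here is a simplified sketch and not the authors' algorithm. It assumes each aligned motif is treated as a transaction of (position, amino acid) items and uses a level-wise, Apriori-style search for item sets that occur in at least `min_support` motifs. The function name `frequent_patterns`, the `max_size` cap, and the toy motifs in the usage note are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Item = Tuple[int, str]  # (position in the motif, amino acid letter)


def frequent_patterns(
    motifs: List[str], min_support: int, max_size: int = 3
) -> Dict[FrozenSet[Item], int]:
    """Return every pattern (set of items) found in >= min_support motifs."""
    transactions = [frozenset(enumerate(m)) for m in motifs]

    # Level 1: count single (position, amino acid) items.
    counts: Dict[FrozenSet[Item], int] = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {p: c for p, c in counts.items() if c >= min_support}
    result = dict(current)

    size = 1
    while current and size < max_size:
        size += 1
        # Build candidates of the next size by joining frequent patterns,
        # keeping only those whose smaller subsets are all frequent.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == size and all(
                frozenset(s) in current for s in combinations(union, size - 1)
            ):
                candidates.add(union)
        # Count support for each candidate, contiguous or not.
        current = {}
        for cand in candidates:
            support = sum(1 for t in transactions if cand <= t)
            if support >= min_support:
                current[cand] = support
        result.update(current)
    return result
```

For the toy input `["AGL", "AGV", "TGL", "AKL"]` with `min_support=2`, the pattern "A at position 0 and L at position 2" is reported even though the two residues are not adjacent, which is the kind of co-occurrence a purely position-wise weight matrix cannot express.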
