GeneGenerator - A Flexible Algorithm for Exon-Intron Prediction and its Application to Maize

J. Kleffe, K. Hermann, W. Vahrson, B. Wittig and V. Brendel

FU Berlin and Stanford University

Several algorithms for exon-intron prediction have been developed in the last few years and are widely used. Unfortunately, the scoring functions and gene models used in these methods are mostly specific to human or to a few model organisms. We developed GeneGenerator because we needed a tool to predict exon-intron structure in maize, and we focused on maximum flexibility of the basic algorithm. GeneGenerator can list all possible structures satisfying user-defined constraints. In most cases, however, the many structures that are unlikely to be correct must be ruled out at runtime. GeneGenerator therefore accepts user-defined selection criteria that determine which structures to retain. Special implementations of such criteria render the algorithm equivalent to those in (2,3,6).
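The idea of listing all structures under constraints and then retaining only some of them can be illustrated with a minimal Python sketch. It is not GeneGenerator's implementation (which is written in Pascal); the function names, the candidate splice-site lists, the length constraints and the scoring function are all hypothetical and chosen only for illustration.

```python
import heapq


def enumerate_structures(donors, acceptors, seq_len, min_exon=1, min_intron=1):
    """Yield every exon list [(start, end), ...] (0-based, end exclusive)
    that can be built from the candidate donor and acceptor positions.

    Assumptions of this sketch: the first exon starts at position 0, the
    last exon ends at seq_len, and min_exon and min_intron are at least 1.
    """
    donors = sorted(donors)
    acceptors = sorted(acceptors)

    def extend(exon_start, exons):
        # Option 1: close the structure with a final exon.
        if seq_len - exon_start >= min_exon:
            yield exons + [(exon_start, seq_len)]
        # Option 2: end the current exon at a donor site and resume
        # at an acceptor site further downstream.
        for d in donors:
            if d - exon_start < min_exon:
                continue
            for a in acceptors:
                if a - d >= min_intron:
                    yield from extend(a, exons + [(exon_start, d)])

    yield from extend(0, [])


def select_top(structures, score, k, keep=lambda s: True):
    """Retain the k highest-scoring structures that pass a user-defined filter."""
    return heapq.nlargest(k, (s for s in structures if keep(s)), key=score)


def coding_length(structure):
    # Hypothetical score: total number of coding nucleotides.
    return sum(end - start for start, end in structure)


if __name__ == "__main__":
    candidates = enumerate_structures([10, 30], [20, 40], 50,
                                      min_exon=5, min_intron=5)
    for structure in select_top(candidates, coding_length, k=3):
        print(structure)
```

Because the enumeration is a generator, the filter and the top-k selection are applied while structures are produced, so the full list never has to be held in memory. A real scoring function would use splice-site and coding statistics rather than coding length.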

GeneGenerator runs under Borland-Pascal 7.0 using MS-DOS. It is based on the Pascal unit DNAStat (4).

As an application, 46 sequences from Zea mays were retrieved from GenBank and compiled into a specific, non-redundant database (5) comprising a total of 250 exons. Each sequence in turn was subjected to gene prediction, and the 40 best solutions were generated, using the remaining 45 sequences to train the control functions. The table provides the usual performance measures (1) for Markov models of orders 3, 4 and 5, and for two ways of selecting the gene prediction from the alternatives GeneGenerator provides. The first choice (T) is the top-scoring prediction. The second choice (B), called best, is one that is not available in practice: it shows how good gene prediction would be if we could identify, from the list of predictions, the one with the highest correlation to the correct structure. Evidently, there is plenty of room for improvement through such scrutiny, which would ideally incorporate as much biological insight as is available.
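To make the nucleotide-level measures and the T/B distinction concrete, the following Python sketch computes sensitivity (SN), specificity (SP) and the correlation coefficient (CC) in the form commonly used for gene prediction evaluation (1), and then picks the top-scoring and the best prediction from a ranked list. The function names and the exon representation (0-based, end-exclusive intervals) are assumptions of this sketch, not part of GeneGenerator.

```python
from math import sqrt


def coding_mask(exons, seq_len):
    """Return a per-nucleotide list of booleans: True where a position is coding."""
    mask = [False] * seq_len
    for start, end in exons:
        for i in range(start, end):
            mask[i] = True
    return mask


def nucleotide_measures(true_exons, pred_exons, seq_len):
    """Return (SN, SP, CC) at the nucleotide level.

    SN = TP / (TP + FN), SP = TP / (TP + FP),
    CC = (TP*TN - FN*FP) / sqrt((TP+FN)(TN+FP)(TP+FP)(TN+FN)).
    Undefined ratios are reported as 0.0 in this sketch.
    """
    truth = coding_mask(true_exons, seq_len)
    pred = coding_mask(pred_exons, seq_len)
    tp = sum(t and p for t, p in zip(truth, pred))
    fp = sum((not t) and p for t, p in zip(truth, pred))
    fn = sum(t and (not p) for t, p in zip(truth, pred))
    tn = seq_len - tp - fp - fn
    sn = tp / (tp + fn) if tp + fn else 0.0
    sp = tp / (tp + fp) if tp + fp else 0.0
    denom = sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    cc = (tp * tn - fn * fp) / denom if denom else 0.0
    return sn, sp, cc


def top_and_best(ranked_predictions, true_exons, seq_len):
    """Return (T, B): the first prediction in the ranked list, and the one
    with the highest CC to the true structure (only known in a benchmark)."""
    top = ranked_predictions[0]
    best = max(ranked_predictions,
               key=lambda p: nucleotide_measures(true_exons, p, seq_len)[2])
    return top, best


if __name__ == "__main__":
    true_exons = [(0, 10), (20, 30)]
    ranked = [[(0, 10), (25, 35)], [(0, 10), (20, 30)]]
    top, best = top_and_best(ranked, true_exons, seq_len=40)
    print("T:", top, nucleotide_measures(true_exons, top, 40))
    print("B:", best, nucleotide_measures(true_exons, best, 40))
```

The B selection needs the correct structure as input, which is why it serves only as an upper bound on what a better ranking of the listed alternatives could achieve.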

Acknowledgments: V.B. was supported in part by NIH grant 5R01HG00335-07. K.H. was supported by Deutsche Forschungsgemeinschaft Projekt KL 760/1-3.

References:

  1. Burset, M. and Guigo, R. (1996). Genomics 34, 353-367.
  2. Gelfand, M.S. et al. (1996). J. Comp. Biol. 3, 223-234.
  3. Gelfand, M.S. and Roytberg, M.A. (1993). BioSystems 30, 173-182.
  4. Kleffe, J. et al. (1995). Comp. Appl. Biol. Sci. 11, 449-455.
  5. Kleffe, J. et al. (1996). Nucl. Acids Res. 24, 4709-4718.
  6. Wu, T.D. (1996). J. Comp. Biol. 3, 375-394.
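
Table: Prediction performance for Markov models of orders 3, 4 and 5 (T = top-scoring prediction, B = best prediction). SN = sensitivity, SP = specificity, CC = correlation coefficient.

          Coding nucleotides    Exons         Introns
Options   SN    SP    CC        SN    SP      SN    SP
3_T       .92   .88   .85       .64   .61     .65   .75
3_B       .94   .91   .89       .76   .77     .74   .88
4_T       .93   .92   .90       .75   .73     .77   .78
4_B       .95   .96   .94       .88   .86     .88   .87
5_T       .95   .92   .91       .75   .72     .76   .74
5_B       .96   .97   .95       .90   .88     .93   .91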
