Algorithms and Software for Support of Gene Identification Experiments

Sing-Hoi Sze, Michael A. Roytberg, Mikhail S. Gelfand, Andrey A. Mironov, Tatiana V. Astakhova and Pavel A. Pevzner

Gene annotation is the final goal of gene prediction algorithms. However, these algorithms frequently make mistakes, so gene predictions alone can rarely be relied on for sequence annotation. As a result, biologists are forced to conduct time-consuming gene identification experiments, designing appropriate PCR primers to test cDNA libraries or applying RT-PCR, exon trapping/amplification, or other techniques. This process often amounts to "guessing" PCR primers on top of unreliable gene predictions and often wastes experimental effort.

We propose a simple and reliable algorithm for experimental gene identification that bypasses the unreliable gene prediction step. Studies of the algorithm's performance on a sample of human genes indicate that an experimental protocol based on its predictions achieves accurate gene identification with relatively few PCR primers. The predicted PCR primers may be used for exon amplification in preliminary mutation analysis when attempting to identify a gene responsible for a disease. The basic approach finds a short region of a genomic sequence that, with high probability, overlaps some exon of the gene. The algorithm is then enhanced to find one or more segments that are likely to be contained in the translated region of the gene and can be used as PCR primers to select appropriate clones in cDNA libraries by selective amplification. It is further extended to locate a set of PCR primers that uniformly cover all translated regions and can be used for RT-PCR and further sequencing of the (unknown) mRNA.

The programs are implemented as web servers (GenePrimer and CASSANDRA) and can be reached at http://www-hto.usc.edu/software/procrustes/.
