Organization of the database:

There are three levels: All genomes -> Species -> Gene.
Page I: All genomes - species list.
Page II: Species - details for the species selected in Page I.
Page III: Gene list and summary for the species selected in Page II.

 

Navigation in Page I:

Prokaryotic species are listed in a sortable table. Clicking a column title toggles the sort order between ascending and descending.
All selections are made with buttons; the current selection is displayed in orange.
Selecting a combination of different categories is not currently supported.

 

Page II:

RefSeq / GenBank / PubMed are linked to NCBI.
If a particular species has several sequences, i.e., multiple chromosomes / plasmids, all sequences are displayed in this table, and the predictions for the longest sequence (usually the main chromosome) are highlighted. Users can click to choose other sequences.
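
The default highlighting rule described above can be sketched in a few lines of Python. This is only an illustration: the record layout and the names `default_sequence`, `name` and `length` are hypothetical and do not describe how the database itself is implemented.

```python
def default_sequence(sequences):
    """Return the record with the greatest length (hypothetical record layout)."""
    return max(sequences, key=lambda record: record["length"])


# Illustrative records only; the names and lengths are made up.
sequences = [
    {"name": "plasmid A", "length": 94_000},
    {"name": "main chromosome", "length": 4_600_000},
    {"name": "plasmid B", "length": 5_400},
]

highlighted = default_sequence(sequences)
print(highlighted["name"])
```

Any other sequence in the table can still be chosen by the user; the rule only decides which one is highlighted first.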

The summary of prediction includes:
1) Gene by gene
      This link goes to the third page of the database, the gene level.
2) Circular genome viewer
      Developed by Stothard P, Wishart DS. 2005. "Circular genome visualization and exploration using CGView". Bioinformatics 21:537-539.
3) Sequence tools
      Provided by Chen SL. Lee W, Chen SL. 2002. "Genome-Tools: A Flexible Package for Genome Sequence Analysis". Biotechniques 33(6):1334-41.

 

Page III:

The GeneMarkS program (MedlineArticle) was used to generate the gene predictions. GeneMarkS uses two models: the native model, built by self-training, and the heuristic model (MedlineArticle). Class 1 genes are predicted by the native model, while Class 2 genes are predicted by the heuristic model.

The coordinates of each predicted gene are compared to the RefSeq annotation. There are four categories:
1) Exact match: both the 5' and 3' ends match.
2) Different start: the 3' ends match but the 5' ends (start codons) do not.
3) Novel genes: present only in the prediction.
4) Not confirmed genes: present only in the annotation.
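
As an illustration, the four categories can be computed with a short Python sketch. It assumes that a predicted gene and an annotated gene correspond when they lie on the same strand and share the same 3' end (stop codon position); the matching rule actually used by the database is not stated here, and the names `Gene`, `ends` and `compare` are hypothetical.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    left: int    # smaller coordinate on the sequence
    right: int   # larger coordinate on the sequence
    strand: str  # "+" or "-"


def ends(gene):
    """Return (5' end, 3' end), taking the strand into account."""
    if gene.strand == "+":
        return gene.left, gene.right
    return gene.right, gene.left


def compare(predicted, annotated):
    """Sort genes into the four categories (assumed rule: match on strand and 3' end)."""
    annotated_by_stop = {(g.strand, ends(g)[1]): g for g in annotated}
    result = {"exact": [], "different_start": [], "novel": [], "not_confirmed": []}
    matched = set()
    for p in predicted:
        key = (p.strand, ends(p)[1])
        a = annotated_by_stop.get(key)
        if a is None:
            result["novel"].append(p)            # present only in the prediction
            continue
        matched.add(key)
        if ends(p)[0] == ends(a)[0]:
            result["exact"].append(p)            # 5' and 3' ends both match
        else:
            result["different_start"].append(p)  # same 3' end, different 5' end
    # Annotated genes whose 3' end was never matched are "not confirmed".
    result["not_confirmed"] = [
        g for key, g in annotated_by_stop.items() if key not in matched
    ]
    return result


# Made-up coordinates for illustration.
predicted = [Gene(100, 400, "+"), Gene(500, 800, "+"),
             Gene(900, 1200, "-"), Gene(2000, 2300, "+")]
annotated = [Gene(100, 400, "+"), Gene(530, 800, "+"),
             Gene(900, 1230, "-"), Gene(3000, 3300, "-")]

for category, genes in compare(predicted, annotated).items():
    print(category, len(genes))
```

On the minus strand the 5' end is the larger coordinate, which is why `ends` swaps the two values before the comparison.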

For the novel and not-confirmed genes, an NCBI CDD (Conserved Domain Database) search was performed. The number in parentheses is the number of genes with a significant hit in the CDD database. Beside the numbers, the list link shows detailed information for each set.

The table below shows the coordinates of all predicted genes.
SD and SP show the DNA and protein sequences, respectively. BD and BP run BLAST searches on the NCBI server. GB displays the gene's GenBank annotation.

 

The bottom of every page lists:

1) date of last update and version; 2) help link; 3) update history; 4) a link to email us your feedback; 5) to-do list; and 6) frequently asked questions.
We welcome and appreciate your feedback. Thank you!