Web GeneMark Instructions
Input Sequence

Running Options

RBS model: Choose an appropriate RBS model (or none).

Use alternate genetic code: If desired, you can analyze your sequence with an alternative convention for the genetic code. Choose between "Eukaryote", where the only start codon is ATG, and "Mycoplasma", where TGA codes for tryptophan. For reference, the "Standard" code, which is used by default, considers ATG, GTG and TTG as start codons and TAA, TAG and TGA as stop codons.

Window size: The default window size is 96 base pairs. A short window is better for finding short genes but may generate more false positive predictions. A long window increases the false negative rate. Other possible choices are 48, 72, 120 and 144 nt.

Step size: The distance between adjacent windows. The default step is 12 base pairs. Short steps slightly improve accuracy but increase runtime; long steps decrease runtime but reduce accuracy. Other possible choices are 3, 6, 24, 48 and 96 nt.

Threshold: The minimum a posteriori probability value for calling an ORF a gene. Choosing a low threshold may increase the number of ORFs called as predicted genes, which may increase the number of false positive predictions and decrease the number of false negative predictions. The default threshold, 0.5, keeps these two error rates about equal. Other possible choices are 0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8 and 0.9.

Output Options

Graphical output options

Mark ORFs on graph: Open reading frames (ORFs) will be indicated in the graph as thin horizontal lines at the 0.5 level in each frame of the graph. Choosing this option will mark all possible ORFs in the sequence, including those with significant coding potential.

Mark regions on graph: Regions of interest will be indicated in the graph as thick horizontal grey bars.

Mark stop codons on graph: Stop codons will be indicated in the graph by descending ticks at the 0.5 level.

Mark start codons on graph: Start codons will be indicated in the graph by upward ticks at the 0.5 level.
Minor start codons, GTG, are indicated by smaller upward ticks.

Mark frameshifts on graph: Possible frameshift errors will be indicated by vertical arrows.

Mark putative exon splice sites: Possible exon boundaries will be indicated by angle brackets. These boundaries are based solely on coding potential information. For best results in predicting exons (and genes) in eukaryotic DNA, we recommend the eukaryotic version of GeneMark.hmm.

Print graph in landscape format: Self-explanatory.

Email address: The address to which the graphical output is emailed. Required if graphical output is selected.

Text output options

List open reading frames (ORFs) predicted as coding sequences (CDSs): Prints a list of ORFs with coding potential greater than the selected threshold value (0.5 by default).

List regions of interest: Prints a list of regions of interest, that is, regions in which at least some part exhibits significant coding potential although the overall coding potential is below the threshold.

List putative eukaryotic splice sites: Lists regions between putative acceptor/donor sites that exhibit significant coding potential.

Write protein translations of ORFs OR Write protein translations of regions of interest OR Write protein translations of exons: Prints a list of protein translations of ORFs, regions of interest or exons in FASTA format.

Write nucleotide transcripts of ORFs OR Write nucleotide transcripts of regions of interest OR Write nucleotide transcripts of exons: Prints a list of nucleotide transcripts of ORFs, regions of interest or exons in FASTA format.

Definitions

Note: The definitions given here reflect how these words and abbreviations are used in the context of GeneMark and may differ from their conventional meanings elsewhere.

Coding sequence (CDS): an open reading frame that codes for a protein.

Open reading frame (ORF): a region between a start codon and the next in-frame stop codon. Prokaryotic coding sequences are ORFs, but an ORF does not have to be a coding sequence.
Region of interest (ROI): a region between two stop codons in the same reading frame with significant coding potential; such regions may indicate coding regions where a start or stop codon has been masked by errors in the sequence.
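The ORF definition above, together with the genetic code options under Running Options, can be illustrated with a minimal Python sketch that lists forward-strand ORFs. It is not GeneMark's own code: the names `find_orfs` and `GENETIC_CODES` are hypothetical, and the sketch assumes that the "Eukaryote" option keeps the standard stop codons and the "Mycoplasma" option keeps the standard start codons, since only the differences from the "Standard" code are stated here. It also reports only the first start codon after each in-frame stop, which is one reasonable convention among several.

```python
# Codon sets per genetic-code option, following the differences described
# on this page. ASSUMPTION: "Eukaryote" keeps the standard stop codons and
# "Mycoplasma" keeps the standard start codons.
GENETIC_CODES = {
    # Standard (default): ATG, GTG, TTG start; TAA, TAG, TGA stop.
    "Standard": ({"ATG", "GTG", "TTG"}, {"TAA", "TAG", "TGA"}),
    # Eukaryote: ATG is the only start codon.
    "Eukaryote": ({"ATG"}, {"TAA", "TAG", "TGA"}),
    # Mycoplasma: TGA codes for tryptophan, so it is not a stop codon.
    "Mycoplasma": ({"ATG", "GTG", "TTG"}, {"TAA", "TAG"}),
}


def find_orfs(seq, code="Standard"):
    """Return (frame, start, end) tuples for forward-strand ORFs.

    An ORF runs from a start codon through the next in-frame stop codon,
    in 0-based, end-exclusive coordinates. Only the first start codon
    after each stop is used, so nested ORFs sharing one stop codon are
    reported once. Start codons with no downstream stop are ignored.
    """
    starts, stops = GENETIC_CODES[code]
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        orf_start = None
        # Step through complete codons only in this reading frame.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if orf_start is None and codon in starts:
                orf_start = i
            elif orf_start is not None and codon in stops:
                orfs.append((frame, orf_start, i + 3))
                orf_start = None
    return orfs


seq = "ATGAAATGATAA"
print(find_orfs(seq, "Standard"))    # [(0, 0, 9)]
print(find_orfs(seq, "Mycoplasma"))  # [(0, 0, 12)]
```

With the "Mycoplasma" option the TGA at position 6 is read as tryptophan rather than as a stop, so the same ORF extends to the following TAA.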
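The Window size, Step size and Threshold options under Running Options can be sketched in a few lines of Python. This is a generic illustration, not GeneMark's scoring procedure: the helpers `window_starts` and `call_genes` are hypothetical, and the per-ORF probabilities in the example are made-up values rather than GeneMark output. The sketch shows why shorter steps cost runtime (more windows to score) and how lowering the threshold lets more ORFs through.

```python
def window_starts(seq_len, window=96, step=12):
    """Return 0-based start positions of windows of `window` nt, placed
    every `step` nt, that fit entirely inside the sequence. The defaults
    mirror this page's defaults (96 nt window, 12 nt step)."""
    if seq_len < window:
        return []
    return list(range(0, seq_len - window + 1, step))


def call_genes(orf_probs, threshold=0.5):
    """Return the ORFs whose probability is greater than `threshold`.

    `orf_probs` maps a hypothetical ORF label to an a posteriori coding
    probability; how GeneMark derives that value from window scores is
    not modeled here.
    """
    return [orf for orf, p in orf_probs.items() if p > threshold]


# A 1,200 nt sequence: a 3 nt step scores about four times as many
# windows as the default 12 nt step.
print(len(window_starts(1200)))          # 93
print(len(window_starts(1200, step=3)))  # 369

# Hypothetical probabilities for three ORFs.
probs = {"orf_a": 0.92, "orf_b": 0.41, "orf_c": 0.63}
print(call_genes(probs))                 # ['orf_a', 'orf_c']
print(call_genes(probs, threshold=0.3))  # ['orf_a', 'orf_b', 'orf_c']
```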