Gene Prediction in Bacteria, Archaea, Metagenomes and Metatranscriptomes
Novel prokaryotic genomic sequences longer than 50 kb can be analyzed by the self-training software tool GeneMarkS-2. For some species, pre-trained model parameters are available through GeneMark.hmm. Metagenomic sequences and individual short sequences (shorter than 50 kb) can be analyzed by MetaGeneMark.
Gene Prediction in Eukaryotes
Novel eukaryotic genomes can be analyzed by the self-training GeneMark-ES. The fungal mode of GeneMark-ES accounts for fungal-specific intron organization. GeneMark-ET extends GeneMark-ES with information from mapped RNA-Seq reads. GeneMark-EP+ extends GeneMark-ES with information from cross-species protein sequences. GeneMark-ETP extends GeneMark-ES with both types of external information: RNA-Seq reads and cross-species proteins.
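The mapping from available external evidence to a eukaryotic tool can be summarized in a few lines of Python. This is only an illustrative sketch of the choices described above: the function name `pick_eukaryotic_tool` and its parameters are hypothetical and are not part of any GeneMark software.

```python
def pick_eukaryotic_tool(has_rnaseq: bool, has_proteins: bool) -> str:
    """Return the name of the GeneMark tool matching the available evidence.

    Hypothetical helper for illustration only; it mirrors the descriptions
    in the text and does not call any GeneMark program.
    """
    if has_rnaseq and has_proteins:
        # Both RNA-Seq reads and cross-species proteins are available.
        return "GeneMark-ETP"
    if has_rnaseq:
        # Only mapped RNA-Seq reads are available.
        return "GeneMark-ET"
    if has_proteins:
        # Only cross-species protein sequences are available.
        return "GeneMark-EP+"
    # No external evidence: fall back to pure self-training.
    return "GeneMark-ES"


if __name__ == "__main__":
    print(pick_eukaryotic_tool(has_rnaseq=True, has_proteins=False))
```

The sketch leaves out the fungal mode of GeneMark-ES, which is a mode of the same tool rather than a separate choice of evidence.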
Gene Prediction in Transcripts
Sets of eukaryotic transcripts can be analyzed by GeneMarkS-T.
Gene Prediction in Viruses, Phages and Plasmids
Sequences of viruses, phages, or plasmids can be analyzed either by GeneMarkS (sequences longer than 50 kb) or by MetaGeneMark (sequences shorter than 50 kb).
All the software tools mentioned here are available for download.
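The 50 kb length rule for prokaryotic and viral sequences can be written as a small routing function. The following is a minimal sketch, not part of the GeneMark distribution: the names `THRESHOLD_BP`, `pick_prokaryotic_tool`, and `pick_viral_tool` are hypothetical. The text does not say which tool applies to a sequence of exactly 50 kb, so assigning that case to the self-training tool is an assumption made here.

```python
THRESHOLD_BP = 50_000  # 50 kb, the length cutoff quoted in the text


def pick_prokaryotic_tool(length_bp: int, metagenomic: bool = False) -> str:
    """Suggest a tool for a bacterial, archaeal, or metagenomic sequence.

    Hypothetical helper for illustration; it does not run any GeneMark program.
    """
    if length_bp <= 0:
        raise ValueError("sequence length must be positive")
    # Metagenomic and short sequences go to MetaGeneMark.
    if metagenomic or length_bp < THRESHOLD_BP:
        return "MetaGeneMark"
    # Assumption: exactly 50 kb is treated like "longer than 50 kb".
    return "GeneMarkS-2"


def pick_viral_tool(length_bp: int) -> str:
    """Suggest a tool for a virus, phage, or plasmid sequence."""
    if length_bp <= 0:
        raise ValueError("sequence length must be positive")
    if length_bp < THRESHOLD_BP:
        return "MetaGeneMark"
    # Assumption: exactly 50 kb is treated like "longer than 50 kb".
    return "GeneMarkS"


if __name__ == "__main__":
    for length in (12_000, 180_000):
        print(length, pick_prokaryotic_tool(length), pick_viral_tool(length))
```

Keeping the cutoff in a single constant makes the one assumption about the boundary case easy to change.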
The GeneMark software is a part of genome annotation pipelines at NIH NCBI (for prokaryotes) and DOE JGI (for eukaryotes) as well as others:
QUAST: assessment of genome assembly quality -- uses GeneMarkS
MetAMOS: a tool for metagenome assembly and analysis -- uses MetaGeneMark
Eukaryotic genome annotation pipelines:
MAKER2: uses GeneMark-ES along with SNAP and AUGUSTUS.
BRAKER1: integrates RNA-Seq reads -- uses GeneMark-ET and AUGUSTUS
BRAKER2: integrates known proteins -- uses GeneMark-EP+ and AUGUSTUS
BRAKER3: integrates RNA-Seq reads and known proteins -- uses GeneMark-ETP and AUGUSTUS