Vgep: A Tool for Essential Genome Prediction of Viruses

The theoretical basis of Vgep is as follows.
Vgep combines two methods to decide whether a gene is essential:
(1) Orthology alignment: the first part is based on Geptop2.0. It computes an essentiality score by aligning the input genes to orthologs in experimentally determined essential-gene datasets (from the DEG database) and weighting the matches by phylogenetic similarity. Because this method normalizes its scores, the score of each sequence depends on the complete set of input genes.
(2) SVM classification model: k-mer frequency distributions are extracted from the DNA or amino acid sequences as features, and an SVM model performs binary classification. The classifier outputs the probability that the input sequence is an essential gene, and this probability is the score of the second method.
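To illustrate why the orthology score of one gene in method (1) depends on the whole input set, the Python sketch below computes a Geptop-style score under stated assumptions. The distance-weighted averaging, the min-max normalization, and the names raw_score and normalize are hypothetical illustrations, not the actual Geptop2.0 code.

```python
from typing import Dict, List

# HYPOTHETICAL sketch of a Geptop-style orthology score (not the Geptop2.0 source).
# Assumptions:
# - distances[i] > 0 is the phylogenetic distance from the query genome to reference genome i;
# - hits[i] is True if the query gene has an ortholog in genome i that DEG lists as essential;
# - raw score = sum(hit_i / d_i) / sum(1 / d_i), so closer genomes count more;
# - raw scores are then min-max normalized over the whole input set.


def raw_score(hits: List[bool], distances: List[float]) -> float:
    """Distance-weighted fraction of reference genomes with an essential ortholog."""
    weights = [1.0 / d for d in distances]  # closer genomes get larger weights
    hit_weight = sum(w for w, h in zip(weights, hits) if h)
    return hit_weight / sum(weights)


def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Min-max normalization: every result depends on all genes in the input set."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return dict(scores)  # nothing to rescale; keep the raw values
    return {gene: (s - lo) / (hi - lo) for gene, s in scores.items()}


if __name__ == "__main__":
    distances = [0.2, 0.5, 1.0]
    genes = {
        "geneA": [True, True, False],
        "geneB": [False, True, True],
        "geneC": [False, False, False],
    }
    raw = {g: raw_score(h, distances) for g, h in genes.items()}
    print(normalize(raw))
    # Dropping geneA from the input changes geneB's normalized score.
    print(normalize({g: s for g, s in raw.items() if g != "geneA"}))
```

With these distances, geneA scores 0.875 and geneB 0.375 before normalization; after normalization geneB becomes about 0.43 when geneA is present but 1.0 when geneA is removed, which is the set dependence described in method (1).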
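The feature extraction and probability output of method (2) can be sketched in standard-library Python as follows. The k-mer size, the linear kernel, and the weight, bias and Platt-scaling parameters are hypothetical placeholders; a real model would learn them from training data, and Vgep's actual settings are not given here. Platt scaling, which passes the SVM decision value through a fitted sigmoid, is a common way SVM implementations turn a decision value into a probability.

```python
from itertools import product
from math import exp
from typing import Dict, List

DNA = "ACGT"
PROTEIN = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def kmer_frequencies(seq: str, k: int = 2, alphabet: str = DNA) -> List[float]:
    """Relative frequency of every possible k-mer over `alphabet`, in a fixed order."""
    seq = seq.upper()
    vocab = ["".join(p) for p in product(alphabet, repeat=k)]
    counts: Dict[str, int] = dict.fromkeys(vocab, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip k-mers containing ambiguous symbols such as N
            counts[kmer] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in vocab]


def svm_probability(features: List[float], weights: List[float], bias: float,
                    platt_a: float, platt_b: float) -> float:
    """Linear SVM decision value mapped to P(essential) by Platt scaling.

    HYPOTHETICAL parameters: weights/bias would come from SVM training and
    platt_a/platt_b from fitting the sigmoid on held-out decision values.
    """
    decision = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + exp(platt_a * decision + platt_b))


if __name__ == "__main__":
    feats = kmer_frequencies("ACGTAC", k=2)  # 16 dinucleotide frequencies
    print(len(feats), feats[:4])
    print(len(kmer_frequencies("MKTAYIAK", k=2, alphabet=PROTEIN)))  # 400 features
```

For DNA with k = 2 the vector has 16 entries; for proteins with the 20-letter alphabet it has 400, so the feature length grows quickly with k.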
The results of the two methods are reweighted and combined into the final Vgep score, which ranges from 0 to 1, with a cutoff of 0.5. The higher the score, the more likely the input sequence is an essential gene.
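The text does not state how the two scores are combined. As a hypothetical sketch only, the code below takes a weighted average of the two scores with an assumed weight w = 0.5 and then applies the 0.5 cutoff; whether a score of exactly 0.5 counts as essential is also an assumption.

```python
CUTOFF = 0.5  # cutoff stated for the final Vgep score


def combine(orthology_score: float, svm_score: float, w: float = 0.5) -> float:
    """HYPOTHETICAL combination rule: weighted average of two scores in [0, 1].

    A weighted average of values in [0, 1] also lies in [0, 1], matching the
    stated range of the final score. The weight w is an assumption.
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    return w * orthology_score + (1.0 - w) * svm_score


def is_essential(score: float, cutoff: float = CUTOFF) -> bool:
    """Predict 'essential' when the score reaches the cutoff (assumed tie rule)."""
    return score >= cutoff


if __name__ == "__main__":
    final = combine(0.8, 0.4)
    print(round(final, 3), is_essential(final))
```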



You can input nucleotide sequences in FASTA format:

Or you can input amino acid sequences in FASTA format:

(If you upload sequences as a file, the file must be in FASTA format.)
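A minimal sketch of how FASTA input could be read and checked before prediction is shown below. parse_fasta and sequence_type are hypothetical helpers, not part of Vgep, and the nucleotide/protein test is a simple heuristic based on which residue letters appear.

```python
from typing import List, Optional, Tuple

NUCLEOTIDES = set("ACGTUN")  # assumed nucleotide alphabet, including U and ambiguous N


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs; raise if it is not FASTA."""
    records: List[Tuple[str, str]] = []
    header: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:  # close the previous record
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("FASTA input must start with a '>' header line")
        else:
            chunks.append(line.upper())  # sequences may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def sequence_type(seq: str) -> str:
    """Heuristic guess: 'nucleotide' if only nucleotide letters occur, else 'protein'."""
    return "nucleotide" if set(seq) <= NUCLEOTIDES else "protein"


if __name__ == "__main__":
    example = ">gene1 example\nATGCGT\nGGTA\n>gene2\nMKTAYIAK\n"
    for name, seq in parse_fasta(example):
        print(name, sequence_type(seq), len(seq))
```

The heuristic can misclassify a very short protein made only of letters such as A, C, G and T, so an explicit choice between nucleotide and amino acid input is more reliable than guessing.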