This page explains how to use CEG 2.0.

Bacteria page
The bacteria page shows the following fields for each record: CEG ID, Gene name(s), ESAHP, KO Cluster, COG Cluster, Cluster size, Strains, EC, Drug size, Virulence size, Struct size and Pathway size. Clicking any CEG ID opens detailed information about that cluster of essential genes in another window. On that page, users can obtain more detailed information, including CEG ID plus gi, Gene name(s), Function, Pathway, Struct, Drug, ESAHP, Organism, Virulence and Protein-Ligand. Links are also provided for ranking the values in ascending or descending order: click the arrows and select the corresponding item (for example, selecting clustersize_dec ranks records by cluster size in descending order). The same links are available on the eukaryotes page and the human page.
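The ranking links described above behave like an ordinary sort on one column. The following is a minimal sketch in Python, not the site's implementation. The record fields (`ceg_id`, `cluster_size`) and the example values are hypothetical. Only the option name `clustersize_dec` comes from this page; `clustersize_asc` is an assumed counterpart.

```python
# Hypothetical records; the IDs and sizes are made up for illustration.
records = [
    {"ceg_id": "example_A", "cluster_size": 12},
    {"ceg_id": "example_B", "cluster_size": 30},
    {"ceg_id": "example_C", "cluster_size": 7},
]

# Map each ranking option to (field to sort on, descending?).
# "clustersize_asc" is an assumed name; only "clustersize_dec" is documented.
SORT_OPTIONS = {
    "clustersize_asc": ("cluster_size", False),
    "clustersize_dec": ("cluster_size", True),
}

def rank(rows, option):
    """Return a new list of rows ordered according to the chosen option."""
    field, descending = SORT_OPTIONS[option]
    return sorted(rows, key=lambda row: row[field], reverse=descending)

ranked_ids = [row["ceg_id"] for row in rank(records, "clustersize_dec")]
print(ranked_ids)
```

Sorting returns a new list and leaves the original records untouched, so the same data can be re-ranked with a different option.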

Eukaryotes page
The eukaryotes page shows the following fields for each record: CEG ID, Gene name(s), Description, OrthoDB Cluster, Cluster size and Species size. Clicking any CEG ID opens detailed information about that cluster of essential genes in another window. On that page, users can obtain more detailed information, including CEG ID plus DEG ID, Gene name(s), Function, Condition, UniProt and Organism.

Human page
The human page shows the following fields for each record: CEG ID, Gene name(s), Description, OrthoDB Cluster, Cluster size and Cell line size. Clicking any CEG ID opens detailed information about that cluster of essential genes in another window. On that page, users can obtain more detailed information, including CEG ID plus DEG ID, Gene name(s), Function, Condition, UniProt and Cell line.

Search page
On the search page, users can search for clusters of essential genes of interest using the CEG ID, COG ID, EOG ID, Gene Name or Cluster Size. Prokaryotic, eukaryotic and human essential genes can be searched separately.

Blast page
The BLAST program is integrated into the CEG database to help researchers discover potential clusters of essential genes by aligning input nucleic acid or protein sequences of interest against the sequences in the database. Users can run a BLAST search by pasting one or more sequences in FASTA format and selecting blastn or blastp accordingly. Prokaryotic and eukaryotic sequences can be searched separately, and all services are provided for both. BLAST results with higher bit-scores are displayed first.
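Before pasting sequences into the form, it can help to check that the input is valid FASTA and to decide whether blastn or blastp applies. The following Python sketch illustrates this and the "higher bit-score first" ordering. It is not the server's code. The alphabet check is a simple heuristic of our own, and the sequence names, sequences and bit-scores are made-up examples.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    entries = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                entries.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        entries.append((header, "".join(chunks)))
    return entries

# Heuristic assumption: input made only of these letters is nucleotide.
NUCLEOTIDE_LETTERS = set("ACGTUN")

def suggest_program(sequence):
    """Suggest 'blastn' for nucleotide-looking input, otherwise 'blastp'."""
    letters = set(sequence.upper())
    return "blastn" if letters and letters <= NUCLEOTIDE_LETTERS else "blastp"

def order_hits(hits):
    """Order hits so that higher bit-scores come first."""
    return sorted(hits, key=lambda hit: hit["bit_score"], reverse=True)

example = """>query_dna
ATGGCTAGC
TAGGCT
>query_protein
MKVLAAGIVG
"""
entries = parse_fasta(example)
programs = {name: suggest_program(seq) for name, seq in entries}

# Made-up hits, only to show the ordering.
hits = [{"subject": "hit_1", "bit_score": 52.0},
        {"subject": "hit_2", "bit_score": 110.5}]
ordered = order_hits(hits)
```

A short peptide made up only of A, C, G, T or N residues would be misclassified by this heuristic, which is one reason the page asks the user to select the program explicitly.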

Predict page
The predict page enables users to predict whether a gene or sequence (nucleic acid or protein) is essential using CEG_match, a tool developed by us. Entries whose essentiality is higher than the k value selected by the user are considered essential. Users can select the input type (gene or sequence) and the output type (gene or COG/EOG). Two pages, Predict (prokaryotes) and Predict (eukaryotes), are provided to predict the essentiality of genes or sequences from prokaryotes and eukaryotes, respectively; users can choose between them according to their needs.
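The role of the k value can be illustrated with a small threshold check. This Python sketch only shows the final comparison ("essentiality higher than k"). How CEG_match computes the essentiality value is not described on this page, so the gene names and scores below are invented for illustration.

```python
def classify(essentiality_scores, k):
    """Mark each entry as essential only if its score is strictly above k."""
    return {name: score > k for name, score in essentiality_scores.items()}

# Made-up essentiality values for three hypothetical genes.
scores = {"gene_a": 3, "gene_b": 8, "gene_c": 5}

calls = classify(scores, k=5)
for name, is_essential in calls.items():
    print(name, "essential" if is_essential else "not essential")
```

Because the comparison is strict, an entry whose essentiality equals k (here `gene_c`) is not reported as essential; raising k therefore makes the prediction more conservative.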

Detailed information about clusters of essential genes
Descriptions of the items listed in CEG 2.0

Item: Description
CEG ID: A unique ID for each cluster of essential genes.
CEG_gene ID: A unique ID for each essential gene in a cluster.
Gene name(s): The name(s) of an essential gene.
Function and Pathway: The function of an essential gene and the pathways it participates in. Pathway size is the number of pathways the gene participates in.
Struct and Drug: The structure of the protein encoded by an essential gene and the best drug target for that protein. Struct size and Drug size are the number of structures and the number of genes that are potential drug targets, respectively.
ESAHP: The best alignment value between all genes in one cluster and all sequences in HPRD (the Human Protein Reference Database).
Virulence/Virulence size: Genes marked as potential in this column have a potential relationship with a virulence factor, while genes marked as Determine have been determined to have virulence. Virulence size is the number of genes considered potential virulence factors.
Protein-Ligand: Information on receptors, ligands and their interactions, as stored in BioLiP.
UniProt: UniProt ID of the essential gene.
Protein sequence: Amino acid sequence of the essential gene.
KO Cluster: The KEGG Orthology (KO).
Cluster size: The number of essential genes in one cluster.
Strains size: The number of strains in one cluster.
EC: Enzyme in KEGG.
Gene Sequence: Gene sequence extracted according to the coding region in NCBI.
VFDB ID: The ID in the VFDB database.

Home page
The home page provides a basic introduction to CEG and lists some useful links.

Link: Description
CEFG: Group of Computational, Comparative, Evolutionary and Functional Genomics at UESTC.
DEG: DEG hosts records of currently available essential genomic elements, such as protein-coding genes and non-coding RNAs, among bacteria, archaea and eukaryotes.
COGs: COGs is a database containing Clusters of Orthologous Groups of proteins.
Human Protein Reference Database: The Human Protein Reference Database (HPRD) represents a centralized platform to visually depict and integrate information pertaining to domain architecture, post-translational modifications, interaction networks and disease association for each protein in the human proteome. ESAHP represents the best alignment value between all genes in a cluster and all sequences in HPRD.
KEGG: KEGG contains various data objects for computer representation of biological systems. Its base entry is called the KEGG object, which is identified by the KEGG object identifier consisting of a database-dependent prefix and a five-digit number.
RCSB PDB: RCSB PDB (Research Collaboratory for Structural Bioinformatics PDB) operates the US data center for the global PDB archive, which provides access to 3D structure data for large biological molecules (proteins, DNA, and RNA).
DrugBank: A comprehensive database containing information on drugs and drug targets, combining detailed drug (i.e. chemical, pharmacological and pharmaceutical) data with comprehensive drug target (i.e. sequence, structure, and pathway) information.
BioLiP database: The BioLiP database is constructed from known protein structures in PDB and provides receptor-ligand pairs together with their interactions.
Virulence factor database (VFDB): The virulence factor database (VFDB) is an integrated and comprehensive online resource for curating information about virulence factors of bacterial pathogens.
UniProt: The Universal Protein Resource (UniProt) is a comprehensive resource for protein sequence and annotation data.

Feedback
If you have any comments or queries about the CEG 2.0 database, please contact us at <guolab_whu@163.com>.
Web: http://guolab.whu.edu.cn , Bioinformatics Center in School of Life Science and Technology, University of Electronic Science and Technology of China. No.4, Section 2, North Jianshe Road, Chengdu 610054, China.