User guide

CasLocusAnno provides three ways to submit data: (1) submit all protein sequences of a chromosome; (2) submit the nucleotide sequence of a chromosome; (3) submit a RefSeq accession number directly. The following figure shows the running pipeline.

Submit all protein sequences of a chromosome

Nucleotide and protein sequences are commonly stored in FASTA format, a text-based format in which nucleotides are represented by five letters (A, T, G, C, U) and amino acids by 20 letters. Each record starts with a line that begins with the ">" symbol, followed by the sequence ID and sequence comments; the sequence itself follows on the subsequent lines. Submit the protein sequences of a chromosome in FASTA format. Note that a protein sequence submission must contain all protein sequences of the chromosome. Submissions with fewer than 30 protein sequences are not accepted.
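
The rules above can be checked locally before uploading. The following Python sketch is only an illustration, not the server's own code: the names parse_fasta, check_protein_submission and MIN_PROTEINS are hypothetical, and only the threshold of 30 sequences is taken from this guide.

```python
MIN_PROTEINS = 30  # minimum number of protein sequences stated in this guide


def parse_fasta(text):
    """Return a list of (header, sequence) tuples parsed from FASTA text."""
    records = []
    header = None
    chunks = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new record starts; store the previous one first.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is None:
            raise ValueError("FASTA text must start with a '>' header line")
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_protein_submission(text):
    """Parse the text and reject it if it holds fewer than MIN_PROTEINS records."""
    records = parse_fasta(text)
    if len(records) < MIN_PROTEINS:
        raise ValueError(
            "found %d protein sequences, at least %d are required"
            % (len(records), MIN_PROTEINS)
        )
    return records
```

Calling check_protein_submission on the contents of a protein FASTA file returns the parsed records, or raises ValueError if the file is too small or does not start with a header line. The sketch cannot tell whether the file really covers the whole chromosome.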

Submit a whole genome nucleotide sequence

CasLocusAnno accepts the whole nucleotide sequence of a chromosome. It calls the ZCURVE 3.0 program to identify all protein-coding genes, and then detects Cas proteins, Cas loci and their (sub)types. Sequences shorter than 10000 nucleotides are not accepted.
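
The length requirement can be checked before uploading. The Python sketch below is an assumption about how such a check could look, not the server's implementation: nucleotide_length, is_long_enough and MIN_GENOME_LENGTH are hypothetical names, and the sketch counts every character on the sequence lines, including ambiguity codes such as N, which the server may treat differently.

```python
MIN_GENOME_LENGTH = 10000  # minimum sequence length stated in this guide


def nucleotide_length(fasta_text):
    """Count the characters on all non-header lines of FASTA text."""
    total = 0
    for raw_line in fasta_text.splitlines():
        line = raw_line.strip()
        if line and not line.startswith(">"):
            total += len(line)
    return total


def is_long_enough(fasta_text):
    """Return True if the sequence reaches the minimum accepted length."""
    return nucleotide_length(fasta_text) >= MIN_GENOME_LENGTH
```

A file for which is_long_enough returns False would be rejected under the rule above.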

Submit a RefSeq accession number

A RefSeq accession number must have the prefix AC, NC, or NG. RefSeq accession numbers and their molecule types are described in detail in chapter 18 of The NCBI Handbook, and the entered ID must conform to the format described there. CasLocusAnno uses Biopython to access NCBI’s databases and download all protein sequences of the chromosome for the RefSeq accession number provided by the user. The speed of annotation therefore depends strongly on the speed of the connection between our server and NCBI. If it is too slow, download all protein sequences of the chromosome yourself and submit them using the first method.
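
The prefix rule can be sketched as a simple pattern check in Python. This is an illustration only: ACCESSION_PATTERN and is_accepted_accession are hypothetical names, and the pattern assumes the common RefSeq layout of a two-letter prefix, an underscore, a run of digits and an optional version suffix (for example NC_000913.3). The NCBI Handbook remains the authoritative description of the format, and the check does not confirm that the record exists at NCBI.

```python
import re

# Assumed layout: prefix AC, NC or NG, an underscore, digits,
# and an optional ".version" suffix.
ACCESSION_PATTERN = re.compile(r"(AC|NC|NG)_\d+(\.\d+)?")


def is_accepted_accession(accession):
    """Return True if the text looks like an AC_, NC_ or NG_ accession."""
    return ACCESSION_PATTERN.fullmatch(accession.strip()) is not None
```

For example, is_accepted_accession("NC_000913.3") returns True, while an mRNA accession such as "NM_000546" returns False because its prefix is not one of the three accepted ones.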

Server architecture

Panel A is the data submission and parameter setting page. Panel B is the result page, which lists each Cas locus and the Cas proteins belonging to it. The number in the first column of panel B is the position of the protein in the chromosome submitted by the user. The number 1 shown in red in panel B indicates that the protein is a fused protein, because two different Cas profiles match the same protein. Clicking a Cas ID in panel B displays detailed information in panel C.