Updating annotation databases

 

There are two different types of updates of SVA annotation databases:

(1) Main update: an update of the human reference genome. We will release a new SVA package when a new genome build becomes available, and post it on the SVA website as soon as we can. Users need to re-download the SVA package to apply this update. There are currently two SVA releases corresponding to main updates: build 36 (Ensembl version 50_36l) and build 37 (Ensembl version 61_37f).

(2) Minor update: an update of the gene annotation databases within a given genome build. Users are advised to download the annotation databases directly and put them into the relevant SVA directories, following the instructions below. The standard SVA release will also track Ensembl releases, but not every update of the Ensembl core databases. Users are therefore strongly encouraged to perform this update themselves whenever they need it.

Here is how to perform the minor update:

(2.1) Locate and download the Ensembl annotation databases.

Visit Ensembl's database download FTP site:

ftp://ftp.ensembl.org/pub/current/mysql/

Locate the directories for human databases:

homo_sapiens_variation_[ENSEMBL VERSION] (replace [ENSEMBL VERSION] with the correct database version, for example 61_37f)

homo_sapiens_core_[ENSEMBL VERSION]

ensembl_ontology_[ENSEMBL VERSION]

Users need to download the following data files from these directories. Please note you will need to extract the downloaded .gz files to flat text files.

#homo_sapiens_core
gene_stable_id.txt
transcript.txt
transcript_stable_id.txt
translation.txt
exon.txt
exon_transcript.txt
xref.txt
object_xref.txt
gene.txt
seq_region.txt
#ensembl_ontology
term.txt
#homo_sapiens_variation
variation.txt
variation_feature.001.txt
variation_feature.002.txt
allele.001.txt
allele.002.txt
sample.txt
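
The extraction step can be scripted. The following is a minimal Python sketch, not part of SVA: it assumes the downloaded .gz files sit together in one directory, and the function name gunzip_all is hypothetical.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_all(directory):
    """Decompress every *.gz file in `directory` next to the original.

    Returns the list of flat text files that were written.
    """
    written = []
    for gz_path in sorted(Path(directory).glob("*.gz")):
        # Dropping the last suffix turns "gene.txt.gz" into "gene.txt".
        out_path = gz_path.with_suffix("")
        with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
            # Stream the data, since some variation tables are large.
            shutil.copyfileobj(src, dst)
        written.append(out_path)
    return written
```

Calling gunzip_all on your download directory leaves the .gz files in place and writes the flat text files beside them.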

If there is only one file for "variation_feature" or "allele" instead of the two "001" and "002" files, rename the downloaded file so that it ends with ".001.txt", and then create an empty file with a name ending in ".002.txt".
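
The renaming rule above can be sketched in Python as follows. This is an illustration only: it assumes the single, unsplit file is named [table].txt (for example allele.txt), which the instructions do not state, and normalize_split_table is a hypothetical helper name.

```python
from pathlib import Path


def normalize_split_table(directory, table):
    """Ensure <table>.001.txt and <table>.002.txt both exist in `directory`."""
    directory = Path(directory)
    single = directory / f"{table}.txt"      # assumed name of the unsplit file
    first = directory / f"{table}.001.txt"
    second = directory / f"{table}.002.txt"
    if single.exists() and not first.exists():
        single.rename(first)                 # the only file becomes part 001
    if first.exists() and not second.exists():
        second.touch()                       # empty placeholder for part 002
    return first, second
```

Run it once for "variation_feature" and once for "allele". If both parts were already downloaded, the function changes nothing.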

(2.2) Create a relevant SVA database directory

Locate your SVA installation directory. Then create your SVA annotation database directory. Typically, it should be at:

[YOUR SVA INSTALLATION DIRECTORY]/datasource/ensembl/[YOUR DATABASE VERSION]

Put all the downloaded files from step (2.1) into this directory.
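
As a rough sketch of this step, the Python helper below creates the version directory under the typical location given above and copies the extracted files into it, reporting any that are missing. The function name install_annotation_files and its arguments are hypothetical; the file list is the one from step (2.1).

```python
import shutil
from pathlib import Path

# File names listed in step (2.1).
REQUIRED_FILES = [
    "gene_stable_id.txt", "transcript.txt", "transcript_stable_id.txt",
    "translation.txt", "exon.txt", "exon_transcript.txt", "xref.txt",
    "object_xref.txt", "gene.txt", "seq_region.txt",
    "term.txt",
    "variation.txt", "variation_feature.001.txt", "variation_feature.002.txt",
    "allele.001.txt", "allele.002.txt", "sample.txt",
]


def install_annotation_files(download_dir, sva_dir, version):
    """Copy the extracted files into <sva_dir>/datasource/ensembl/<version>.

    Returns the target directory and the names of any files not found.
    """
    target = Path(sva_dir) / "datasource" / "ensembl" / version
    target.mkdir(parents=True, exist_ok=True)
    missing = []
    for name in REQUIRED_FILES:
        src = Path(download_dir) / name
        if src.is_file():
            shutil.copy2(src, target / name)
        else:
            missing.append(name)
    return target, missing
```

A non-empty second return value means the directory is incomplete and some files still need to be downloaded or extracted.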

(2.3) Use the updated annotation databases in your next SVA annotation job

In your SVA .gsap script file, point the data file statements at the newly updated files to apply the update. Specifically, you need to update these statements:

#Core
[GENEID]=./datasource/ensembl/[YOUR DATABASE VERSION]/gene_stable_id.txt
[TRANSCRIPT]=./datasource/ensembl/[YOUR DATABASE VERSION]/transcript.txt
[TRANSCRIPTID]=./datasource/ensembl/[YOUR DATABASE VERSION]/transcript_stable_id.txt
[TRANSLATION]=./datasource/ensembl/[YOUR DATABASE VERSION]/translation.txt
[EXON]=./datasource/ensembl/[YOUR DATABASE VERSION]/exon.txt
[EXONTRANSCRIPT]=./datasource/ensembl/[YOUR DATABASE VERSION]/exon_transcript.txt
[XREF]=./datasource/ensembl/[YOUR DATABASE VERSION]/xref.txt
[OBJECTXREF]=./datasource/ensembl/[YOUR DATABASE VERSION]/object_xref.txt
[GENE]=./datasource/ensembl/[YOUR DATABASE VERSION]/gene.txt
[SEQREGION]=./datasource/ensembl/[YOUR DATABASE VERSION]/seq_region.txt
#GO
[GO]=./datasource/ensembl/[YOUR DATABASE VERSION]/term.txt
#Existing variations
[VARIATION]=./datasource/ensembl/[YOUR DATABASE VERSION]/variation.txt
[VARIATIONFEATURE1]=./datasource/ensembl/[YOUR DATABASE VERSION]/variation_feature.001.txt
[VARIATIONFEATURE2]=./datasource/ensembl/[YOUR DATABASE VERSION]/variation_feature.002.txt
[VARIATIONALLELE1]=./datasource/ensembl/[YOUR DATABASE VERSION]/allele.001.txt
[VARIATIONALLELE2]=./datasource/ensembl/[YOUR DATABASE VERSION]/allele.002.txt
[VARIATIONSAMPLE]=./datasource/ensembl/[YOUR DATABASE VERSION]/sample.txt
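
To avoid typing the version into every line by hand, the statements can be generated. The Python sketch below simply reproduces the key and file pairs listed above for a given version string; gsap_statements is a hypothetical helper, not an SVA tool, and the comment lines (#Core, #GO, #Existing variations) are left out.

```python
# (key, file name) pairs copied from the statements above.
GSAP_KEYS = [
    ("GENEID", "gene_stable_id.txt"),
    ("TRANSCRIPT", "transcript.txt"),
    ("TRANSCRIPTID", "transcript_stable_id.txt"),
    ("TRANSLATION", "translation.txt"),
    ("EXON", "exon.txt"),
    ("EXONTRANSCRIPT", "exon_transcript.txt"),
    ("XREF", "xref.txt"),
    ("OBJECTXREF", "object_xref.txt"),
    ("GENE", "gene.txt"),
    ("SEQREGION", "seq_region.txt"),
    ("GO", "term.txt"),
    ("VARIATION", "variation.txt"),
    ("VARIATIONFEATURE1", "variation_feature.001.txt"),
    ("VARIATIONFEATURE2", "variation_feature.002.txt"),
    ("VARIATIONALLELE1", "allele.001.txt"),
    ("VARIATIONALLELE2", "allele.002.txt"),
    ("VARIATIONSAMPLE", "sample.txt"),
]


def gsap_statements(version):
    """Return the data file statements for one database version."""
    base = f"./datasource/ensembl/{version}"
    return [f"[{key}]={base}/{filename}" for key, filename in GSAP_KEYS]


if __name__ == "__main__":
    # Example version string taken from the build 37 release mentioned earlier.
    print("\n".join(gsap_statements("61_37f")))
```

Paste the printed lines into your .gsap file in place of the existing statements.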

(2.4) Alternatively, use the [USERTRACK] option to include a user annotation track

You may want to include user annotation tracks in GFF3 or BED format to perform annotations in addition to the core annotation. To do this, include the following statements:

[USERTRACK]= [YOUR TRACK NAME] , [YOUR TRACK FILE PATH].gff3

[USERTRACK]= [YOUR TRACK NAME] , [YOUR TRACK FILE PATH].bed
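
If you maintain several tracks, a small helper can build these statements and catch unsupported file types early. The Python sketch below is an assumption-laden illustration: usertrack_statement is a hypothetical name, and the spacing around "=" and "," simply mirrors the statements shown above, since it is not documented here whether SVA is sensitive to it.

```python
from pathlib import Path


def usertrack_statement(track_name, track_path):
    """Build one [USERTRACK] statement for a GFF3 or BED track file."""
    suffix = Path(track_path).suffix.lower()
    if suffix not in {".gff3", ".bed"}:
        # Only the two formats named in the text are accepted here.
        raise ValueError(f"expected a .gff3 or .bed file, got: {track_path}")
    return f"[USERTRACK]= {track_name} , {track_path}"
```

For example, usertrack_statement("my_track", "./tracks/genes.gff3") gives a line ready to paste into the .gsap file, while a .gtf path raises an error as a reminder to convert the file first.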

If your gene annotation files are in GTF format, they can be conveniently converted into GFF3 format using this online tool:

http://www.sequenceontology.org/cgi-bin/converter.cgi

 

 

© 2011 Dongliang Ge, PhD.