SVA project



Data inputs

This document describes the input formats required by SVA version 1.1 and later. For older versions of SVA, which supported the pileup format, see here.

SVA users need to prepare three types of input files for an SVA project:

A list of identified variants, including single nucleotide variants (SNVs) and insertions/deletions (INDELs), in a specific vcf format (a detailed description of the vcf format can be found here) - text file with file name extension .vcf;
(Optional) A list of structural variations (SVs) in HMMCNV output format - text file with file name extension .events;
A chromosome-wise coverage and quality control data file, generated from SAMtools mpileup output - binary file with file name extension .bco.

In addition, there is an optional pedinf file for an SVA project. This file lists the subjects in a linkage format. This file is not necessary for SVA annotation tasks, but is necessary for some SVA analysis and exporting functions.

Optional pedinf file:

The pedinf file lists the subjects in linkage format and consists of six columns, separated by spaces or tabs:

Family ID, Individual ID, Father ID, Mother ID, Gender (1=male, 2=female), Affected status (1=control, 2=case, -9=unknown)

Here is an example for this file.
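As an illustration of the six-column layout described above, here is a minimal Python sketch that reads pedinf content and checks the coded columns. It is not part of SVA. The names `Subject` and `parse_pedinf` and the sample IDs are made up for this example, and rejecting gender or affected-status codes other than those listed above is a choice of this sketch rather than documented SVA behavior.

```python
from typing import List, NamedTuple


class Subject(NamedTuple):
    family_id: str
    individual_id: str
    father_id: str
    mother_id: str
    gender: int    # 1 = male, 2 = female
    affected: int  # 1 = control, 2 = case, -9 = unknown


def parse_pedinf(text: str) -> List[Subject]:
    """Parse pedinf content: six columns separated by spaces or tabs."""
    subjects = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()  # splits on any run of spaces or tabs
        if not fields:
            continue  # skip blank lines
        if len(fields) != 6:
            raise ValueError(
                f"line {line_no}: expected 6 columns, got {len(fields)}"
            )
        gender, affected = int(fields[4]), int(fields[5])
        # Assumption of this sketch: only the codes listed in the guide are valid.
        if gender not in (1, 2):
            raise ValueError(f"line {line_no}: unexpected gender code {gender}")
        if affected not in (1, 2, -9):
            raise ValueError(f"line {line_no}: unexpected status code {affected}")
        subjects.append(Subject(*fields[:4], gender, affected))
    return subjects


# Hypothetical content: the IDs below are invented for illustration only.
example = "fam1\tcase01\t0\t0\t1\t2\nfam2 ctrl01 0 0 2 1\n"
subjects = parse_pedinf(example)
```

Running a check like this before loading the file into a project can catch a missing column or a mistyped status code early.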

We assume that SVA users are already familiar with next-generation sequencing data pipelines, particularly those using BWA/SAMtools. The file name extensions listed above only help SVA recognize the respective formats. Although we use BWA/SAMtools ourselves, the file extensions do not mean that SVA only accepts output from SAMtools. SVA does not care which software generated the alignment results, as long as the files follow the formats described below.

Important note:

The build of the human reference genome that you use for alignment must be the same as the build that you use for SVA annotation. For instructions on how to update SVA annotation databases, see here.

The basic data generation flow described below is based on our own experience and is provided for reference. You may choose to use different parameter settings.

Step 1. Generating mpileup file

We used SAMtools to generate the mpileup file:

[YOUR SAMTOOLS DIR]/samtools mpileup -d 500 -uf [YOUR REFERENCE FASTA FILE] [YOUR ALIGNMENT .bam file] > [YOUR mpileup file]

Note that the chromosome designations used in your data affect the data generation steps that follow; see the note under Step 4.

Step 2. Generating variant file in vcf format

We used SAMtools/bcftools to generate the variant file (Please note this is a basic example. Your actual parameters may vary.):

[YOUR SAMTOOLS DIR]/bcftool/bcftools view -bvcg [YOUR mpileup file] > [YOUR bcf file]

[YOUR SAMTOOLS DIR]/bcftool/bcftools view [YOUR bcf file] > [YOUR raw vcf file]

[YOUR SAMTOOLS DIR]/bcftool/vcfutils.pl varFilter -D 500 [YOUR raw vcf file] > [YOUR filtered variant vcf file]

(Optional) Step 3. Generate SV file .events

We used a separate program (ERDS) to generate the SV file. Please refer to its web pages for the user guide.

Here is an example of the generated .events file:

The columns are: chromosome name, start coordinate, end coordinate, SV status (diploid=2), LOD score.
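To make the five columns concrete, here is a minimal Python sketch that parses .events content. It assumes the columns are separated by whitespace, that the SV status is an integer, and that the LOD score is a decimal number; none of these details are confirmed above. The names `SVEvent` and `parse_events` and the sample rows are hypothetical.

```python
from typing import List, NamedTuple


class SVEvent(NamedTuple):
    chrom: str
    start: int
    end: int
    status: int  # SV status; 2 means diploid
    lod: float   # LOD score


def parse_events(text: str) -> List[SVEvent]:
    """Parse .events content, assuming five whitespace-separated columns."""
    events = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 5:
            raise ValueError(
                f"line {line_no}: expected 5 columns, got {len(fields)}"
            )
        chrom, start, end, status, lod = fields
        event = SVEvent(chrom, int(start), int(end), int(status), float(lod))
        if event.end < event.start:
            raise ValueError(f"line {line_no}: end is before start")
        events.append(event)
    return events


# Hypothetical rows, invented for illustration only.
example = "1 10500 12800 1 35.2\n16 200000 245000 3 18.7\n"
events = parse_events(example)
non_diploid = [e for e in events if e.status != 2]
```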

Step 4. Generate coverage and quality score file .bco

We used a simple Java program, vcf2bco.jar (download it here), to generate the chromosome-wise .bco file from a base-wise vcf file generated using SAMtools/bcftools.

[YOUR SAMTOOLS DIR]/bcftool/bcftools view -gc [YOUR mpileup file] > [YOUR base-wise vcf file]

java -jar [YOUR vcf2bco.jar DIR]/vcf2bco.jar [YOUR base-wise vcf file] [YOUR_BCO_OUTPUTSTEM]

Note: This small Java program (vcf2bco.jar) accepts input files whose chromosome designations (column 1) are integers from 1 to 22, or X, Y, M. For example, vcf2bco accepts "16" but not "chr16".
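If your files use the "chr" prefix, one way to prepare them is to rewrite column 1 before running vcf2bco. The Python sketch below is only an illustration under that assumption. The function names are hypothetical, header lines starting with "#" are left untouched, and any name outside 1-22, X, Y, M raises an error rather than being guessed at. Whether renaming is appropriate also depends on keeping the designations consistent with the reference build used for annotation.

```python
# Chromosome designations accepted by vcf2bco according to the note above.
VALID_CHROMS = {str(n) for n in range(1, 23)} | {"X", "Y", "M"}


def normalize_chrom(name: str) -> str:
    """Strip a leading 'chr' prefix and check the result is an accepted name."""
    stripped = name[3:] if name.lower().startswith("chr") else name
    if stripped not in VALID_CHROMS:
        raise ValueError(f"unsupported chromosome designation: {name!r}")
    return stripped


def rewrite_line(line: str) -> str:
    """Normalize column 1 of a tab-separated data line; keep headers as-is."""
    if line.startswith("#") or not line.strip():
        return line
    chrom, sep, rest = line.partition("\t")
    return normalize_chrom(chrom) + sep + rest


# Usage on two hypothetical lines.
header = "#CHROM\tPOS\tID\tREF\tALT\n"
record = "chr16\t100\t.\tA\tG\n"
converted = [rewrite_line(header), rewrite_line(record)]
```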

The .bco file is in binary format and uses 4 bytes for each base, one byte per score: consensus quality, SNP quality, RMS mapping quality, and read depth. Please note that, because each score is stored in a single byte, its upper limit is 255. Any score greater than 255 will be trimmed to 255.
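The per-base layout can be pictured with the short Python sketch below. It is not the vcf2bco implementation: it assumes the four bytes appear in the order listed above and that values above 255 are capped at 255 (negative values are raised to 0 as an extra safeguard of this sketch). Any file header or per-chromosome indexing is not described here and is not modeled. The function names are hypothetical.

```python
import struct
from typing import Tuple


def clamp_score(value: int) -> int:
    """Limit a score to the range a single unsigned byte can hold."""
    return max(0, min(int(value), 255))


def pack_base(consensus_q: int, snp_q: int, rms_map_q: int, depth: int) -> bytes:
    """Pack the four per-base scores into 4 bytes, one byte each."""
    scores = (consensus_q, snp_q, rms_map_q, depth)
    return struct.pack("4B", *(clamp_score(s) for s in scores))


def unpack_base(record: bytes) -> Tuple[int, int, int, int]:
    """Recover the four one-byte scores from a 4-byte record."""
    return struct.unpack("4B", record)


# A base covered by 500 reads is stored with depth 255.
record = pack_base(30, 0, 60, 500)
scores = unpack_base(record)
```

One consequence of this encoding is that depths above 255 cannot be told apart once written, which is worth keeping in mind when choosing coverage thresholds.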

After you generate these input files (the .vcf file, the optional .events file from Step 3, and the .bco file), you may proceed to create your project.
