Filtering for variant quality scores

 

SVA provides a number of filter tools, including a quality score (confidence score) filter for each type of variant (see figure below).

There are two strategies for filtering on variant quality scores:

Strategy 1. Variant-wise filtering based on the average scores across individual genomes. This strategy was developed for computational efficiency, and because for rare variants the supporting evidence may come from different individual genomes. This is the default filtering strategy applied in the SVA summarization and analysis routines. In the figure above, all three menu items starting with "Confidence score filter (avg)" belong to this strategy.

This is also a more 'forgiving' filtering strategy.

Strategy 2. Individual-variant-wise filtering based on the scores of each individual genome separately. This is the conventional, strict filtering strategy. In the figure above, the menu item "Individual variant confidence score filtering and exporting" belongs to this strategy.

Please note: with this filtering strategy, you will be asked to export the filtered dataset to a new project. It has to be a new project, mainly for computational efficiency. You can then work with the new project without worrying about the quality of the variants.
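The difference between the two strategies can be illustrated with a minimal Python sketch. This is not SVA's own code: the data layout, the genome names, the use of a plain arithmetic mean, and the function names are all assumptions made for illustration.

```python
from statistics import mean

# Hypothetical per-genome consensus scores for one variant site.
# Only genomes carrying the variant contribute a score (an assumption).
scores_by_genome = {"genome_A": 35, "genome_B": 28, "genome_C": 12}
THRESHOLD = 20  # illustrative cutoff


def passes_average_filter(scores, threshold):
    """Strategy 1 (sketch): keep the variant if the mean score meets the threshold."""
    return mean(scores.values()) >= threshold


def individual_filter(scores, threshold):
    """Strategy 2 (sketch): keep only the genomes whose own score meets the threshold."""
    return {genome: s for genome, s in scores.items() if s >= threshold}


# The mean of 35, 28 and 12 is 25, so the variant as a whole is kept.
print(passes_average_filter(scores_by_genome, THRESHOLD))
# The call in genome_C (score 12) is dropped when filtering individually.
print(individual_filter(scores_by_genome, THRESHOLD))
```

In this toy case the low-scoring call in genome_C survives strategy 1 because the other genomes raise the average, which is why strategy 1 is the more forgiving of the two.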

1. Set the quality filtering criteria

The quality filtering criteria can be set through the strategy 1 menu items (figure above) or buttons (figure below). The menu items and the buttons do exactly the same thing.

I will use quality filtering for single nucleotide variants (SNVs) as an example of this process. Click on either the menu item or the button, and you will see a window like this:

In this example for SNVs, there are six different scores that can be used as quality filters: (Phred-like) consensus score, SNP quality, RMS score, read depth, reads supporting the SNV, and reads supporting the reference base.

We used these thresholds for SNVs:

  1. Consensus score >= 20
  2. SNP quality >= 20
  3. Reads supporting SNV >= 3
  4. Half of these thresholds for sex chromosomes (since our sample genomes were male)

I list these thresholds here for your reference. You can of course choose and set your own confidence score thresholds.
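As a rough illustration of how the SNV thresholds listed above combine, here is a minimal Python sketch. It is not SVA's implementation: the field names, the chromosome labels, and the choice to leave the halved read-count threshold unrounded (1.5) are assumptions.

```python
# Thresholds from the example above; the dictionary keys are hypothetical names.
AUTOSOMAL_THRESHOLDS = {
    "consensus_score": 20,
    "snp_quality": 20,
    "reads_supporting_snv": 3,
}
SEX_CHROMOSOMES = {"X", "Y"}


def thresholds_for(chromosome):
    """Return the thresholds for a chromosome, halved on sex chromosomes."""
    if chromosome in SEX_CHROMOSOMES:
        # How SVA rounds the halved values is not stated; no rounding is done here.
        return {name: value / 2 for name, value in AUTOSOMAL_THRESHOLDS.items()}
    return dict(AUTOSOMAL_THRESHOLDS)


def passes_snv_filter(chromosome, scores):
    """Keep an SNV only if every score meets its threshold."""
    limits = thresholds_for(chromosome)
    return all(scores[name] >= limit for name, limit in limits.items())


snv = {"consensus_score": 25, "snp_quality": 22, "reads_supporting_snv": 2}
print(passes_snv_filter("1", snv))  # 2 supporting reads is below 3 on an autosome
print(passes_snv_filter("X", snv))  # the halved thresholds apply on chromosome X
```

The same scores can therefore fail on an autosome and pass on a sex chromosome, which reflects the lower expected coverage of chromosomes X and Y in male genomes.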

After you click on the "FILTER" button, the SNV, INDEL, and CV listing tables will be updated automatically. A status bar at the bottom of the window shows how many variants there are in total and how many remain after filtering (figure below). In the example below, the status bar indicates that "Score" (filtering) is "on", and the filtered count is 333789, equal to the total. This is because the chromosome-X example dataset released with SVA has already been individually filtered with strategy 2.

The strategy 1 filter applies to most of the variant analysis and summarization procedures in SVA. Whenever you perform one of these procedures, please pay attention to the output printed to the "Log" panel. Any filters applied in the procedure will be printed there, and if you do not see that a specific filter is "on", then that filter is not applied.

When you load the project, you also have an option to automatically apply a default quality filter (figure below). If you dislike the default one, you can simply reset it and apply a new one after the project is loaded.

Lastly, please note that the thresholds set here are also applied in strategy 2 if you choose to individually filter the variants and export them to a new project. That is, although the figure above indicates "average scores", the thresholds are the same for strategy 2; only the filtering process differs.

2. Perform strategy 2 filtering

Click on the menu item "Filter -> Individual variant confidence score filtering and exporting". You will be asked to export the filtered dataset to a new project.

3. Reset the quality filtering criteria

You may reset the quality filter (together with all other filters) by clicking on the menu item "Filter -> Reset all filters".

 

© 2011 Dongliang Ge, PhD.