Regul@tionSpotter

Documentation

Input

VCF file

The gateway to RegulationSpotter is our QueryEngine. Here, you can upload any standard VCF file for analysis. Note that the chromosomal positions must refer to the GRCh37 reference assembly.

Analysis settings

In the QueryEngine interface, you can set the following options:

SEARCH FOR HOMOZYGOUS VARIANTS

Tick this box if you want to consider only homozygous variants in your analysis.

FILTER AGAINST 1000G

Enter thresholds for filtering out variants that are present in the 1000 Genomes Project (TGP). The defaults are 4 for homozygous and 20 for heterozygous cases.
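The two thresholds can be pictured as a simple count check. The sketch below is a hypothetical illustration, not RegulationSpotter's code: it assumes that each threshold is a count of homozygous or heterozygous observations in TGP and that a variant is discarded when its count exceeds the threshold, in line with the "more than 4 times homozygously" rule described under Known variant. The function name `discard_by_tgp` is invented for illustration.

```python
from typing import Optional

# Hypothetical sketch of the "filter against 1000G" option.
# Assumption: thresholds are counts of homozygous/heterozygous observations
# in the 1000 Genomes Project (TGP); a count strictly above the threshold
# causes the variant to be discarded.
DEFAULT_HOM_THRESHOLD = 4   # default for homozygous cases
DEFAULT_HET_THRESHOLD = 20  # default for heterozygous cases


def discard_by_tgp(hom_count: int, het_count: int,
                   hom_threshold: Optional[int] = DEFAULT_HOM_THRESHOLD,
                   het_threshold: Optional[int] = DEFAULT_HET_THRESHOLD) -> bool:
    """Return True if the variant is too common in TGP to be analysed.

    Passing None for a threshold switches that filter option off.
    """
    if hom_threshold is not None and hom_count > hom_threshold:
        return True
    if het_threshold is not None and het_count > het_threshold:
        return True
    return False


print(discard_by_tgp(hom_count=5, het_count=0))   # True
print(discard_by_tgp(hom_count=2, het_count=12))  # False
```

Switching off one option (passing `None`) mirrors the case where only one of the two TGP filters is set.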

MINIMUM COVERAGE

Enter the minimum coverage a variant must have to be analysed. The default is 4.

ANALYSE THE FOLLOWING REGIONS OR GENES

Select whether you wish to analyse the entire VCF file or only custom regions or genes.
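Taken together, the homozygosity, coverage and region settings act as a per-variant filter. The following is a minimal sketch under stated assumptions: genotype and coverage are taken from the VCF `GT` and `DP` fields, "homozygous" means the same non-reference allele on both copies, and regions are given as hypothetical `(chromosome, start, end)` tuples. `Call` and `passes_settings` are illustrative names, not part of RegulationSpotter.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

# Hypothetical region format: (chromosome, start, end), 1-based and inclusive.
Region = Tuple[str, int, int]


@dataclass
class Call:
    chrom: str
    pos: int
    genotype: str  # assumed to come from the VCF GT field, e.g. "0/1" or "1|1"
    coverage: int  # assumed to come from the VCF DP field (read depth)


def is_homozygous_alt(genotype: str) -> bool:
    """True for genotypes such as 1/1 or 2|2 (same non-reference allele twice)."""
    alleles = genotype.replace("|", "/").split("/")
    return len(set(alleles)) == 1 and alleles[0] not in ("0", ".")


def passes_settings(call: Call, homozygous_only: bool = False,
                    min_coverage: int = 4,
                    regions: Optional[Sequence[Region]] = None) -> bool:
    """Apply QueryEngine-style settings to a single call (illustrative only)."""
    if homozygous_only and not is_homozygous_alt(call.genotype):
        return False
    if call.coverage < min_coverage:
        return False
    if regions is not None:
        # Keep the call only if it falls inside at least one selected region.
        return any(call.chrom == c and start <= call.pos <= end
                   for c, start, end in regions)
    return True  # no regions given: analyse the entire VCF file


call = Call("7", 91623937, "1/1", 12)
print(passes_settings(call, homozygous_only=True, regions=[("7", 91600000, 91700000)]))  # True
```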

Synopsis and display settings

This page appears once RegulationSpotter has finished its analysis. It gives you a synopsis of your analysis and lets you filter and sort your results.

PROJECT ID

The project ID assigned by RegulationSpotter. If you want to access your results easily later on, note it down now. Please do not alter or delete it, as RegulationSpotter needs it!

Synopsis

In most cases, RegulationSpotter will not analyse every line of your VCF file, either because you have set certain filters or because some variants were not suitable for analysis. The synopsis gives you an overview of how your analysis went.

SUBMITTED VARIANTS

Number of alterations (lines) in the VCF file.

PRE-DISCARDED VARIANTS

Number of variants which were filtered out according to your settings (below minimum coverage, not homozygous, outside the specified region or chromosome) or because of input or format errors (e.g. the variant equals the reference sequence, the reference allele equals the alternative allele, an InDel is too long, or neither genotype nor frequency is supplied).
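The format checks listed above can be sketched as a function that returns the reason a record is pre-discarded. This is a hypothetical illustration: the check order, the way InDel length is measured and the `MAX_INDEL_LENGTH` value are all assumptions, since the actual limit is not stated here; `genome_ref` stands for the reference genome sequence at the variant's position.

```python
from typing import Optional

# Hypothetical maximum InDel size; the real limit is not documented here.
MAX_INDEL_LENGTH = 100


def format_error(ref: str, alt: str, genome_ref: str,
                 genotype: Optional[str], frequency: Optional[float],
                 max_indel_length: int = MAX_INDEL_LENGTH) -> Optional[str]:
    """Return the reason a VCF record would be pre-discarded, or None if usable."""
    ref, alt, genome_ref = ref.upper(), alt.upper(), genome_ref.upper()
    if ref == alt:
        return "reference allele equals alternative allele"
    if alt == genome_ref:
        return "variant equals reference sequence"
    # Assumption: InDel length is the difference between the allele lengths.
    if abs(len(ref) - len(alt)) > max_indel_length:
        return "InDel too long"
    if genotype is None and frequency is None:
        return "neither genotype nor frequency supplied"
    return None


print(format_error("A", "G", "A", "0/1", None))  # None (record is usable)
```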

ANALYSABLE ALTERATIONS

Number of variants which were suitable for analysis. This can be considerably higher than the number of lines in the VCF file, because a single line may contain more than one alternative allele. If you choose to combine neighbouring variants, the number rises further.
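To see why one VCF line can yield several alterations, the sketch below splits the comma-separated ALT column of a standard VCF data line into one record per alternative allele. It is an illustration, not RegulationSpotter's parser; `split_alleles` is a hypothetical name, and skipping `.` and `*` alleles is an assumption.

```python
from typing import List, Tuple


def split_alleles(vcf_line: str) -> List[Tuple[str, int, str, str]]:
    """Return one (chrom, pos, ref, alt) tuple per alternative allele of a VCF data line."""
    fields = vcf_line.rstrip("\n").split("\t")
    # Standard VCF columns: CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, ...
    chrom, pos, ref, alt_field = fields[0], int(fields[1]), fields[3], fields[4]
    # Assumption: "." (no alternative) and "*" (overlapping deletion) are skipped.
    return [(chrom, pos, ref, alt)
            for alt in alt_field.split(",") if alt not in (".", "*")]


line = "7\t91623937\t.\tG\tGGGCAAT,A\t50\tPASS\t."
print(split_alleles(line))
# [('7', 91623937, 'G', 'GGGCAAT'), ('7', 91623937, 'G', 'A')]
```

Here one submitted line becomes two analysable alterations.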

DISCARDED (TGP)

Number of variants excluded from analysis because they are present in the 1000 Genomes Project (TGP). Applies only if one or both of the two "filter against TGP" options are set.

EXTRAGENIC VARIANTS

Number of analysable variants which were extragenic and thus analysed directly by RegulationSpotter.

INTRAGENIC VARIANTS ANALYSED WITH MT

Number of analysable variants which were intragenic and thus analysed with MutationTaster.

MT CASES

Number of cases analysed with MutationTaster. This number is normally considerably higher than the number of analysable variants, because for most variants more than one suitable transcript is found.

TYPE OF AMINO ACID EXCHANGE (AAE)

Gives the number of observed amino acid exchanges for each of the three types used in MutationTaster: alterations that do not cause any amino acid exchange (without_aae), simple substitutions (simple_aae), and alterations that cause more complex changes in the amino acid sequence of the resulting peptide, such as a frameshift or a shifted start ATG (complex_aae).
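As a rough illustration of the three categories, the sketch below compares a reference and an altered protein sequence. It is a simplification and not MutationTaster's actual logic: it assumes that exactly one changed residue in an equal-length protein is a simple exchange and that everything else (length change, several changed residues) is complex. `aae_type` is a hypothetical name.

```python
def aae_type(ref_protein: str, alt_protein: str) -> str:
    """Classify an alteration by comparing reference and altered protein sequences.

    Simplified assumption: exactly one differing residue in an equal-length
    protein counts as a simple exchange; anything else counts as complex.
    """
    if ref_protein == alt_protein:
        return "without_aae"
    if len(ref_protein) == len(alt_protein):
        differences = sum(1 for a, b in zip(ref_protein, alt_protein) if a != b)
        if differences == 1:
            return "simple_aae"
    return "complex_aae"


print(aae_type("MKVL", "MKVL"))  # without_aae
print(aae_type("MKVL", "MRVL"))  # simple_aae
print(aae_type("MKVL", "MK"))    # complex_aae
```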

PREDICTION

Gives an overview of the predictions generated by MutationTaster; applies only to intragenic cases. The four possible predictions are disease causing automatic, disease causing, polymorphism and polymorphism automatic.

More information on the classifications can be found in MutationTaster's documentation.

Analysis options

For your convenience, the options you chose in the previous step are displayed here.

Display settings

RegulationSpotter offers flexible settings for displaying your results. Finally, you can choose whether to download or display them.

Output: Overview

The different elements of the output are named and described below.

Results table - data

Upon displaying your results, RegulationSpotter gives you a summary table listing each variant together with key information such as the associated gene and the type of alteration.
This table gives you a quick graphical overview of each variant and its effect. Affected regulatory features are colour-coded: the stronger the colour, the more strongly a feature may be affected.

CHR, POSITION, REF, ALT

Information on the location and nature of the alteration.

RESULT

Likely effect of the alteration. Depending on whether the variant is considered to be intragenic or extragenic, the options are:

intragenic variants

disease causing (ClinVar): known disease mutation listed in ClinVar.
disease causing: predicted by MutationTaster as disease causing.
polymorphism: predicted by MutationTaster as harmless.

extragenic variants

likely regulatory variant: Based on the available data, RegulationSpotter considers it likely that the variant has a regulatory function.
possible regulatory variant: Based on the available data, RegulationSpotter considers it possible that the variant has a regulatory function.
polymorphism: Based on the available data, RegulationSpotter considers the variant not to be located in a regulatory region.

VARIANT FREQUENCY

Indicates whether the variant is listed in variant frequency databases (dbSNP, 1000G).

Results table - colour-coded matrix

The second part of the results table is a colour-coded matrix. Each column represents one property and indicates the severity of the alteration and its likelihood of being located in a regulatory region. Lower transparency signifies stronger evidence for a regulatory function or functional impact.

MOST SEVERE RESULT

The most severe RegulationSpotter result across all available transcripts; this is used for sorting.

XSCORE

RegulationSpotter score for the variant.

TYPE

Type of alteration: single nucleotide variant (SNV) or insertion/deletion (InDel). InDels are classed as either long (>10 bp) or short.
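A minimal sketch of this classification from VCF-style REF/ALT alleles follows. The 10 bp cut-off comes from the text above; how InDel size is measured is an assumption (shared leading padding bases are removed and the longer remaining allele part is counted), and `alteration_type` is a hypothetical name.

```python
import os


def alteration_type(ref: str, alt: str, long_threshold: int = 10) -> str:
    """Label a REF/ALT pair as SNV, short InDel or long InDel (>10 bp)."""
    ref, alt = ref.upper(), alt.upper()
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    # Assumption: InDel size is measured after removing the shared leading
    # (VCF padding) bases, as the longer of the two remaining allele parts.
    shared = len(os.path.commonprefix([ref, alt]))
    size = max(len(ref) - shared, len(alt) - shared)
    return "long InDel" if size > long_threshold else "short InDel"


print(alteration_type("G", "GGGCAAT"))  # short InDel (6 inserted bases)
```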

INTRAGENIC VARIANT

Indicates whether the variant is located within a gene (intragenic).

NMD / PTC / frameshift /truncated

Indicates whether the variant is likely to be highly deleterious, e.g. because it leads to nonsense-mediated decay (NMD), a premature termination codon (PTC), a frameshift or a truncated protein.

AMINO ACID SUBSTITUTION(S)

Displays whether an amino acid exchange occurs.

WITHIN PROTEIN DOMAINS

Indicates whether the variant is located within a protein domain.

ALTERED SPLICING

Indicates whether the variant leads to the alteration of a splice site.

KOZAK SEQUENCE ALTERED

Indicates whether the variant leads to the alteration of a Kozak sequence.

POLY-A SIGNAL CHANGED

Indicates whether the variant leads to the alteration of a poly-A signal.

miRNA BINDING SITE

Indicates whether the variant leads to the alteration of a miRNA binding site.

FANTOM5/VISTA

Indicates whether the variant is located within a site listed in FANTOM5/VISTA data.

MULTICELL REGULATORY FEATURE

Indicates whether the variant is located within a multicell regulatory feature from Ensembl.

WITHIN PROMOTER

Indicates whether the variant is located within a promoter, defined as the region from 500 bp upstream to 50 bp downstream of a transcription start site (TSS).
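The -500 bp/+50 bp window depends on the transcript's orientation. The sketch below shows one way to test it, using the 1/-1 strand convention from the detailed output. It assumes inclusive window ends and that "upstream" on the reverse strand means higher chromosomal coordinates; `in_promoter` is a hypothetical helper, not RegulationSpotter's code.

```python
def in_promoter(pos: int, tss: int, strand: int,
                upstream: int = 500, downstream: int = 50) -> bool:
    """Check whether pos lies in the window -upstream/+downstream around a TSS.

    strand uses the 1 / -1 convention; on the reverse strand, "upstream"
    means higher chromosomal coordinates. Window ends are treated as inclusive.
    """
    if strand == 1:
        start, end = tss - upstream, tss + downstream
    elif strand == -1:
        start, end = tss - downstream, tss + upstream
    else:
        raise ValueError("strand must be 1 or -1")
    return start <= pos <= end


print(in_promoter(600, tss=1000, strand=1))   # True (400 bp upstream)
print(in_promoter(600, tss=1000, strand=-1))  # False (400 bp downstream)
```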

H3K4me3 POSITIVE

Indicates whether the variant is located within an H3K4me3-positive region, which is indicative of active transcription.

DNAse1 HYPERSENSITIVE SITE

Indicates whether the variant is located within a DNase I hypersensitive site, which is indicative of active transcription (ENCODE/Ensembl).

WITHIN TFBS

Indicates whether the variant is located within a transcription factor binding site (TFBS) from Ensembl.

GENOMIC INTERACTION

Indicates whether the variant is located within a genomic interaction site according to Rao et al. [1].

PHYLOP/PHASTCONS MAX

Indicates the highest PhyloP and PhastCons scores, respectively.

CADD (SCALED)

Indicates the scaled CADD score for the alteration.

VARIANT FREQUENCY

Indicates the variant frequency in dbSNP and 1000G. Unknown/rare alleles are marked with a bright red colour.

Output: detailed

Clicking on a variant gives you more detailed insight into your results. For intragenic alterations and known disease-causing variants, you will be redirected to our conventional MutationTaster output. More information can be found in the MutationTaster documentation.

Result

Likely effect of an alteration. RegulationSpotter treats alterations differently depending on whether or not they are located within a gene. For intragenic alterations, it relies on MutationTaster, and the result is one of the following:

disease causing (ClinVar): known disease mutation listed in ClinVar.
disease causing: predicted by MutationTaster as disease causing.
polymorphism: predicted by MutationTaster as harmless.

For more details about the classification process, please refer to our MutationTaster documentation.

Extragenic alterations are assessed by RegulationSpotter directly. The program combines all available regulatory data into an estimate of how likely the variant is to be located in a regulatory region. The three possible outcomes are:

likely functional: Regulatory information is available for the variant's location in several data sources. RegulationSpotter therefore considers the likelihood of a regulatory function to be high.
possibly functional: Regulatory information is available for the variant's location in at least one data source.
polymorphism: No regulatory information is available. Thus, RegulationSpotter assumes that the variant is not located in a regulatory region.
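The three outcomes can be read as a count of data sources that report regulatory information at the variant's position. The sketch below is a simplified, count-based reading only: the program actually combines the data into an estimate whose cut-offs are not documented here, so `likely_min_sources` is a hypothetical stand-in for "several", and `classify_extragenic` is an illustrative name.

```python
from typing import Iterable


def classify_extragenic(sources_with_hits: Iterable[str],
                        likely_min_sources: int = 3) -> str:
    """Map the number of distinct data sources with regulatory information to a label.

    likely_min_sources is a hypothetical cut-off standing in for "several".
    """
    n_sources = len(set(sources_with_hits))  # count each data source once
    if n_sources >= likely_min_sources:
        return "likely functional"
    if n_sources >= 1:
        return "possibly functional"
    return "polymorphism"


print(classify_extragenic(["VISTA", "Ensembl TFBS", "DNase I HS"]))  # likely functional
```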

Alteration (phys. location)

The alteration at the "physical", i.e. chromosomal, level (e.g. chr7:91623937_91623938insGGCAAT).
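For the insertion example above, the notation names the two reference positions flanking the insertion and the inserted bases. The sketch below derives it from VCF-style alleles, assuming the VCF record carries a leading padding base (ALT = REF + inserted bases) so that `pos` is the last base before the insertion. Other alteration types are deliberately not handled, because their notation is not shown here; `insertion_notation` is a hypothetical name.

```python
def insertion_notation(chrom: str, pos: int, ref: str, alt: str) -> str:
    """Build the chromosomal notation for a simple insertion from VCF-style alleles.

    Assumes ALT = REF + inserted bases, with pos the VCF position of REF.
    """
    if len(alt) <= len(ref) or not alt.startswith(ref):
        raise ValueError("only simple insertions (ALT = REF + inserted bases) are handled")
    inserted = alt[len(ref):]
    left = pos + len(ref) - 1  # last reference base before the insertion
    name = chrom if chrom.startswith("chr") else f"chr{chrom}"
    return f"{name}:{left}_{left + 1}ins{inserted}"


print(insertion_notation("7", 91623937, "G", "GGGCAAT"))
# chr7:91623937_91623938insGGCAAT
```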

Alteration type

Either a base exchange, a combination of insertion and deletion, an insertion, or a deletion.

Alteration region

Extragenic by definition.

Known variant

Any known polymorphism(s) or known disease variant found at the position in question. Our database contains all single nucleotide polymorphisms (SNPs) from the NCBI SNP database (dbSNP). Moreover, we have stored all HapMap genotype frequencies as well as variants from the 1000 Genomes Project [3] (TGP). If an alteration is located at the same position as a known dbSNP variant, MutationTaster provides the SNP ID (rs ID) and a link, together with the HapMap genotype frequencies if available. If each of the three possible genotypes is observed in at least one HapMap population, the alteration is automatically regarded as a polymorphism and predicted as "polymorphism automatic". Please note that your alteration may differ from the alleles listed in dbSNP.

For TGP, MutationTaster provides information on the variant's occurrence. If an alteration was found more than 4 times homozygously in TGP, it is automatically regarded as a polymorphism.

We also display known disease variants from ClinVar. If a variant is marked as probable-pathogenic or pathogenic in ClinVar, it is automatically predicted to be disease causing, i.e. "disease causing automatic" (the naive Bayes classifier is run nevertheless and the p value for the prediction is shown).

Moreover, we have integrated the public version of the Human Gene Mutation Database (HGMD) [5]. The data include the positions of the disease mutations and their HGMD IDs but not the disease alleles, so HGMD cannot be used for automatic predictions. Whenever an HGMD public disease mutation is found at the same position as a variant, this is noted in the summary. We also place a direct hyperlink to the mutation in HGMD in the 'dbSNP / TGP / HGMD(public) / ClinVar' field, so you can check whether the HGMD mutation has the same allele as your variant (and whether the disease matches). Please note that you must be logged in at the HGMD site for the hyperlink to work; access to the public version is free but requires registration.
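The automatic rules described above can be summarised as a short decision function. This is a sketch, not MutationTaster's implementation: the genotype labels are hypothetical, the order in which the rules are checked is an assumption, and returning `None` stands for "no automatic prediction; the classifier decides".

```python
from typing import Optional, Set

# Hypothetical labels for the three genotypes of a biallelic site.
ALL_GENOTYPES = {"hom_ref", "het", "hom_alt"}


def automatic_prediction(hapmap_genotypes_seen: Set[str], tgp_hom_count: int,
                         clinvar_significance: Optional[str]) -> Optional[str]:
    """Return an automatic prediction, or None if the classifier decides alone.

    The order in which the rules are checked is an assumption.
    """
    if clinvar_significance in ("pathogenic", "probable-pathogenic"):
        return "disease causing automatic"
    if ALL_GENOTYPES <= hapmap_genotypes_seen:
        # every genotype observed in at least one HapMap population
        return "polymorphism automatic"
    if tgp_hom_count > 4:
        # found more than 4 times homozygously in TGP
        return "polymorphism automatic"
    return None


print(automatic_prediction({"het"}, 5, None))  # polymorphism automatic
```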

ENSEMBL multicell regulatory features

Indicates whether the alteration is located within an ENSEMBL multicell regulatory feature.

Regulatory features from VISTA and FANTOM5

Regulatory data from the Ensembl Regulatory Build, b37, published in [2] (FANTOM5) and [4] (VISTA).

TarBase miRNA binding sites

MicroRNA binding sites which are affected by the alteration as annotated in DIANA TarBase [2;5].
A link to DIANA-TarBase with more information on the respective microRNA is provided.

PhyloP/PhastCons

Indicates the conservation of the alteration site. Data from phyloP [6] and PhastCons [7].
PhastCons and phyloP are both methods for determining the degree of conservation of a given nucleotide. MutationTaster uses precomputed values provided by UCSC.
PhastCons values range from 0 to 1 and reflect the probability that each nucleotide belongs to a conserved element, based on the multiple alignment of genome sequences of 46 different species (the closer the value is to 1, the more likely the nucleotide is conserved). PhastCons considers not just each individual alignment column but also its flanking columns.
In contrast, phyloP (values between -14 and +6) measures conservation at individual columns separately, ignoring the effects of their neighbours. Moreover, phyloP can measure not only conservation (slower evolution than expected under neutral drift) but also acceleration (faster evolution than expected). Sites predicted to be conserved are assigned positive scores, while sites predicted to be fast-evolving are assigned negative scores.
For more information about phyloP and phastCons, please see the cited papers.

Chromosome

The chromosome the alteration is located on.

Strand

Either 1 (forward strand) or -1 (reverse strand).

Chromosomal position

Gives the last wild-type base before the alteration and the first wild-type base after it, in chromosomal sequence context (positions relative to the start of the chromosomal reference sequence). For example, for 154,372,337 / 154,372,339, the altered base is at position 154,372,338.
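The arithmetic behind the example is simple: for a base exchange at position p, the flanking wild-type bases are p-1 and p+1. The sketch below generalises this to a run of consecutive altered reference bases, assuming `pos` is the 1-based position of the first altered base; the helper names are hypothetical.

```python
from typing import Tuple


def flanking_positions(pos: int, altered_length: int = 1) -> Tuple[int, int]:
    """Return (last wild-type base before, first wild-type base after) an alteration.

    Assumes pos is the 1-based position of the first altered base and
    altered_length is the number of consecutive altered reference bases.
    """
    return pos - 1, pos + altered_length


def format_flanks(pos: int, altered_length: int = 1) -> str:
    """Format the flanking positions with thousands separators, as in the output."""
    before, after = flanking_positions(pos, altered_length)
    return f"{before:,} / {after:,}"


print(format_flanks(154372338))  # 154,372,337 / 154,372,339
```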

Original chrDNA sequence snippet

Original DNA sequence with the original nucleotide marked in blue.

Altered chrDNA sequence snippet

Altered DNA sequence with the original nucleotide marked in blue.

Contact

If you discover bugs or have suggestions or questions, please write an e-mail to
Jana Marie Schwarz (jana-marie.schwarz AT charite.de) or to
Dominik Seelow (dominik.seelow AT charite.de).
We also appreciate hearing about your general experiences using MutationTaster.

References

[1] Rao SS, Huntley MH, Durand NC, Stamenova EK et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 2014. PMID: 25497547

[2] Zerbino DR, Wilder SP, Johnson N, Huettemann T, Flicek PR. The Ensembl Regulatory Build. Genome Biology 2015. PMID: 25887522

[3] 1000 Genomes Project Consortium. An integrated map of genetic variation from 1,092 human genomes. Nature 2012. PMID: 23128226

[4] Visel A, Minovitsky S, Dubchak I, Pennacchio LA. VISTA Enhancer Browser - a database of tissue-specific human enhancers. Nucleic Acids Res. 2007. PMID: 17130149

[5] Vlachos IS, Paraskevopoulou MD, Karagkouni D et al. DIANA-TarBase v7.0: indexing more than half a million experimentally supported miRNA:mRNA interactions. Nucleic Acids Res. 2014. PMID: 25416803

[6] Pollard KS, Hubisz MJ, Siepel A. Detection of non-neutral substitution rates on mammalian phylogenies. Genome Res. 2009. PMID: 19858363

[7] Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, Clawson H, Spieth J, Hillier LW, Richards S, et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 2005. PMID: 16024819