



Analysing a VCF file


VCF file

The gateway to RegulationSpotter is our Query Engine. Here, you can upload any standard VCF file for analysis with RegulationSpotter. Note that chromosomal positions must refer to the GRCh37 reference genome. A tutorial on how to use the Query Engine is also available.

Analysis settings

In the Query Engine interface, you can set the following options:

Homozygous Only

Tick this box if you want to only consider homozygous variants in your analysis.
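As an illustration of what "homozygous" means for a VCF record, here is a minimal Python sketch that inspects the standard GT (genotype) value. It is not RegulationSpotter's own code; treating haploid and missing calls as non-homozygous is an assumption made for this example.

```python
def is_homozygous_alt(gt: str) -> bool:
    """Return True for a diploid VCF GT value carrying two copies of the same
    non-reference allele, e.g. '1/1' or '2|2'."""
    # Phased ('|') and unphased ('/') genotypes are handled the same way.
    alleles = gt.replace("|", "/").split("/")
    if len(alleles) != 2 or "." in alleles:
        # Assumption: haploid calls and missing genotypes are not treated as homozygous.
        return False
    return alleles[0] == alleles[1] and alleles[0] != "0"
```

For example, `is_homozygous_alt("1/1")` is true, while `"0/1"` (heterozygous) and `"0/0"` (homozygous reference) are rejected.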

Filter Polymorphisms

Enter thresholds for discarding variants as polymorphisms, based on how often they have been observed in 1000G and ExAC. By default, the thresholds are 4 homozygous individuals in 1000G and 10 homozygous individuals in ExAC. For non-recessive traits, the second row also lets you filter variants observed in any state (heterozygous or homozygous). Set both values to zero if you do not want any filtering. All values refer to the number of individuals carrying the allele in the respective genotype.
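The threshold logic can be sketched in Python as follows. This is a hypothetical illustration, not RegulationSpotter's implementation: it assumes that a variant is discarded once a count reaches its threshold (the page does not say whether the comparison is "at least" or "more than"), and the default of 0 for the "any genotype" row is likewise an assumption.

```python
from dataclasses import dataclass


@dataclass
class PolymorphismFilter:
    # Thresholds are numbers of individuals; 0 disables the check.
    # Homozygous defaults follow the documented values (4 in 1000G, 10 in ExAC);
    # the "any genotype" defaults of 0 are an assumption for this sketch.
    hom_1000g: int = 4
    hom_exac: int = 10
    any_1000g: int = 0
    any_exac: int = 0

    def is_polymorphism(self, hom_1000g: int, hom_exac: int,
                        het_1000g: int, het_exac: int) -> bool:
        def reaches(count: int, threshold: int) -> bool:
            # Assumption: reaching the threshold is enough to discard the variant.
            return threshold > 0 and count >= threshold

        return (reaches(hom_1000g, self.hom_1000g)
                or reaches(hom_exac, self.hom_exac)
                # "Any form" counts heterozygous and homozygous carriers together.
                or reaches(hom_1000g + het_1000g, self.any_1000g)
                or reaches(hom_exac + het_exac, self.any_exac))
```

Setting every threshold to zero, e.g. `PolymorphismFilter(0, 0, 0, 0)`, disables filtering entirely, mirroring the behaviour described above.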

Minimum Coverage

Enter the minimum coverage (read depth) a variant must have to be analysed. The default is 4.
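A coverage filter of this kind might look like the Python sketch below. It assumes the read depth is reported as the standard `DP` key in the INFO column; many VCF files report depth per sample in the FORMAT column instead, and keeping variants without any `DP` value is an arbitrary choice for this example.

```python
def passes_coverage(info: str, min_coverage: int = 4) -> bool:
    """Return True if the DP (read depth) value in a VCF INFO string is at least min_coverage."""
    for field in info.split(";"):
        # INFO entries are KEY=VALUE pairs or bare flags separated by ';'.
        key, _, value = field.partition("=")
        if key == "DP" and value:
            return int(value) >= min_coverage
    # Assumption: records without a DP value are kept.
    return True
```

With the default of 4, `passes_coverage("DP=4;AF=0.5")` passes and `passes_coverage("DP=3")` does not.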

Analyse the following regions or genes

Select whether you wish to analyse the entire VCF file, or custom regions or genes.
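Restricting the analysis to custom regions boils down to an interval check per variant. The Python sketch below is illustrative only: it assumes regions are given as 1-based, inclusive (chromosome, start, end) tuples and treats "chr1" and "1" as the same chromosome. Gene names would first have to be translated into coordinates using a gene annotation, which is not shown here.

```python
from typing import Iterable, Tuple

# (chromosome, start, end), 1-based and inclusive, as in VCF positions
Region = Tuple[str, int, int]


def normalise_chrom(chrom: str) -> str:
    # Treat "chr7" and "7" as the same chromosome name.
    return chrom[3:] if chrom.lower().startswith("chr") else chrom


def in_regions(chrom: str, pos: int, regions: Iterable[Region]) -> bool:
    """Return True if the variant position falls inside any of the given regions."""
    c = normalise_chrom(chrom)
    return any(normalise_chrom(rc) == c and start <= pos <= end
               for rc, start, end in regions)
```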

Synopsis and display settings

This page appears once RegulationSpotter has finished its analysis. It gives you a synopsis of your analysis and lets you filter and sort your results.

Project ID

The project ID assigned by RegulationSpotter. Note it down if you want to access your results easily later on. Please don't alter or delete it, as RegulationSpotter needs it!


In most cases, RegulationSpotter will not analyse every single line of your VCF file, either because you have set certain filters or because certain variants were not suitable for analysis. The synopsis gives you an idea of how your analysis went.

Submitted Variants

Number of alterations (lines) in the VCF file.
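Since every non-header line of a VCF file describes one alteration, the number of submitted variants corresponds to the number of data lines. A minimal Python sketch: header lines start with `#` (both the `##` meta-information lines and the `#CHROM` column line), and blank lines are skipped.

```python
def count_submitted_variants(vcf_text: str) -> int:
    """Count VCF data lines, ignoring '#' header lines and blank lines."""
    return sum(1 for line in vcf_text.splitlines()
               if line.strip() and not line.startswith("#"))
```

To count the lines of a file on disk, pass it the file contents, e.g. `count_submitted_variants(open("my_sample.vcf").read())`, where the file name is just a placeholder.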

Discarded before analysis

Number of variants that were filtered out, either according to your settings (below minimum coverage, not homozygous, outside the specified region or chromosome) or because of input or format errors (e.g. the variant equals the reference sequence, the reference allele equals the alternative allele, the InDel is too long, or neither genotype nor frequency is supplied).
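Some of the format checks listed above can be sketched as simple per-record tests. This Python example covers only a subset and is not RegulationSpotter's code: `MAX_INDEL_LENGTH` is a hypothetical cut-off (the actual limit is not stated on this page), and measuring InDel length as the length difference between REF and ALT is an assumption.

```python
from typing import Optional

MAX_INDEL_LENGTH = 50  # hypothetical limit for "InDel is too long"


def format_error(ref: str, alt: str,
                 genotype: Optional[str], frequency: Optional[str]) -> Optional[str]:
    """Return a reason for discarding the record, or None if it passes these checks."""
    if ref.upper() == alt.upper():
        return "reference allele equals alternative allele"
    # Assumption: InDel length is the number of inserted or deleted bases.
    if abs(len(ref) - len(alt)) > MAX_INDEL_LENGTH:
        return "InDel is too long"
    if not genotype and not frequency:
        return "neither genotype nor frequency supplied"
    return None
```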

All Analysed Variants

Number of variants that were suitable for analysis. This number can be considerably higher than the number of lines in the VCF file, because a single VCF line may contain more than one alternative allele. If you choose to combine neighbouring variants, the number rises further.
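Why one VCF line can yield several analysed variants is easiest to see in code: the ALT column may list several comma-separated alleles, and each one becomes a separate variant. A minimal Python sketch; skipping the `.` (no alternative allele) and `*` (overlapping deletion) placeholders is an assumption for this example.

```python
from typing import List, Tuple


def split_alt_alleles(chrom: str, pos: int, ref: str,
                      alt: str) -> List[Tuple[str, int, str, str]]:
    """Expand one VCF record into one (chrom, pos, ref, alt) variant per alternative allele."""
    return [(chrom, pos, ref, a) for a in alt.split(",") if a not in (".", "*")]
```

For instance, the record `1 100 A G,T` expands into two variants, `A>G` and `A>T`.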

Variants mapped to a gene and analysed with MT

Number of analyses performed by MutationTaster (MT). This is usually considerably higher than the number of analysable variants, because more than one suitable transcript is found for most variants.


The type of prediction algorithm used by MutationTaster:
without_aae: our algorithm for silent alterations.
simple_aae: our algorithm for simple amino acid exchanges.


Gives an overview of the predictions generated by MutationTaster. This applies only to intragenic cases. The four options are:
disease causing (ClinVar)
disease causing
polymorphism
polymorphism (automatic)

More information on the classifications can be found in MutationTaster's documentation.

Analysis options

The analysis options section lists the options you chose in the previous step.

Display settings

To make your analyses as convenient as possible, RegulationSpotter lets you customise the results display in many ways.

Display/Filter Options

By default, extratranscriptic variants without any annotation are hidden. Uncheck the checkbox to show them, but beware of being flooded with results. We warned you.

To make your results easier to handle, you can restrict the display to certain genes or regions, hide known polymorphisms, and more. Just play around with the settings.

Sort & Group Results

This section allows you to sort your results by various properties. By default, they are sorted by effect first, followed by position.
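Sorting by effect and then by position is a multi-key sort. The Python sketch below shows the idea; the severity order in `EFFECT_RANK` is hypothetical and not taken from RegulationSpotter, and ordering numbered chromosomes numerically before X, Y and MT is likewise a choice made for this example.

```python
from typing import Dict, List, NamedTuple, Tuple


class Result(NamedTuple):
    chrom: str
    pos: int
    effect: str


# Hypothetical display order: lower rank is shown first.
EFFECT_RANK: Dict[str, int] = {
    "disease causing (ClinVar)": 0,
    "disease causing": 1,
    "functional (high conf)": 2,
    "functional": 3,
    "polymorphism": 4,
    "polymorphism (automatic)": 5,
    "non-functional (high conf)": 6,
}


def chrom_key(chrom: str) -> Tuple[int, int, str]:
    c = chrom[3:] if chrom.lower().startswith("chr") else chrom
    # Numbered chromosomes in numeric order, then X, Y, MT etc. alphabetically.
    return (0, int(c), "") if c.isdigit() else (1, 0, c)


def sort_results(results: List[Result]) -> List[Result]:
    # Unknown effect labels are placed after all known ones.
    return sorted(results, key=lambda r: (EFFECT_RANK.get(r.effect, len(EFFECT_RANK)),
                                          chrom_key(r.chrom), r.pos))
```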

Finally, just click on the big 'display' button to get a look at your results.

Output: Overview


Screenshot of the results overview output of RegulationSpotter.

Results table - data

When displaying your results, RegulationSpotter presents a summary table listing each variant together with key information such as the associated gene and the type of alteration.

This table gives you a quick graphical overview of each variant and its effect. Affected regulatory features are colour-coded. For non-dichotomous data, the stronger the colour, the more strongly a feature may be affected.

Chr, Pos, Ref, Alt

Chromosome, position, reference allele and alternative allele: information on the location and nature of the alteration.


Likely effect of the alteration. Depending on whether the variant is considered to be intragenic or extragenic, the options are:

intragenic variants

disease causing (ClinVar): known disease mutation listed in ClinVar.
disease causing: predicted by MutationTaster as disease causing.
polymorphism: predicted by MutationTaster as harmless.
polymorphism (automatic): known to be harmless from databases.

More information on the classifications can be found in MutationTaster's documentation.

extragenic variants

functional (high conf): Based on the available data, RegulationSpotter considers it very likely that the variant has a regulatory function. The high confidence label is given to variants for which the positive predictive value (PPV) was at least 98% and the negative predictive value (NPV) was below 98%.
functional: Based on the available data, RegulationSpotter considers it possible that the variant has a regulatory function.
non-functional (high conf): Based on the available data, RegulationSpotter considers the variant not to be located in a regulatory region. For the high confidence label, the same thresholds as for functional variants were used.
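For reference, PPV and NPV are computed from the counts of true and false positive and negative predictions on a labelled test set. The Python sketch below only illustrates these standard definitions and a 98% cut-off; the counts in the usage note are invented for the arithmetic and are not RegulationSpotter's evaluation data.

```python
def ppv(tp: int, fp: int) -> float:
    """Positive predictive value: share of variants predicted functional that truly are."""
    return tp / (tp + fp)


def npv(tn: int, fn: int) -> float:
    """Negative predictive value: share of variants predicted non-functional that truly are."""
    return tn / (tn + fn)


def meets_threshold(value: float, threshold: float = 0.98) -> bool:
    # "At least 98%", as used for the high confidence labels.
    return value >= threshold
```

With made-up counts of 99 true and 1 false positive, `ppv(99, 1)` is 0.99 and clears the 98% threshold, whereas 90 true and 10 false negatives give an NPV of 0.9.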

Variant Frequency

Indicates whether the variant is listed in population frequency databases (dbSNP, 1000G).

Results table - colour-coded matrix

The second part of the results table is a colour-coded matrix. Each column represents one property and indicates the severity of the alteration or the likelihood that it is located in a regulatory region. Less transparency (a more opaque colour) signifies stronger evidence for a regulatory function or functional impact.

Most Severe Result

The most severe RegulationSpotter result across all available transcripts. This value is used for sorting.
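Reducing several per-transcript predictions to the most severe one is a maximum over a severity ranking. A minimal Python sketch; the ranking in `SEVERITY` is hypothetical and only covers the intragenic labels used on this page.

```python
from typing import Iterable

# Hypothetical severity ranking: higher means more severe.
SEVERITY = {
    "polymorphism (automatic)": 0,
    "polymorphism": 1,
    "disease causing": 2,
    "disease causing (ClinVar)": 3,
}


def most_severe(per_transcript: Iterable[str]) -> str:
    """Return the most severe label among the predictions for all transcripts of a variant."""
    # Unknown labels rank below all known ones; an empty input raises ValueError.
    return max(per_transcript, key=lambda label: SEVERITY.get(label, -1))
```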


RegulationSpotter score for the variant. The score integrates all evidence found for the functionality of the variant; higher values indicate a higher probability that the variant is functional. For a description of how the score is calculated, and to see where the score of a specific variant falls within the range of possible scores, please see our statistics section.


Type of alteration: single nucleotide variant (SNV) or insertion/deletion (InDel). InDels are classified as long (>10 bp) or short.
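The SNV/InDel distinction and the 10 bp cut-off can be sketched from the REF and ALT alleles. This Python example assumes InDel length is the number of inserted or deleted bases (the length difference between REF and ALT); how RegulationSpotter measures length exactly is not stated here. Equal-length multi-base changes fall into the short InDel category in this sketch.

```python
def classify_alteration(ref: str, alt: str, long_threshold: int = 10) -> str:
    """Classify a REF/ALT pair as 'SNV', 'short InDel' or 'long InDel'."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    # Assumption: InDel length = number of inserted or deleted bases.
    length = abs(len(ref) - len(alt))
    return "long InDel" if length > long_threshold else "short InDel"
```

For example, a deletion of `ATT` to `A` is a short InDel, while an insertion of 11 bases is a long one.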

Intragenic Variant

Indicates whether the variant is located within a gene.

NMD / PTC / frameshift /truncated

Indicates whether the variant is likely to be highly deleterious, e.g. because it leads to nonsense-mediated decay (NMD), a premature termination codon (PTC), a frameshift or protein truncation.

Amino Acid Substitution(s)

Displays whether an amino acid exchange occurs.

Within Protein Domains

Indicates whether the variant is located within a protein domain.

Altered Splicing

Indicates whether the variant leads to the alteration of a splice site.

Kozak Sequence Altered

Indicates whether the variant leads to the alteration of a Kozak sequence.

PolyA Signal Changed

Indicates whether the variant leads to the alteration of a poly-A signal.

miRNA Binding Site

Indicates whether the variant leads to the alteration of a miRNA binding site.

Open Chromatin

Indicates whether the variant is located within an open chromatin region.


Indicates whether an enhancer is annotated for the variant.


Indicates whether the variant is located within a promoter. "Active" means that H3K4me3 and DNase I hypersensitive site (DHS) annotations are additionally available for the site (please refer to our detailed documentation page for more information).


Indicates whether the variant is located within an H3K4me3-positive region, which is indicative of active transcription. "Robust" indicates that these annotations were available for at least three different cell lines.
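The "Robust" rule amounts to counting distinct cell lines with an H3K4me3 annotation at the site. A minimal Python sketch, assuming the overlapping annotations arrive as hypothetical (cell line, mark) pairs; the return labels "none" and "present" are illustrative only.

```python
from typing import Iterable, Tuple


def h3k4me3_status(annotations: Iterable[Tuple[str, str]]) -> str:
    """Classify the H3K4me3 evidence at a site from (cell_line, mark) pairs."""
    # Count each cell line once, however many H3K4me3 peaks it contributes.
    cell_lines = {cell for cell, mark in annotations if mark == "H3K4me3"}
    if not cell_lines:
        return "none"
    return "robust" if len(cell_lines) >= 3 else "present"
```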

Histone Modifications

Histone modification annotations from Ensembl at the site of interest.

Within TFBS

Indicates whether the variant is located within a transcription factor binding site (TFBS) from Ensembl [2].


RNA polymerase II and III binding sites annotated for the region.

Genomic Interaction(s)

Indicates whether the variant is located within a genomic interaction site according to Rao et al. [1].

PhyloP/PhastCons (max)

Indicates the highest PhyloP [3] and PhastCons [4] scores, respectively.

CADD (Scaled)

Indicates the scaled CADD score for the alteration.
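The scaled CADD score is PHRED-like: it expresses the rank of a variant's raw CADD score among all possible substitutions in the reference genome, so that a scaled score of 10 marks the top 10%, 20 the top 1% and 30 the top 0.1%. RegulationSpotter reports precomputed CADD values; the Python sketch below only illustrates the scaling itself.

```python
import math


def cadd_phred(rank: int, total: int) -> float:
    """PHRED-like scaled score for a variant ranked `rank` (1 = most deleterious) among `total` variants."""
    return -10 * math.log10(rank / total)
```

A variant ranked in the top 1% (`rank / total = 0.01`) therefore receives a scaled score of 20.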

Variant Frequency

Variant frequency in dbSNP and 1000G. Unknown/rare alleles are marked with a bright red colour.

Output: detailed

Clicking on the blue "extragenic results" or "intragenic results" link of a variant opens a detailed view of the results for that single variant.

For intragenic alterations and known disease-causing variants, you will be redirected to our conventional MutationTaster output. More information can be found in the MutationTaster documentation.

For a detailed explanation of extragenic results, please see the single query documentation, which also explains the interaction plot.


If you discover bugs or have suggestions or questions, please write an e-mail to Jana Marie Schwarz (jana-marie.schwarz AT charite.de) or to Dominik Seelow (dominik.seelow AT charite.de).
We also appreciate hearing about your general experiences using RegulationSpotter.


[1] Rao SS, Huntley MH, Durand NC, Stamenova EK, et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell. 2014. PMID: 25497547

[2] Zerbino DR, Wilder SP, Johnson N, Huettemann T, Flicek PR. The Ensembl Regulatory Build. Genome Biol. 2015. PMID: 25887522

[3] Pollard KS, Hubisz MJ, Siepel A. Detection of non-neutral substitution rates on mammalian phylogenies. Genome Res. 2009. PMID: 19858363

[4] Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, Clawson H, Spieth J, Hillier LW, Richards S, et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 2005. PMID: 16024819