Click on a header in the header panel above to go to the main sections of this documentation.

To navigate through more specific sections of this documentation, scroll through the grey navigation panel present at the right side of this page and click on a header.

To get a quick idea of how you can select the most relevant results from ORVAL, click on the Tutorials header above.

Summary

ORVAL is the first web bioinformatics platform for the exploration of predicted candidate disease-causing variant combinations, aiming to aid in uncovering the causes of oligogenic diseases (i.e. diseases caused by variants in a small number of genes). This tool integrates innovative machine learning methods for combinatorial variant pathogenicity prediction, further external annotations and interactive and exploratory visualisation techniques.

What can you do with ORVAL?

SUBMIT AND FILTER YOUR VARIANTS

You can submit the variants of a single individual either as a variant list or a VCF file.
You can also filter your variants based on their Minor Allele Frequency (MAF), their position in the gene and/or based on a specific gene panel of your choice.

PREDICT CANDIDATE DISEASE-CAUSING VARIANT COMBINATIONS

With ORVAL you can predict candidate pathogenic variant combinations in any gene pair present in your data with VarCoPP and further predict their digenic effect (True Digenic, Monogenic with a Modifier variant or Dual Diagnosis) with the Digenic Effect Predictor.

EXPLORE POTENTIAL OLIGOGENIC SIGNATURES

You can investigate potential oligogenic disease signatures by exploring the interactive gene networks that are created based on the predictions and examine them in the context of their protein-protein interactions, cellular locations and pathways.


The input data


ORVAL accepts a list of variants from a single individual only, as it creates all possible variant combinations between pairs assuming that these belong to the same individual.

You can provide either Single Nucleotide Variants (SNVs) or small insertions/deletions (indels).

Types of input files


There are two different types of variant input that you can use to upload your data: either a variant list or a VCF file. After uploading your data, you can start the analysis by clicking on the button.

Tab-delimited variant list

tab delimited list example

At the left panel of the Submit variants page you can insert/copy-paste a variant list. Each line should contain tab- or space-delimited information for one variant, in the corresponding order: chromosome, position, reference allele, alternative allele, zygosity.

No headers are needed.

The zygosity values should be either Heterozygous or Homozygous. During the analysis, ORVAL automatically converts X-linked variants in males to Hemizygous.
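The expected line layout can be checked before submission with a short script. The following is a minimal sketch, not ORVAL's own parser: the `Variant` type and the `parse_variant_list` function are hypothetical names, and the strict check on the two zygosity spellings is an assumption based on the values listed above.

```python
from typing import NamedTuple

class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str
    zygosity: str

# Zygosity values accepted in the list, as stated in the text.
VALID_ZYGOSITY = {"Heterozygous", "Homozygous"}

def parse_variant_list(text: str) -> list[Variant]:
    """Parse lines of: chromosome, position, reference allele, alternative allele, zygosity."""
    variants = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        fields = line.split()  # splits on any run of tabs or spaces
        if len(fields) != 5:
            raise ValueError(f"line {line_no}: expected 5 columns, got {len(fields)}")
        chrom, pos, ref, alt, zygosity = fields
        if zygosity not in VALID_ZYGOSITY:
            raise ValueError(f"line {line_no}: invalid zygosity {zygosity!r}")
        variants.append(Variant(chrom, int(pos), ref, alt, zygosity))
    return variants

example = "16\t3254468\tCTT\t-\tHeterozygous\n16 3254467 CCTT C Heterozygous\n"
for variant in parse_variant_list(example):
    print(variant)
```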

You can also manually insert a variant in this list by typing information:

  • at the next line and making sure that you use the same delimiter for all columns
  • at the corresponding chr, position, reference allele, alternative allele, Zygosity column fields and pressing the button.

VCF file

VCF file example

Alternatively, you can submit a VCF file (version 4.2) with your variants at the right panel of the submission page.

ORVAL requires as minimum the presence of:

  • the #Header Line: #CHROM POS ID REF ALT etc... line

  • the columns CHROM, POS, ID, REF, ALT, FORMAT, SAMPLE_NAME (patient information column containing values corresponding to the FORMAT field).

  • the genotype (GT) field for each variant at the FORMAT and SAMPLE_NAME columns. In case variants with GT: 0/0 ; 0|0 ; ./. ; . are present, these are discarded from the analysis.

Any other meta-information lines on the top of the file or any extra columns and fields (e.g. QUAL, INFO, etc.) can be present, but ORVAL will ignore them.
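As an illustration of these minimum requirements, the sketch below reads only the columns listed above from a VCF text and drops the genotypes that are discarded from the analysis. It is not ORVAL's parser: `read_minimal_vcf` is a hypothetical name, the sketch assumes tab-separated columns with a single sample in the last column, and the second record in the example is made-up data.

```python
# Genotypes that are discarded from the analysis, as listed in the text.
DISCARDED_GT = {"0/0", "0|0", "./.", "."}

def read_minimal_vcf(text: str) -> list[dict]:
    columns = None
    records = []
    for line in text.splitlines():
        if line.startswith("##") or not line.strip():
            continue  # meta-information lines are ignored
        if line.startswith("#CHROM"):
            columns = line.lstrip("#").split("\t")
            continue
        if columns is None:
            raise ValueError("data line found before the #CHROM header line")
        row = dict(zip(columns, line.split("\t")))
        sample_column = columns[-1]  # assumption: one sample, in the last column
        genotype_fields = dict(zip(row["FORMAT"].split(":"), row[sample_column].split(":")))
        gt = genotype_fields.get("GT", ".")
        if gt in DISCARDED_GT:
            continue
        records.append({
            "chrom": row["CHROM"],
            "pos": int(row["POS"]),
            "ref": row["REF"],
            # Only the first alternative allele is used (see the Variant exclusion section).
            "alt": row["ALT"].split(",")[0],
            "gt": gt,
        })
    return records

vcf_text = "\n".join([
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tPATIENT",
    "16\t3254467\t.\tCCTT\tC\t.\tPASS\t.\tGT\t1/0",
    "1\t12345\t.\tA\tG\t.\tPASS\t.\tGT:DP\t0/0:30",  # made-up record, dropped (GT 0/0)
])
print(read_minimal_vcf(vcf_text))
```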

In case you want to create your own VCF file, you can download and take a look at the example VCFs that are present at the VCF submission panel and/or consult the Samtools specification page on how to construct a proper VCF file.

Variant types

You can either submit Single Nucleotide Variants (SNVs) or small insertions/deletions (indels). Other types of variants (e.g. CNVs) can be present in your list, but they will not be included in the analysis.

Specifically for indels, you can submit your variants in either one of the two different ways that are shown for a particular variant example (the VCF file can contain more columns).


Example | Tab-delimited list | VCF file
Example with dashes | 16 3254468 CTT - Heterozygous | 16 3254468 . CTT - PASS GT 1/0
Example without dashes | 16 3254467 CCTT C Heterozygous | 16 3254467 . CCTT C PASS GT 1/0
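The two notations describe the same deletion: the version without dashes keeps one shared base in front of the deleted sequence, so its position is one nucleotide lower. The sketch below shows this equivalence by stripping the shared leading bases. It is an illustration only, `to_dash_notation` is a hypothetical helper, and it does not claim to reproduce how ORVAL normalises indels internally.

```python
def to_dash_notation(pos: int, ref: str, alt: str) -> tuple[int, str, str]:
    """Convert an indel written with a shared leading base to the dash notation."""
    shared = 0
    while shared < min(len(ref), len(alt)) and ref[shared] == alt[shared]:
        shared += 1
    if shared == 0 or len(ref) == len(alt):
        return pos, ref, alt  # already in dash notation, or not an indel
    # Drop the shared prefix, shift the position, and mark an empty allele with '-'.
    return pos + shared, ref[shared:] or "-", alt[shared:] or "-"

print(to_dash_notation(3254467, "CCTT", "C"))
```

Applied to the example without dashes, this returns position 3254468 with alleles CTT and -, which is the example with dashes.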

Genome version

At the moment ORVAL accepts and annotates variants using the GRCh37/hg19 human genome assembly.

We do not make conversions of genomic coordinates from different genome versions. In case you need to convert your variants, you are encouraged to use tools like the UCSC, Ensembl and NCBI assembly converters.

Patient information

In addition to the variant list, you should also provide the sex of the patient (male or female), if available.

ORVAL handles X-linked variants in males (hemizygous variants) differently from those in females, so this information is important for obtaining better predictions.
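The effect of the sex information on zygosity can be sketched as follows. This is a simplified illustration, not ORVAL's implementation: `effective_zygosity` and the `"male"` / `"female"` values are hypothetical, and special cases such as pseudoautosomal regions are not considered because the text does not describe how they are handled.

```python
def effective_zygosity(chrom: str, zygosity: str, sex: str) -> str:
    """Return Hemizygous for X-linked variants in males, else the submitted zygosity."""
    on_x = chrom.upper().removeprefix("CHR") == "X"  # accepts "X" and "chrX"
    if on_x and sex.lower() == "male":
        return "Hemizygous"
    return zygosity

print(effective_zygosity("X", "Heterozygous", "male"))
print(effective_zygosity("X", "Heterozygous", "female"))
```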

patient information picture

Example input files

You can try ORVAL with the two example VCF files that are present in the VCF file section of the variant submission page. These files give you the opportunity to test ORVAL on a small or large number of variants and see what the webserver has to offer.

  • the Example_VCF_1 file contains 25 variants and its running time (with filtering) is 15 seconds.
  • the Example_VCF_2 file contains 1800 variants and its running time (with filtering) is 18 seconds.

Job submission

Every time you submit your data, you will first get directed to the Submitted ORVAL Job page where you can follow the status of your submission.

On this page you will also receive a Job ID, which you can use to re-access the results of that specific submission or to report errors. The Job ID is also present in the URL of the Results page, in the format: orval.ibsquare.be/results?id=YourJobID.

You can re-access your results by:

  • saving the URL of the Job or the Results page
  • typing in your browser https://orval.ibsquare.be/results?id= followed by the Job ID
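If you keep track of several submissions, the Results URL can be rebuilt from a stored Job ID. A minimal sketch, where `results_url` is a hypothetical helper and `YourJobID` is the placeholder used above:

```python
from urllib.parse import urlencode

RESULTS_PAGE = "https://orval.ibsquare.be/results"

def results_url(job_id: str) -> str:
    """Build the Results page URL for a given Job ID."""
    return f"{RESULTS_PAGE}?{urlencode({'id': job_id})}"

print(results_url("YourJobID"))
```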

Do you receive error or warning messages during your data submission? You can consult the Frequently Asked Questions (FAQ) section for suggestions on how to handle them.


Data filtering and annotation

Data filtering

In the Submit variants page, ORVAL offers a recommended variant and gene filtering procedure that will automatically run when you submit your data. This procedure is highly recommended, as it will limit the amount of variant combinations to be tested and will restrict the analysis to the most relevant variants.

There are two types of filtering offered by ORVAL: a variant filtering and a gene filtering procedure.

Variant filtering

The variant filtering procedure ensures that your analysis will contain variants in accordance with the variant types used to train the predictive methods (VarCoPP and the Digenic Effect predictor) integrated in ORVAL: exonic and splicing variants with a MAF lower than or equal to 3% in protein-coding genes.

The three different filtering options are already pre-selected in the Variant Filtering panel of the submission page. You can unselect a filtering option, by clicking on its corresponding check-box.

variant filtering tab
Filtering options

ExAC MAF

Select the maximum ExAC MAF allowed for the variants. A MAF of ≤ 0.03 was used to train VarCoPP and is the recommended threshold.

Remove Intergenic

Removes variants that are not inside the defined gene coordinates, based on the human assembly GRCh37/hg19.




Remove intronic and synonymous

Removes:

  • all intronic variants that are more than 13 nucleotides away from each exon edge, based on the exon coordinates of the canonical transcript of the gene.

  • all synonymous variants that are more than 7 nucleotides away from each exon edge, based on the exon coordinates of the canonical transcript of the gene.
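The three filtering options can be summarised in a few lines of code. This is a sketch under stated assumptions, not ORVAL's filtering code: the dictionary keys, the consequence labels (`"intergenic"`, `"intronic"`, `"synonymous"`) and the exon coordinates in the example are hypothetical, and the distance is measured to the closest exon start or end.

```python
MAX_MAF = 0.03                 # recommended ExAC MAF threshold
MAX_INTRONIC_DISTANCE = 13     # nucleotides from the nearest exon edge
MAX_SYNONYMOUS_DISTANCE = 7    # nucleotides from the nearest exon edge

def distance_to_exon_edge(pos: int, exons: list[tuple[int, int]]) -> int:
    """Distance from pos to the closest exon start or end of the canonical transcript."""
    return min(min(abs(pos - start), abs(pos - end)) for start, end in exons)

def passes_filters(variant: dict, exons: list[tuple[int, int]]) -> bool:
    if variant["maf"] > MAX_MAF:
        return False  # too frequent
    consequence = variant["consequence"]
    if consequence == "intergenic":
        return False  # outside the gene coordinates
    distance = distance_to_exon_edge(variant["pos"], exons)
    if consequence == "intronic" and distance > MAX_INTRONIC_DISTANCE:
        return False
    if consequence == "synonymous" and distance > MAX_SYNONYMOUS_DISTANCE:
        return False
    return True

exons = [(100, 200), (300, 400)]  # made-up exon coordinates
print(passes_filters({"pos": 210, "maf": 0.01, "consequence": "intronic"}, exons))
print(passes_filters({"pos": 250, "maf": 0.01, "consequence": "intronic"}, exons))
```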

Gene filtering

The gene filtering option restricts the analysis to a specified list of relevant genes that can be present in your data. This procedure is highly recommended in case your VCF contains the complete exome of an individual, as it can dramatically limit the number of false positives obtained.

To run your analysis with only a subset of genes, upload a .txt file with the gene symbols you want to include, one gene per line. After submission, ORVAL will use this list to filter the genes that will be used in the analysis.
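The gene panel file and its effect can be sketched as follows. The function names and the gene symbols `GENE1` to `GENE3` are hypothetical placeholders, and exact, case-sensitive matching of symbols is an assumption.

```python
def load_gene_panel(text: str) -> set[str]:
    """Read one gene symbol per line, ignoring blank lines."""
    return {line.strip() for line in text.splitlines() if line.strip()}

def keep_panel_variants(variants: list[dict], panel: set[str]) -> list[dict]:
    """Keep only the variants mapped to a gene of the panel."""
    return [variant for variant in variants if variant["gene"] in panel]

panel = load_gene_panel("GENE1\nGENE2\n\n")
variants = [
    {"id": "v1", "gene": "GENE1"},
    {"id": "v2", "gene": "GENE3"},
]
print(keep_panel_variants(variants, panel))
```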

gene filtering tab

Data annotation

After you submit your data, ORVAL:

  1. automatically annotates them with the biological information needed for the integrated predictive methods (VarCoPP and the Digenic Effect predictor)
  2. creates all possible variant combinations between any pair of genes present in your variant input and
  3. orders the variants and genes inside each combination.

Below, you can find some important parameters for each process.

Gene annotation

To map variants to genes, ORVAL first uses the gene information that is present in the CADD annotation file for that variant and uses only the canonical transcripts of those genes, according to the Ensembl GRCh37/hg19 genome version.

In cases where a variant can be mapped to multiple genes, ORVAL maps that variant to only one gene based on a set of priority rules that include (starting from higher to lower priority): valid gene IDs and canonical transcript, prioritisation of genes based on their biotype and the functional consequence of the variant, prioritisation of genes where the variant falls inside the gene and canonical transcript coordinates, presence of a CCDS, prioritisation of gene with the longest canonical transcript, etc.

ORVAL then annotates the genes with:

  • the required gene features for VarCoPP (detailed description of the features and their sources in the link)
  • the required gene features for the Digenic Effect predictor (detailed description of the features and their sources in the link)
  • the Gene Damage Index (GDI), a metric that shows the susceptibility of a gene to disease. Lower values of GDI indicate greater susceptibility of a gene to candidate disease-causing mutations.
  • the protein sequences from Uniprot using the Ensembl canonical identifiers, as these are needed to calculate some of the features of our predictive methods

Gene pair annotation

ORVAL annotates a gene pair with:

  • their Biological Distance, a metric of biological relatedness between any two genes, based on protein-protein interaction information, which is used as a feature for VarCoPP
  • involvement in the same pathway information from Reactome, which is used as a feature for the Digenic Effect predictor
  • protein-protein interaction (PPI) and cell co-localisation information from the comPPI database

ORVAL uses the Gene Damage Index (GDI) metric to order the appearance of genes inside each digenic variant combination, with gene A always being the gene with the lower GDI value, and thus the one more likely to have a disturbed function due to the presence of a variant. You can find more details about how ORVAL creates digenic variant combinations and orders variants and genes in the Creating digenic variant combinations section of the Documentation page.

Variant annotation

ORVAL first maps a variant in a gene based on the Gene annotation process described above. It then annotates each variant with:

  • the required variant features for VarCoPP, the most important being the CADD score (detailed description of the features and their sources in the link)
  • the required variant features for the Digenic Effect predictor (detailed description of the features and their sources in the link)

When ORVAL creates digenic variant combinations, it uses the CADD score to order the appearance of variants that are present inside the same gene (i.e. in cases of heterozygous compound variants). You can find more details about how ORVAL creates digenic variant combinations and orders variants and genes in the Creating digenic variant combinations section of the Documentation page.

Variant exclusion

In some situations during the data annotation process ORVAL excludes variants from the analysis and you will not find them in the results:

  • Variant not exonic in canonical transcript
    We use only the canonical Ensembl transcript identifiers to annotate our variants. If you have chosen to exclude intronic variants from your analysis, a variant that is not exonic in the canonical transcript of the gene will be excluded, even if it is exonic in an alternative transcript.
  • Variant with invalid zygosity
    Variants with GT:0/0 or GT:0|0 in a VCF file are considered invalid and are excluded from the analysis.
  • Alternative variant
    In case multiple alternative variants are present in a row in a VCF file, we only take into account the first alternative variant. The rest of the variants are excluded from the analysis.
  • CADD score not available
    ORVAL annotates variants with a CADD score, which is a feature required for the pathogenicity predictions. If a CADD score is not available for a variant, that variant is excluded from the analysis, as a missing value may severely alter the results.
  • Variants only in one gene
    As ORVAL creates combinations between gene pairs, if your input data includes variants from one gene only, you will not get any results.
  • The variant is a CNV
    ORVAL analyses only SNVs and small insertions and deletions. Any other variant type in your data is automatically excluded from the analysis.

Creating digenic variant combinations

After annotation, VarCoPP creates all possible variant combinations between any gene pair present in your input, taking into consideration any filtering options you have included during your variant submission.

You can find below a list of details and constraints that take place during this procedure.









Number of variants per combination

For any gene pair, ORVAL creates variant combinations that can be:

  • bi-allelic (i.e. one mutated allele at each gene)
    e.g.: one heterozygous variant per gene

  • tri-allelic (i.e. three mutated alleles in total)
    e.g.: a homozygous variant in gene A and a heterozygous variant in gene B

  • tetra-allelic (i.e. four mutated alleles in total)
    e.g.: one homozygous variant per gene

In the tri-allelic and tetra-allelic cases, a digenic combination can also include heterozygous compound variants (i.e. two different mutated alleles in the same gene), along with the presence of variant(s) in another gene.
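The three cases above can be enumerated with a short script. This is a sketch of the combinatorics only, under several assumptions: each gene contributes either one variant or two different heterozygous variants (the heterozygous compound case), phasing and hemizygous variants are not modelled, and the function names and variant identifiers are hypothetical.

```python
from itertools import combinations, product

def gene_configurations(variants: list[dict]) -> list[tuple[tuple[str, ...], int]]:
    """List (variant ids, number of mutated alleles) for one gene."""
    configs = []
    for variant in variants:
        alleles = 2 if variant["zygosity"] == "Homozygous" else 1
        configs.append(((variant["id"],), alleles))
    heterozygous = [v for v in variants if v["zygosity"] == "Heterozygous"]
    for first, second in combinations(heterozygous, 2):
        # Two different mutated alleles in the same gene (heterozygous compound).
        configs.append(((first["id"], second["id"]), 2))
    return configs

def digenic_combinations(gene_a_variants: list[dict], gene_b_variants: list[dict]) -> list[tuple]:
    labels = {2: "bi-allelic", 3: "tri-allelic", 4: "tetra-allelic"}
    pairs = product(gene_configurations(gene_a_variants), gene_configurations(gene_b_variants))
    return [(ids_a, ids_b, labels[n_a + n_b]) for (ids_a, n_a), (ids_b, n_b) in pairs]

gene_a = [{"id": "a1", "zygosity": "Heterozygous"}, {"id": "a2", "zygosity": "Heterozygous"}]
gene_b = [{"id": "b1", "zygosity": "Homozygous"}]
for combination in digenic_combinations(gene_a, gene_b):
    print(combination)
```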


Order of genes

For each digenic variant combination, gene A is always the gene with the lowest Gene Damage Index (GDI) (see also the Gene Annotation section) and, thus, the one with a higher probability to be associated with a disease.

Order of variant alleles inside the gene

In case of two different mutated alleles in the same gene (heterozygous compound cases), the variant allele 1 is always the variant allele with the highest CADD score.
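Both ordering rules can be expressed as two sorts. The sketch below assumes a simple dictionary layout for genes and variants; the GDI and CADD values in the example are made up for illustration and `order_combination` is a hypothetical name.

```python
def order_combination(gene_x: dict, gene_y: dict) -> tuple[dict, dict]:
    """Return (gene A, gene B): gene A has the lowest GDI, alleles are sorted by CADD."""
    gene_a, gene_b = sorted((gene_x, gene_y), key=lambda gene: gene["gdi"])

    def with_ordered_alleles(gene: dict) -> dict:
        # Variant allele 1 is the one with the highest CADD score.
        ranked = sorted(gene["variants"], key=lambda v: v["cadd"], reverse=True)
        return {**gene, "variants": ranked}

    return with_ordered_alleles(gene_a), with_ordered_alleles(gene_b)

gene_1 = {"name": "GENE1", "gdi": 5.0,
          "variants": [{"id": "v1", "cadd": 1.2}, {"id": "v2", "cadd": 3.4}]}
gene_2 = {"name": "GENE2", "gdi": 2.0, "variants": [{"id": "v3", "cadd": 0.5}]}
gene_a, gene_b = order_combination(gene_1, gene_2)
print(gene_a["name"], [v["id"] for v in gene_b["variants"]])
```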

A graphical representation of a digenic combination

digenic combination example

The predictive methods of ORVAL

VarCoPP: the variant combination pathogenicity predictor

VarCoPP stands for Variant Combination Pathogenicity Predictor. It is a machine-learning method that predicts the pathogenicity of any bi-locus variant combination (i.e. a combination of two to four variant alleles between two genes).

The method has been published in the PNAS journal: https://doi.org/10.1073/pnas.1815601116. See also the Cite us section in the About page, for a list of all relevant citations.

Based on VarCoPP, a bi-locus variant combination can either be candidate disease-causing or neutral.

Structure of VarCoPP


ALGORITHM

VarCoPP is an ensemble predictor that consists of 500 individual predictors, and more specifically, 500 classification Random Forest (RF) algorithms.




TRAINING DATA

Each predictor of VarCoPP has been trained on the pathogenic variant combinations present in the Digenic Diseases Database (DIDA) against a different subset, each time, of variant data derived from control individuals of the 1000 Genomes Project (1KGP).

The variant types that were used for training were the same for both DIDA and 1KGP: exonic and splicing variants of up to 3% MAF, while all genes were protein coding genes.




RESULT CALCULATION

When a bi-locus variant combination is tested with VarCoPP, each individual RF provides a probability that the combination is candidate disease-causing. If the probability is above 0.532, then the RF predicts that this combination is candidate disease-causing. The final prediction is based on a majority vote: if 50% or more of the RFs agree that a bi-locus combination is candidate disease-causing, then the final prediction is that it belongs to the candidate disease-causing class.

Therefore, in general, a bi-locus combination is predicted as candidate disease-causing if ≥50% of the predictors agree that it is candidate disease-causing and the median probability for this prediction among all predictors will be, consequently, ≥0.532.
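The majority vote described above can be sketched in a few lines. This is an illustration of the voting rule only, not VarCoPP's code: `varcopp_vote` is a hypothetical name and the probabilities in the example are made up.

```python
RF_THRESHOLD = 0.532  # probability above which a single RF votes "candidate disease-causing"

def varcopp_vote(probabilities: list[float]) -> str:
    """Combine the per-RF probabilities into a final class by majority vote."""
    positive_votes = sum(p > RF_THRESHOLD for p in probabilities)
    if positive_votes >= len(probabilities) / 2:  # 50% or more of the RFs agree
        return "candidate disease-causing"
    return "neutral"

probabilities = [0.9] * 250 + [0.1] * 250  # made-up output of 500 RFs
print(varcopp_vote(probabilities))
```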

A graphical representation of the structure of VarCoPP
summary of varcopp structure

Prediction features

VarCoPP uses biological features of variants, genes and gene pairs to make its predictions.

Feature | Feature abbreviation | Gene / Variant allele
CADD raw score (PMID: 24487276) | CADD1 | Gene A / Variant allele 1
CADD raw score (PMID: 24487276) | CADD2 | Gene A / Variant allele 2
CADD raw score (PMID: 24487276) | CADD3 | Gene B / Variant allele 1
CADD raw score (PMID: 24487276) | CADD4 | Gene B / Variant allele 2
Amino acid hydrophobicity difference (PMID: 8836100) | Hydr1 | Gene A / Variant allele 1
Amino acid flexibility difference | Flex1 | Gene A / Variant allele 1
Gene haploinsufficiency probability (PMID: 20976243) | HI_A | Gene A
Gene haploinsufficiency probability (PMID: 20976243) | HI_B | Gene B
Gene recessiveness probability (PMID: 22344438) | RecA | Gene A
Gene recessiveness probability (PMID: 22344438) | RecB | Gene B
Biological distance (PMID: 24694260) | Biol_Dist | Gene pair AB

Evaluation scores

For each bi-locus combination VarCoPP provides two prediction scores, based on the way it makes the predictions. These scores are also used to rank the bi-locus combinations in the output files.



Support score (SS)

The Support score (SS) of a bi-locus combination indicates the percentage of RFs that agree that the combination is candidate disease-causing. It can therefore take values from 0 (no RF predicted that the combination is pathogenic) to 100 (all RFs predicted that the combination is pathogenic).

For candidate disease-causing combinations, SS is always equal to or larger than 50.0.


Classification score (CS)

The classification score (CS) of a bi-locus variant combination is defined as the median probability of that combination being disease-causing among all RFs. It can take values between 0 and 1.

For candidate disease-causing combinations, CS is always larger than 0.532.

In general, the higher these scores are, the more confident VarCoPP is in the disease-causing class. These scores can be used to prioritise candidate disease-causing variant combinations; you can consult our tutorial for further details.
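Both scores follow directly from the per-RF probabilities. The sketch below is an illustration of the two definitions, with hypothetical function names and made-up probabilities.

```python
from statistics import median

def support_score(probabilities: list[float], threshold: float = 0.532) -> float:
    """Percentage of RFs whose probability is above the per-RF threshold."""
    return 100.0 * sum(p > threshold for p in probabilities) / len(probabilities)

def classification_score(probabilities: list[float]) -> float:
    """Median probability of being disease-causing among all RFs."""
    return median(probabilities)

probabilities = [0.9] * 300 + [0.2] * 200  # made-up output of 500 RFs
print(support_score(probabilities), classification_score(probabilities))
```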

95% and 99% confidence zones

With VarCoPP we have defined 95%- and 99%-confidence zones, delimited by minimal Classification (CS) and Support (SS) scores, which indicate the probability that a combination predicted as candidate disease-causing is actually a True Positive (TP) result. This indication can be useful for further evaluation and filtering of the predictions.

These confidence zones were created by testing neutral bi-locus combinations from the 1000 Genomes Project and obtaining the minimal CS and SS scores that gave 5% and 1% False Positives respectively. If a combination falls into either one of the two zones, a coloured indication will appear in the summary results.


95%-confidence zone

Requires CS≥0.64 and SS≥83.2. If a digenic combination falls inside this zone, it has a 95% probability of being a TP result.


99%-confidence zone

Requires CS≥0.83 and SS=99.8. If a digenic combination falls inside this zone, it has a 99% probability of being a TP result.

The Digenic Effect Predictor


The Digenic Effect predictor is a machine-learning method that predicts the type, i.e. the digenic effect, of a pathogenic digenic variant combination. This information can be useful when no pedigree information or parental genotypes are available, as it gives a predictive indication of the effect of a variant combination predicted as pathogenic. As this is, again, a machine-learning approach, a manual investigation by the user can confirm or reject the assigned digenic effect class.

The Digenic Effect predictor has been published in the Artificial Intelligence in Medicine journal: https://doi.org/10.1016/j.artmed.2019.06.006 and the Nucleic Acids Research journal: https://doi.org/10.1093/nar/gkx557. See also the Cite us section in the About page, for a list of all relevant citations.

The Digenic Effect predictor can distinguish between three classes of pathogenic variant combinations:

True Digenic

Variants at both genes are needed to show the disease phenotype.

Monogenic + Modifier

The variant at the first gene acts as the major monogenic variant that can trigger disease symptoms, while the second variant acts as a modifier of symptoms severity or age of onset.

Dual Molecular Diagnosis

Conjunction of variants that trigger two independent monogenic disorders that occur simultaneously within a single patient.


The three types of digenic effects.
Combination a, a True Digenic combination, where the simultaneous presence of a pathogenic allele in each gene is necessary for the individual to express the disease phenotype.
Combination b, a Monogenic plus Modifier combination, where a variant on the major gene induces a disease phenotype, while a mutation in the modifier gene modifies it, either by rendering it more severe or producing an early onset.
Combination c, a Dual Molecular Diagnosis combination, where both loci are responsible for either distinct or overlapping phenotypes for two different diseases.

The structure of the Digenic Effect predictor

ALGORITHM

The Digenic Effect predictor is a classification Random Forest (RF) algorithm.



TRAINING DATA

The Digenic Effect predictor was trained on 240 pathogenic variant combinations.

More specifically, it has been trained on 90 True Digenic and 75 Monogenic+Modifier variant combinations present in the Digenic Diseases Database (DIDA) and 75 Dual Molecular Diagnosis combinations derived from the work of Posey et al.

The variant types were single nucleotide variations and small insertions/deletions.


RESULT CALCULATION

The Digenic Effect predictor provides probabilities (from 0 to 1) for all three digenic effect classes for a variant combination.

The final digenic effect class is the class with the highest probability among the three.
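Selecting the final class amounts to taking the class with the largest probability. A minimal sketch with made-up probabilities; `digenic_effect` is a hypothetical name and ties are resolved by dictionary order, which is an assumption.

```python
def digenic_effect(probabilities: dict[str, float]) -> str:
    """Return the digenic effect class with the highest probability."""
    return max(probabilities, key=probabilities.get)

prediction = {
    "True Digenic": 0.6,
    "Monogenic + Modifier": 0.3,
    "Dual Molecular Diagnosis": 0.1,
}
print(digenic_effect(prediction))
```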

Prediction features

The Digenic Effect predictor uses biological features of variants, genes and gene pairs to make its predictions.

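Feature | Feature abbreviation | Gene / Variant allele
CADD raw score (PMID: 24487276) | CADD1 | Gene A / Variant allele 1
CADD raw score (PMID: 24487276) | CADD2 | Gene A / Variant allele 2
CADD raw score (PMID: 24487276) | CADD3 | Gene B / Variant allele 1
CADD raw score (PMID: 24487276) | CADD4 | Gene B / Variant allele 2
Gene recessiveness probability (PMID: 22344438) | RecA | Gene A
Gene recessiveness probability (PMID: 22344438) | RecB | Gene B
Essential in mouse (PMID: 23675308) | EssA | Gene A
Essential in mouse (PMID: 23675308) | EssB | Gene B
Same pathway (SOURCE: Reactome) | Pathway | Gene pair AB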

Navigation of the ORVAL results

After the analysis is finished, you will be directed to the Results page, where you can explore the oligogenic network created from the VarCoPP predictions, the ranking of your gene pairs based on the predicted candidate disease-causing combinations they contain, and the detailed digenic pathogenicity probabilities and scores for your input.

You can access each section by clicking on the corresponding tab at the navigation bar at the top of the page.

navigation bar of Results page

Oligogenic exploration

This section provides the space for the exploration of potentially oligogenic signatures. The information is guided by the predictions of VarCoPP, which predicts the pathogenicity of variant combinations between gene pairs.

The oligogenic information is mainly shown in the form of a gene network, whose nodes represent genes and whose edges connect two genes only if there exists at least one variant combination between them that has been predicted as candidate disease-causing with VarCoPP. The users can explore and filter the network, as well as investigate the protein-protein interactions and the involved pathways of the genes that belong in the same module.
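The construction of this network from the predictions can be sketched as follows. The input layout (gene A, gene B, Classification Score, predicted class flag) and the function name are assumptions made for the illustration, and the gene names and scores are made up. Each edge keeps the highest Classification Score found for its gene pair.

```python
def build_network(predictions: list[tuple[str, str, float, bool]]) -> dict[frozenset, float]:
    """Map each connected gene pair to the highest CS among its candidate combinations."""
    edges: dict[frozenset, float] = {}
    for gene_a, gene_b, cs, is_candidate in predictions:
        if not is_candidate:
            continue  # neutral combinations do not create an edge
        pair = frozenset((gene_a, gene_b))
        edges[pair] = max(cs, edges.get(pair, 0.0))
    return edges

predictions = [
    ("GENE1", "GENE2", 0.70, True),
    ("GENE1", "GENE2", 0.90, True),
    ("GENE2", "GENE3", 0.30, False),
]
print(build_network(predictions))
```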

Oligogenic combination network

The first panel of the main Results page contains the predicted candidate disease-causing oligogenic combination network.

Network description

Like every network, the oligogenic combination network contains nodes and edges.

example network

Node

Each node represents a gene present in your data.



Edge

Connects two genes only if there exists at least one candidate disease-causing variant combination predicted by VarCoPP between them.

The colour of the edge represents the highest pathogenicity score for that pair, and more specifically, the highest Classification Score (CS) computed for a variant combination of that pair (see the VarCoPP scores section for a detailed explanation of the score).
This score is represented in a colour range from yellow (low pathogenicity score) to dark red (high pathogenicity score), representing the CS values from 0.532 to 1.0, respectively.
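A colour scale of this kind can be sketched with a linear interpolation between two colours. The RGB values chosen for yellow and dark red and the linear mapping are assumptions; the exact colours and scale used by ORVAL are not specified here.

```python
def edge_colour(cs: float, low: float = 0.532, high: float = 1.0) -> tuple[int, ...]:
    """Map a Classification Score to an RGB colour between yellow and dark red."""
    yellow, dark_red = (255, 255, 0), (139, 0, 0)  # hypothetical endpoint colours
    position = min(max((cs - low) / (high - low), 0.0), 1.0)  # clamp to [0, 1]
    return tuple(round(a + position * (b - a)) for a, b in zip(yellow, dark_red))

print(edge_colour(0.532))
print(edge_colour(1.0))
```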

You can:

  • Move a node
    You can select and move a node to arrange it in the network.
  • Click on a node
    By clicking on a node, this node appears with a purple border and a module panel appears automatically on the right of the panel with more information about the module the gene belongs to. At the top of that module panel, the Click here to further explore this gene module link directs you to the specialised page for your selected oligogenic module.
  • Download the network
    By clicking on the download button, which is present at the bottom right of the panel, you can download the network in its current state, including the filtering options you have selected, in the GraphML format. This file format can be imported in various graphical tools, e.g. yEd, or in network analysis tools, such as Cytoscape and Gephi.
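A downloaded GraphML file can also be read with a few lines of code. The sketch below lists the edges of a small GraphML document using only the standard GraphML elements; the sample content is made up, and the attributes that ORVAL stores on nodes and edges are not assumed here.

```python
import xml.etree.ElementTree as ET

# Standard GraphML namespace.
NAMESPACES = {"g": "http://graphml.graphdrawing.org/xmlns"}

def graphml_edges(graphml_text: str) -> list[tuple[str, str]]:
    """Return the (source, target) node ids of every edge in a GraphML document."""
    root = ET.fromstring(graphml_text)
    return [(edge.get("source"), edge.get("target"))
            for edge in root.iterfind(".//g:edge", NAMESPACES)]

sample = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph id="G" edgedefault="undirected">
    <node id="GENE1"/>
    <node id="GENE2"/>
    <edge source="GENE1" target="GENE2"/>
  </graph>
</graphml>"""
print(graphml_edges(sample))
```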

Gene selection

gene selection table

The Gene selection table on the left of the panel contains all genes present in the oligogenic network. The gene table changes automatically according to the filters you select either on the table itself or on the Filtering section below.

At the beginning, all genes are automatically selected and shown in the network.

You can:

  • remove a gene from the network by clicking on its corresponding check box and unselecting it.

  • order the appearance of genes based on their centrality in the network, and this centrality can be based either on the:
    • degree of the node: the number of edges connected with that node
    • closeness of the node: the sum of the lengths of the shortest paths between the node and all other nodes in the graph, i.e. how close the node is to the other nodes of the network

  • click on a gene to show the module panel with more information about the gene module it belongs to.

  • search the table based on a gene name.

  • download the table in its current state with the button.
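The two centrality measures used for ordering the genes can be computed with a breadth-first search. This sketch follows the definitions given in the list above: it returns the number of edges of a node and the sum of its shortest path lengths. Note that closeness centrality is conventionally reported as the reciprocal of that sum; which convention ORVAL uses for the ordering is not stated here. The small network is made up.

```python
from collections import deque

def degree(adjacency: dict[str, set[str]], node: str) -> int:
    """Number of edges connected to the node."""
    return len(adjacency[node])

def shortest_path_sum(adjacency: dict[str, set[str]], node: str) -> int:
    """Sum of the shortest path lengths from the node to every reachable node."""
    distances = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for neighbour in adjacency[current]:
            if neighbour not in distances:
                distances[neighbour] = distances[current] + 1
                queue.append(neighbour)
    return sum(distances.values())

network = {"GENE1": {"GENE2"}, "GENE2": {"GENE1", "GENE3"}, "GENE3": {"GENE2"}}
print(degree(network, "GENE2"), shortest_path_sum(network, "GENE2"))
```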

Network filtering

gene selection table

The network filtering option allows you to remove edges from the network by adjusting the thresholds of two metrics:

  • the pathogenicity score: the threshold for the highest pathogenicity score for a combination of a gene pair, which is based on the Classification Score provided by VarCoPP (see the VarCoPP scores section for more details)

  • the centrality: the centrality threshold of a gene in the network
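The effect of the two thresholds can be sketched as follows. This is one possible reading, not ORVAL's implementation: the degree is used as the centrality, it is computed after the pathogenicity score filter has been applied, and an edge is kept only if both of its genes reach the centrality threshold. All names and values are hypothetical.

```python
from collections import Counter

def filter_edges(edges: dict[tuple[str, str], float],
                 min_cs: float = 0.532, min_degree: int = 1) -> dict[tuple[str, str], float]:
    """Keep the edges that pass the pathogenicity score and centrality thresholds."""
    kept = {pair: cs for pair, cs in edges.items() if cs >= min_cs}
    node_degree = Counter(gene for pair in kept for gene in pair)
    return {pair: cs for pair, cs in kept.items()
            if all(node_degree[gene] >= min_degree for gene in pair)}

edges = {("GENE1", "GENE2"): 0.90, ("GENE2", "GENE3"): 0.60, ("GENE3", "GENE4"): 0.55}
print(filter_edges(edges, min_cs=0.6))
print(filter_edges(edges, min_degree=2))
```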

Oligogenic gene module

In this section you can further explore and filter the genes of your selected gene module, with the oligogenic gene module network on the right and the module gene selection table on the left of the panel.

The oligogenic gene module network description

This is the selected gene sub-network, shown exactly as it appears in your main oligogenic network. The nodes and edges of the network represent, again, the genes and the highest Classification Scores of the gene pairs, respectively (see the Oligogenic Network section for a description).

example of a selected gene module from the oligogenic network

You can:

  • Move a node
    You can select and move a node to arrange it in the network.
  • Download the module
    By clicking on the download button, which is present at the bottom right of the panel, you can download the gene module in its current state, in the GraphML format. This file format can be imported in various graphical tools, e.g. yEd, or in network analysis tools, such as Cytoscape and Gephi.

Module gene selection

module gene selection table

The Module gene selection table on the left of the page contains all the genes present in your selected oligogenic module.

You can:

  • search a gene on the table based on its name or external ID (e.g. Ensembl or Uniprot ID).

  • order the appearance of the genes based on their Gene Damage Index (GDI), with genes having a lower GDI being more probable to carry pathogenic mutations.

  • click on a gene name to be directed to its corresponding HGNC page.

  • click on an external ID to be directed to its corresponding source page.

  • download the table in its current state with the button.

Protein-protein interaction information

In this section you can explore any existing direct and indirect protein-protein interactions (PPIs) present in your selected module and get information about the position of the proteins in the cell.

All required information is extracted from the comPPI database.

Protein-protein interaction network

On the left panel of this section you can see a protein-protein interaction network that contains nodes and edges.

Example of a PPI network





Node

Each node represents a protein.

There are two types of nodes in this network:

  • Purple nodes: the proteins of your selected module
  • Grey nodes: external proteins that are present in the network only if they directly interact with two proteins of the selected module. These proteins are useful to show indirect physical interactions of your selected proteins.



Edge

Connects two nodes (proteins) if they directly physically interact.

There are two types of edges in this network:

  • Purple edges: direct interactions between the proteins of your selected module.
  • Grey edges: direct interactions between a protein of your selected module and an external protein.
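The rule for including external proteins can be sketched as follows. The function name and the protein identifiers are hypothetical, and "two proteins of the selected module" is read as "at least two", which is an assumption.

```python
def external_bridges(module: set[str], interactions: list[tuple[str, str]]) -> set[str]:
    """Find external proteins that directly interact with two or more module proteins."""
    module_partners: dict[str, set[str]] = {}
    for a, b in interactions:
        for protein, partner in ((a, b), (b, a)):
            if protein not in module and partner in module:
                module_partners.setdefault(protein, set()).add(partner)
    return {protein for protein, partners in module_partners.items() if len(partners) >= 2}

module = {"P1", "P2", "P3"}
interactions = [("P1", "X"), ("X", "P2"), ("Y", "P3"), ("P1", "P2")]
print(external_bridges(module, interactions))
```

In this example X is shown as a grey node because it links P1 and P2, while Y interacts with a single module protein and is left out.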

You can:

  • Click and move a node
    You can select and move a node to arrange it in the network.
  • Hover on a node
    By hovering upon a node, a box appears with further information about the corresponding gene name, the Uniprot Accession ID and the cellular location of the protein.

    Example of hovering on a PPI network node
  • Download the PPI network
    By clicking on the download button, which is present just above the network module, you can download it in its current state, in the GraphML format. This file format can be imported in various graphical tools, e.g. yEd, or in network analysis tools, such as Cytoscape and Gephi.

Cellular information

At the right panel of this section you can explore the cellular location of all proteins present in the PPI network, with the interactive cellular location pie chart. Each part of the chart corresponds to a different cellular location.

Example of the cellular pie chart

You can:

  • Hover over a cellular location
    By hovering on a particular cellular location you can get further statistics inside the plot for the:

    • Ratio of the location: number of protein-cellular location links among all protein-cellular location links
    • Overlap ratio of the location: number of proteins present in the cellular location among all proteins of the network

    All proteins of the PPI network that belong to this location will be automatically coloured as well.
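
As a rough illustration of the two statistics defined above, the following Python sketch computes them from a mapping of proteins to cellular locations. The function name `location_stats`, the input layout and the protein names P1 to P3 are hypothetical and only serve the example.

```python
def location_stats(protein_locations):
    """Compute both ratios for every cellular location.

    protein_locations maps each protein of the network to the set of
    cellular locations it is linked to.
    """
    total_links = sum(len(locs) for locs in protein_locations.values())
    total_proteins = len(protein_locations)

    # Number of proteins (equivalently, protein-location links) per location.
    counts = {}
    for locs in protein_locations.values():
        for loc in locs:
            counts[loc] = counts.get(loc, 0) + 1

    return {
        loc: {
            "ratio": count / total_links,             # share of all links
            "overlap_ratio": count / total_proteins,  # share of all proteins
        }
        for loc, count in counts.items()
    }


# Hypothetical network of three proteins with four protein-location links.
stats = location_stats({
    "P1": {"cytosol", "nucleus"},
    "P2": {"cytosol"},
    "P3": {"membrane"},
})
print(stats["cytosol"])
```

Because a protein can be linked to several locations, the ratios sum to 1 over all locations, while the overlap ratios can sum to more than 1.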

Pathway information

In this section you can explore the cellular pathways in which the genes of your selected module are involved, using the summary pathway treemap on the left panel and the detailed pathway table on the right panel.

All required information is extracted from Reactome.

Pathway treemap

pathway treemap example

The treemap on the left panel of this section shows the summary of the different pathway categories of the genes present in your selected gene module.

Each main pathway category is enclosed in a box surrounded by a black stroke and contains nested pathway subcategories, based on the information from Reactome, descending from the more general to the more specific ones. The last sub-category is the most detailed pathway mapping of the gene.

The ordering from the more general to the more specific pathway categories is shown with a transition from:

  • bigger to smaller text font
  • lighter to darker colour gradient

The size of each main pathway category is determined by the number of genes of the selected module that it contains.

Pathway table

pathway table example

The pathway table shows more details about all pathway categories (general and specific) of your gene module.

You can:

  • order the appearance of the pathways based on the number of your module genes they contain.

  • click on each pathway to get further information from its corresponding page in Reactome.

  • search/filter the table based on a pathway or gene name(s). You can provide multiple gene names, separated with space.

  • download the table with the button.

Gene pair ranking exploration

With this section you can explore the gene pairs that are present in your data and rank them based on the content of candidate disease-causing variant combinations that have been predicted with VarCoPP.

S-plot example

This information is shown in a gene pair table that provides statistics on all gene pairs present in your data. The table reports, for each pair, the percentage and number of pathogenic variant combinations, as well as the median pathogenicity scores provided by VarCoPP (i.e. the Support Score and the Classification Score) among the combinations of that pair, to give an idea of their severity.
For further explanations on how these pathogenicity scores are calculated, you can consult the VarCoPP Prediction Scores section on this Documentation page.

The table is initially ranked based on the following columns in descending order of importance:

  1. percentage of pathogenic combinations
  2. median VarCoPP Classification Score
  3. median VarCoPP Support Score
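
The initial ordering described above corresponds to a multi-key descending sort. The Python sketch below shows the idea on a downloaded table; the column names (`pct_pathogenic`, `median_cs`, `median_ss`), the gene names and all values are hypothetical.

```python
# Hypothetical rows of a gene pair table; all values are made up.
gene_pairs = [
    {"pair": "GENE_A GENE_B", "pct_pathogenic": 50.0, "median_cs": 0.70, "median_ss": 90.0},
    {"pair": "GENE_A GENE_C", "pct_pathogenic": 50.0, "median_cs": 0.70, "median_ss": 95.0},
    {"pair": "GENE_B GENE_C", "pct_pathogenic": 50.0, "median_cs": 0.80, "median_ss": 60.0},
    {"pair": "GENE_C GENE_D", "pct_pathogenic": 75.0, "median_cs": 0.55, "median_ss": 55.0},
]

# Sort on the three columns in descending order of importance:
# ties on the first key are broken by the second, then by the third.
ranked = sorted(
    gene_pairs,
    key=lambda row: (row["pct_pathogenic"], row["median_cs"], row["median_ss"]),
    reverse=True,
)

ranked_names = [row["pair"] for row in ranked]
print(ranked_names)
```

Note how the pair with the highest percentage comes first even though its median scores are the lowest, as the median scores are only used to break ties.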

You can:

  • Rank the table based on a column:
    You can rank your table based on a column by clicking on the arrows on the column name.
  • Search/filter the table based on gene(s):
    You can search for a gene by typing the gene name in the search area.
    You can also search for a gene pair by typing the two genes you are searching, separated with a space.
  • Download the table:
    You can download the current table by clicking on the button. If you have first filtered your table based on a gene, the downloaded table will only contain that selection.

Digenic combinations exploration

With this section you can explore the results of the digenic pathogenicity predictions of VarCoPP for all digenic variant combinations of your data.

You can get a visual overview of the results with the interactive S-plot and inspect and download all results with the Summary table. By clicking on each digenic combination in the table you can get more details about its pathogenicity prediction, its pathogenic digenic effect and get access to useful variant, gene and gene-pair annotations.

Digenic results overview: S-plot

The S-plot gives an interactive visual overview of the VarCoPP predictions for all digenic variant combinations present in your data.

S-plot example

All combinations are plotted based on the two prediction scores provided by VarCoPP:

y-axis

Support Score: the percentage of individual VarCoPP predictors agreeing that the digenic combination is candidate disease-causing

x-axis

Classification Score: the median probability among all individual predictors of VarCoPP that the combination is candidate disease-causing
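
To make the two axes more concrete, the Python sketch below derives both scores from the outputs of a set of individual predictors. It is a simplified illustration rather than the VarCoPP implementation: the function name, the example probabilities and, in particular, the rule that a predictor "agrees" when its probability exceeds 0.5 are assumptions.

```python
from statistics import median


def varcopp_style_scores(probabilities, vote_threshold=0.5):
    """Return (support_score, classification_score) for one combination.

    probabilities: disease-causing probability given by each individual
    predictor. vote_threshold is an assumed per-predictor cutoff.
    """
    votes = sum(1 for p in probabilities if p > vote_threshold)
    support_score = 100.0 * votes / len(probabilities)  # percentage of agreeing predictors
    classification_score = median(probabilities)         # median probability
    return support_score, classification_score


# Hypothetical outputs of five predictors for one digenic combination.
ss, cs = varcopp_style_scores([0.9, 0.8, 0.6, 0.4, 0.7])
print(ss, cs)
```

Here four of the five predictors vote for the disease-causing class, so the combination would be placed at y = 80 and x = 0.7 in such a plot.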

The colour of each digenic combination represents the prediction and the pathogenicity confidence that is provided with VarCoPP for that combination (for details on how this confidence is calculated, you can consult the VarCoPP confidence zones section in the Documentation).

dark red

the variant combination is predicted as candidate disease-causing with 99% confidence

red

the variant combination is predicted as candidate disease-causing with 95% confidence

orange

the variant combination is predicted as candidate disease-causing without falling into one of the two confidence zones

blue

the variant combination is predicted as neutral

grey

a previously tested neutral combination to serve as validation background

You can interact with the plot in several ways:

  • Hover on a combination:
    By hovering on a combination in the plot, a box appears with information about the gene pair, the VarCoPP pathogenicity scores and further links.
  • Zoom-in:
    You can zoom in on the plot by selecting a rectangular area with your mouse. The Summary Table on the right panel is updated automatically to include only the variant combinations present in the plot at that specific time.
  • Re-initialize the plot:
    You can re-initialize the S-plot after zooming in by clicking on the button.
  • Remove the 10K neutral background combinations:
    You can remove the tested background neutral variant combinations from the plot by clicking on the button.
  • Download the plot:
    You can download the plot by clicking on the button. Note that if you have zoomed in on the plot, the plot will be downloaded in its zoomed-in state.

Digenic results overview: Summary table

The summary table on the right panel shows further details for each digenic combination. That table is automatically updated based on the filters you choose on the table itself or on the S-plot on the left.

The combinations are ranked based on their Support Score, with those having the highest score being first.

You can:

  • change the ranking by clicking on either the Support Score or Classification Score columns.

  • click on each digenic combination to get more details about its pathogenicity prediction, its pathogenic digenic effect and get access to useful variant, gene and gene-pair annotations.

  • search/filter the table based on a variant or gene name(s). You can use multiple variants or genes by separating them with a space.

  • download the table in its current state by clicking on the button.
example of the VarCoPP summary table

The colour of each digenic combination in the table represents the pathogenicity confidence of the combination (for details on how this confidence is calculated, you can consult the VarCoPP confidence zones section in the Documentation).

dark red

the variant combination is predicted as candidate disease-causing with 99% confidence

red

the variant combination is predicted as candidate disease-causing with 95% confidence

orange

the variant combination is predicted as candidate disease-causing without falling into one of the two confidence zones

blue

the variant combination is predicted as neutral

Pathogenicity prediction information

In this section you can explore the results of VarCoPP, which predicts the pathogenicity of a digenic variant combination as either candidate disease-causing or neutral. Furthermore, you can see further explanations on how each biological feature decides for either the disease-causing or the neutral class. This is an important step that can aid in understanding and evaluating the results obtained by our predictive methods.

In ORVAL we use the treeinterpreter Python module, a method that allows us to see, for every variant combination, the preference each feature shows for either the neutral or the disease-causing class inside each individual predictor of VarCoPP. Based on this method, we get specific preference values for each feature that range from negative to positive.

We visualise these class preference values per feature by using box plots that reveal both the median and variance of class preferences among the individual predictors in VarCoPP.

Feature in red colour

The feature has a positive median preference value among all predictors of VarCoPP and votes in favour of the disease-causing class. The higher the value, the stronger the vote for the disease-causing class is.

Feature in blue colour

The feature has a negative median preference value among all predictors of VarCoPP and votes in favour of the neutral class. The lower the value, the stronger the vote for the neutral class is.
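
The colouring rule above can be sketched as follows in Python. The preference values are made up, the function name `summarise_feature` is hypothetical, and the handling of a median of exactly zero (no colour) as well as the extra "conflicting" flag are assumptions added for illustration.

```python
from statistics import median


def summarise_feature(values):
    """Summarise the per-predictor preference values of one feature."""
    m = median(values)
    if m > 0:
        colour = "red"    # votes for the disease-causing class
    elif m < 0:
        colour = "blue"   # votes for the neutral class
    else:
        colour = None     # assumption: a zero median is left uncoloured
    # A feature is flagged as conflicting when its values spread on both sides of zero.
    conflicting = min(values) < 0 < max(values)
    return {"median": m, "colour": colour, "conflicting": conflicting}


# Hypothetical preference values among three predictors.
summary = {
    name: summarise_feature(values)
    for name, values in {
        "CADD1": [0.08, 0.10, 0.12],
        "CADD3": [-0.07, -0.05, -0.02],
        "RecA": [-0.01, 0.02, 0.03],
    }.items()
}
print(summary)
```

In this toy example RecA is coloured red because its median is positive, but it is also flagged as conflicting because one predictor gives it a negative preference value.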

For a detailed description of the biological features used for predictions, you can consult the VarCoPP features section.

An example of feature interpretation for a prediction

The following box-plot corresponds to a bi-locus combination that was predicted as candidate disease-causing with a Support Score of 89.4. This can already tell us that probably some of the features were conflicting among the predictors, as we do not have a clear consensus.

example of tree interpreter plot

In the boxplot, we see the preference of each feature for either the disease-causing or the neutral class among all individual predictors of VarCoPP, for that particular variant combination.

In this case, we can see that CADD1 and CADD2 (the CADD scores of the 1st and 2nd variant alleles of gene A, see also the Feature Description section) contribute a lot to the disease-causing class vote, as they have the highest positive median contribution among all features. This probably means that the CADD scores of those variant alleles are quite high (here, we are most probably dealing with a homozygous or compound heterozygous variant in gene A, where the 2nd variant allele is not wild-type), something that we can verify by looking at the annotation of the digenic combination in the Digenic Results page.
On the other hand, CADD3 (the 1st variant allele of gene B) drives the prediction towards the neutral class.

We can also see that although RecA (the recessiveness probability of gene A) also has a positive median preference value, as it is coloured in red, its preference values in some of the predictors spread below zero, meaning that this feature was conflicting between the disease-causing and the neutral class.

Digenic Effect prediction

In this section you can explore the results of the Digenic Effect (DE) predictor that predicts the digenic effect of a pathogenic variant combination (i.e. whether it is True Digenic, Monogenic + Modifier or a Dual Molecular Diagnosis case). If a variant combination has been predicted as candidate disease-causing, you will find further information of its digenic effect in this section.

This information is presented on the left with a table that provides the probabilities for each digenic effect class and the predicted Digenic Effect (i.e. the class with the highest probability), while the radar plot on the right panel provides a visual representation of the prediction results.

An example of a Digenic Effect prediction

The following image corresponds to a pathogenic variant combination whose digenic effect is predicted to be True Digenic.

example of a digenic effect prediction for a combination

The table on the left provides the probabilities for all three possible digenic effect classes. We can see that the True Digenic class has the highest probability (0.756) compared to the rest and, therefore, this is the final Digenic Effect class that is predicted for that combination.
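
Selecting the predicted class amounts to taking the class with the highest probability, as in the minimal Python sketch below. Only the 0.756 value comes from the example above; the other two probabilities and the function name are hypothetical.

```python
def predicted_digenic_effect(probabilities):
    """Return the class with the highest probability."""
    return max(probabilities, key=probabilities.get)


# 0.756 is taken from the example; the two other values are placeholders.
probabilities = {
    "True Digenic": 0.756,
    "Monogenic + Modifier": 0.150,
    "Dual Molecular Diagnosis": 0.094,
}
predicted = predicted_digenic_effect(probabilities)
print(predicted)
```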

The radar plot on the right simply shows a summary visualisation of the results shown in the table. Each line represents a Digenic Effect class and can take probability values from 0 (center of the plot) to 1 (edge of the plot). Three dots are placed at the corresponding probability value of each class, forming a triangular shape. With this shape we can get a quick visual idea of which class is preferred and whether this preference is strong or not (depending on the skewness of the triangular shape). In this case, we can clearly see that the prediction falls to the True Digenic class based on the skewness of the triangular shape towards this class.

Exception messages

In some cases you will not be able to get information about the Digenic Effect of a variant combination and you will see one of the following messages instead:

  • The variant combination is not pathogenic
    As the Digenic Effect predictor can only work on pathogenic variant combinations, if the combination you are exploring is predicted as neutral by VarCoPP, you will see the following message:

    example of a missing digenic effect prediction because of a neutral combination
  • There is some missing annotation for a variant combination
    The Digenic Effect predictor cannot make a prediction for a variant combination when some annotations that are used for the prediction are missing for that combination. In this case, you will see the following message:

    example of missing annotation for a variant combination

Tutorials

In case you would like to see a specific tutorial in ORVAL, you can contact us.

Prioritisation of digenic variant combinations

Depending on the size of the data you are analysing, you may end up with many digenic variant combinations predicted as candidate disease-causing. ORVAL is not a prioritisation tool per se; however, it is possible to limit your analysis to those combinations that could potentially be more interesting for your research. For this, you can follow the next steps:

  • 1. Overview and filtering of the digenic variant combinations
example of a digenic combinations table

The combinations in the Summary Table are ranked based on their Classification Score (CS) and Support Score (SS), with those at the top having the highest scores. The higher the CS and SS assigned to a digenic combination, the more confident VarCoPP is for the disease-causing class (consult the VarCoPP evaluation scores section for a detailed explanation).

  • The strictest way to filter your combinations is to focus on those falling in the 99%-confidence zone, shown in dark red (the first two combinations in this example). These have a 1% probability of being False Positives. Keep in mind that this selection can be quite strict and can lead to False Negatives.
  • To relax your criteria, you can also include the combinations falling in the 95%-confidence zone, shown in red (the next five combinations in this example), which have a 5% probability of being False Positives.
  • If the steps described below do not yield convincing results, you can also try to include the combinations that are predicted as disease-causing but do not fall in either of the two confidence zones; these are shown in orange.

For more information, consult the VarCoPP Confidence Zones section of our Documentation page.

  • 2. Find relevant gene modules in the predicted pathogenic gene network
example of a gene network

Are the genes relevant? You can explore the predicted pathogenic gene network in ORVAL, which appears first in the results page. In this network, two genes are connected with an edge only if they contain at least one predicted pathogenic digenic combination.

You can first filter the network to keep the most relevant gene pairs, based on the pathogenicity cutoff that you choose. You can use the Filtering panel on the left, to keep only the gene pairs containing combinations falling in the 95%- and 99%-confidence zones, by selecting 0.64 as the minimum Gene Pair Pathogenicity Score threshold (i.e., this is the minimum Classification Score for the 95%-confidence zone, see the VarCoPP Confidence Zones section of our Documentation page).
For a stricter analysis, you can use the 0.83 value as a cutoff to include only gene pairs with combinations falling in the 99%-confidence zone.
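
The effect of the two cutoffs can be sketched on a downloaded list of gene pairs as follows. The thresholds 0.64 and 0.83 are the ones quoted above; the function name, the gene names and the scores are hypothetical, and the Gene Pair Pathogenicity Score is taken as a given value per pair rather than recomputed.

```python
ZONE_95_MIN_SCORE = 0.64  # minimum score for the 95%-confidence zone
ZONE_99_MIN_SCORE = 0.83  # minimum score for the 99%-confidence zone


def filter_network(pair_scores, min_score):
    """Keep the gene pairs scoring at least min_score and list their genes."""
    kept = {pair: score for pair, score in pair_scores.items() if score >= min_score}
    genes = sorted({gene for pair in kept for gene in pair})
    return kept, genes


# Hypothetical Gene Pair Pathogenicity Scores.
pair_scores = {
    ("GENE_A", "GENE_B"): 0.90,
    ("GENE_A", "GENE_C"): 0.70,
    ("GENE_C", "GENE_D"): 0.55,
}

pairs_95, genes_95 = filter_network(pair_scores, ZONE_95_MIN_SCORE)
pairs_99, genes_99 = filter_network(pair_scores, ZONE_99_MIN_SCORE)
print(genes_95)
print(genes_99)
```

With the stricter cutoff only one gene pair remains, which shows how quickly the network can shrink when moving from the 95%- to the 99%-confidence zone.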

Once you are satisfied with your network, click on a gene in the network to make the gene module panel appear on the right, then click on the link that appears to get directed to another page that shows PPI and pathway information for those genes.

NOTE: if a gene is shown as a hub in the gene network, meaning that it is highly connected with other genes, it may contain a variant that drives the predictions higher. This may indicate that a Monogenic plus Modifier case is present in your data. However, if this gene does not seem relevant for the phenotype or seems unrelated to the rest of the genes, you could try to remove it from the network and consider the rest of the genes. You can use a general Centrality threshold of your choice or unselect genes manually in the gene table on the left.

Detailed information on ways to filter your network can be found in the Oligogenic network section of our Documentation.

  • 3. Explore the relationship between the genes in the gene module
example of a PPI network example of a PPI network

Once you click on the gene module link, ORVAL offers PPI, cellular location and molecular pathway information as a starting (and definitely not exhaustive) point to understand the relevance and interactions of genes in silico. For an explanation of these graphs, consult the Oligogenic gene module, PPI network and Pathway information sections of our Documentation.

Are the genes in your module related in terms of their PPIs? In the PPI network (the first picture in this example) you can see whether the proteins of this module (purple nodes) interact directly (purple edges) or indirectly through one external protein in between; external proteins are depicted as grey nodes and are linked with grey edges. In this example, the proteins of your module, TRIM54 and TRIM63, directly interact, but also indirectly interact through several external proteins found in the comPPI database.

Are the genes in your module involved in similar molecular pathways? You can consult the Reactome pathway treemap and the pathway table to find genes involved in similar pathways (the second picture in this example).

This is a starting point for you to explore the relationships between the genes present in the selected module. Based on this information, you can start limiting the results to the gene pairs that seem to be most relevant.

  • 4. Explore the variant combinations of the selected gene pairs
example of a digenic combinations table example of a boxplot explaining the voting preference of VarCoPP features example of the digenic effect prediction

You can now go back to the digenic variant combinations Summary Table (first picture) and the S-plot, and explore the variant combinations that are linked to the gene pairs that you have selected based on the previous steps. Especially if multiple variant combinations are linked with the same gene pair, it will be interesting to further explore their predictions to see which are relevant. Summary statistics for each gene pair are also available in the Gene pair ranking table of the Main Results page.

Once you find a combination of interest, you can click on it in the Summary Table (second column) or in the S-plot, and you will be directed to a page with information specific for that combination. There, you can first see a summary of the variants and the predictions of both VarCoPP and the Digenic Effect predictor.

You can explore the Feature preference boxplot of VarCoPP (second picture). How do the features vote for either the disease-causing (features in red colour) or the neutral (features in blue colour) class? The more these feature preferences deviate from zero, the stronger their vote for a particular class is. In this example, we see that the CADD scores of the variant alleles of gene A (CADD1 and CADD2) vote the strongest for the disease-causing class, whereas the CADD score of the first variant allele of gene B (CADD3), as well as its recessiveness probability (RecB), vote the strongest for the neutral class. You can further evaluate the values of these features and why they tend to vote for either class in the Annotations section of that page.

You can further get an indication of the Digenic Effect of that particular combination, i.e. whether it has a True Digenic, Monogenic plus Modifier or Dual Molecular Diagnosis effect (third picture). These predictions are indicative and further inspection or confirmation is required. In this example, the variant combination has a very strong prediction for the True Digenic class.

The information in this page can help you evaluate whether the specific variant combination seems relevant for your analysis.

  • 5. Examine the relevance of selected gene pairs and combinations for the phenotype

Do the selected gene pairs and variant combinations make sense to you as a clinical researcher and are they in accordance or could they explain the patient's phenotype? At this step, you have a filtered set of gene pairs and variant combinations that could be potentially relevant, based on in silico information.

  • 6. Optional: Repeat steps 1-5 with less strict criteria

If at this point the information seems incomplete or you still do not obtain promising results, you could relax your criteria and repeat steps 1-5. For example, if you have selected only gene pairs with combinations falling into the 99%-confidence zone, you could now also allow gene pairs with combinations falling into the 95%-confidence zone. Furthermore, if combinations falling into these confidence zones are not available or do not seem convincing, you can try to include all gene pairs and variant combinations predicted as disease-causing.

  • 7. Explore familial and functional evidence

The previous steps provide a way to limit your analysis to those variant combinations that seem to be more relevant or more promising for further research and experiments. ORVAL cannot, in any way, provide a definite way for diagnosis or medical advice. Further evidence is needed to show whether they can indeed be True Positives based on segregation analyses and functional experiments.


Browser compatibility

Below you can find which browsers are suitable for ORVAL, based on your operating system:

OS        Version     Chrome    Firefox   Microsoft Edge   Safari
Linux     Ubuntu 20   87.0.42   83.0      N/A              N/A
MacOS     Catalina    87.0.42   83.0      N/A              14.0
Windows   10          87.0.42   83.0      44.17            N/A

Frequently Asked Questions

If answers to your questions are not provided in this section and no information about your question is mentioned in the Documentation page, you can contact us.

Is there a limit on the number of variants I can upload?
In general, we highly recommend the use of variants from up to 300 genes, as well as the application of the variant filtering procedure that is provided with ORVAL, in order to limit the number of non-relevant combinations that will be tested.

Based on our server tests, you can either copy-paste a variant list of up to 80000 variants or upload a VCF file of up to 50 MB. You can consult the Input Data section of our Documentation page for more details.

Can I include variants from multiple patients in my input?
No, the analysis should be restricted to a single individual only, as ORVAL creates all possible variant combinations from your variant list assuming that they belong to the same person.
If you want to analyse multiple patients, you should separate their variants in different files and explore them individually with ORVAL.

Is there a specific input format for the insertions and deletions?
ORVAL accepts several variant formats for insertions and deletions, with or without dashes.
You can consult the Variant Types section in the Documentation page for a detailed explanation.

I see that the Job status of my variant submission is FAILURE. What should I do?

If there is no specific error message that explains the problem, you can follow these steps:

  • Check if your internet connection is running smoothly.
  • Re-try your data submission.
  • Check if your input data is correctly formatted. You can consult the Input Data section in our documentation page for a detailed explanation on the correct format for your variant submission.

If you have checked the previous steps and you still experience issues, you can send us an email, providing also the Job Id you have obtained during the submission.

During my data submission I see a message that I have exceeded the 5 Job submissions. What should I do?

For server monitoring purposes we allow every user (based on their IP address) to run a maximum of 5 different data analyses at the same time. If you exceed this number, you have to wait until at least one of the running jobs has finished before launching a new one. You can consult the Job Submission section of the Documentation page for more details.

During my data submission I see a message that my uploaded file is using an unsupported format. What should I do?

The VCF file you have uploaded is not correctly formatted and ORVAL cannot parse it. Make sure that your file contains the header line (#CHROM POS REF ALT etc...) and that it is tab-delimited, with the CHROM, POS, ID, REF, ALT, FORMAT and SAMPLE_NAME columns.
You can consult the VCF file specification section of the Documentation page for a detailed description on how to properly format your VCF file.
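
Before uploading, a quick local check of the header line can catch the most common formatting problems. The Python sketch below is not the parser used by ORVAL: the function name `check_vcf_header` and the wording of the messages are hypothetical, and it only tests the columns named above, assuming that the sample column follows FORMAT.

```python
REQUIRED_COLUMNS = ["#CHROM", "POS", "ID", "REF", "ALT", "FORMAT"]


def check_vcf_header(text):
    """Return a list of problems found in the #CHROM header line (empty if none)."""
    for line in text.splitlines():
        if not line.startswith("#CHROM"):
            continue
        if "\t" not in line:
            return ["header line is not tab-delimited"]
        columns = line.split("\t")
        problems = [f"missing column {name}" for name in REQUIRED_COLUMNS if name not in columns]
        # Assumption: at least one sample column must follow FORMAT.
        if "FORMAT" in columns and columns.index("FORMAT") == len(columns) - 1:
            problems.append("missing sample column after FORMAT")
        return problems
    return ["missing #CHROM header line"]


# Hypothetical file contents used only to exercise the check.
good_vcf = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE1\n"
    "1\t100\t.\tA\tG\t.\t.\t.\tGT\t0/1\n"
)
spaces_vcf = "#CHROM POS ID REF ALT FORMAT SAMPLE1\n"
no_sample_vcf = "#CHROM\tPOS\tID\tREF\tALT\tFORMAT\n"

print(check_vcf_header(good_vcf))
print(check_vcf_header(spaces_vcf))
print(check_vcf_header(no_sample_vcf))
```

A header separated by spaces instead of tabs is a frequent result of copy-pasting a VCF file through a text editor, which is why it is reported separately here.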

What types of digenic combinations does ORVAL create?
ORVAL creates all possible di-allelic, tri-allelic and tetra-allelic variant combinations between any gene pair present in your data, including compound heterozygous variants in one of the two genes. Tetra-allelic combinations with four heterozygous variants (two in each gene) are not created, as these combinations were absent from our training set.
For a detailed explanation, you can consult the Creating digenic combinations section of our Documentation page.

I receive an error message that there are no variant combinations left to do the analysis. What should I do?

You can check the following steps to explore possible solutions:

  • Check whether the genome version you are using for the variants is correct. ORVAL annotates variants using the GRCh37/hg19 genome version, see the Genome version section of the Documentation page for more details.
  • Try to relax your variant or gene filtering options, especially the option for removing intronic and synonymous variants.
  • Ensure that you have more than one gene present in your data. ORVAL makes variant combinations between gene pairs, so it requires the presence of at least two different genes in your data.
  • If you have submitted a small number of variants, these variants may have been excluded from the analysis during the annotation process. For example, the CADD score for those variants may be missing, and as this score is important for the predictions, variants without an annotated CADD score are excluded from the analysis. A detailed list of all possible cases where your variants may be excluded is presented in the Variant Exclusion section of the Documentation page.

If you have checked the previous steps and you still experience issues, you can send us an email, providing also the Job Id you have obtained during the submission.

SOME variants of my initial submission are missing from the results

  • Please check the variant and/or gene filtering options you have selected during your submission, as these play an important role in the absence of some of your variants from the analysis. The variant filtering options offered by ORVAL are automatically pre-selected during the submission, unless you unselect them.
  • Make sure that the variant information and format you provided is correct for all variants. In case you copy-pasted a variant list using the box panel, make sure that the zygosity values are not misspelled (Heterozygous or Homozygous zygosity values are accepted), see also the tab-delimited variant list section in the Documentation page.
  • Otherwise, some variants may have been excluded from your analysis during the data annotation process. You can consult the Variant Exclusion section of our Documentation page for a detailed description of such cases.


For more information regarding our filtering options and the data annotation process, you can consult the Data Filtering and Annotation section of our Documentation page.

I am pretty sure that a variant is supposed to be mapped to a specific gene (according to my knowledge or the gene panel I provided), but I see another gene name instead in the results.

This can happen for the following reasons:

  • The gene symbol present at the gene panel file is not the official HGNC gene symbol. During annotation we only use the official HGNC gene symbols, and thus, we will display this symbol instead of the one that you provided.
  • The position of the variant for that gene overlaps with other genes. In this tricky situation, we have to choose one gene to continue with predictions. We have developed a set of priority rules for that (see the Gene annotation section). Therefore, there is a chance that we have assigned this variant to another gene instead. Unfortunately, it is not possible to re-assign it to your favourite gene manually, and we can only make such changes in our database during major ORVAL updates. Please note that you shouldn't change that gene name for the variant in the results, as some prediction features are gene-specific.
    If you think that this mapping is definitely wrong, you can send us an email.

I cannot see a network in the Results page
You can see a network in the Results page only if at least one candidate disease-causing variant combination is predicted with VarCoPP; see the Network navigation section in the Documentation page for more details.
In any case, you can still explore all variant combinations in the Digenic predictions of the Results page.

Why do we see only SOME external proteins in the PPI network?
You see an external protein in the PPI network only if it links two proteins of your selected module with a direct protein-protein interaction. External proteins that are connected with your module proteins with higher degrees of interactions are not shown. You can consult the PPI network exploration section in the Documentation page.

Where can I find the Digenic Effect prediction for a variant combination?
You can see the Digenic Effect prediction of a variant combination on its corresponding Digenic Combination page. You can get directed to this page by clicking on a variant combination either in the Digenic Combinations Overview table or on the S-plot.
Please note that you will only see a Digenic Effect prediction if the variant combination is predicted as pathogenic with VarCoPP.

Do you store any of my data?
We store the results of your data submission for 7 days, so that you can re-access the corresponding Result pages. After this period, all data is deleted.
We do not store email addresses that you may have provided to us during the data submission. However, we track general user traffic information (e.g. IP addresses) for job monitoring purposes (e.g. restricting the number of parallel submissions from the same IP address).
For a detailed explanation of our Data Privacy procedures, you can consult the Data privacy section of the About page.