Resources - overview

Data harvested and integrated from the following resources form the backbone of oncoEnrichR:

Resources - annotation types

This section describes the underlying annotation types of oncoEnrichR in detail, and to what extent they are subject to quality control/filtering for use within the package.

Target-disease associations

  • Associations between genes and diseases/tumor types are harvested from the Open Targets Platform. To increase the confidence of target-disease associations shown, we include only associations with support from at least two data types. Very weak associations, i.e. those with an overall score less than 0.05, have been excluded (complete range 0-1).
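
The two filters described above can be sketched in a few lines of Python. This is only an illustration: the record layout, the field names (`score`, `datatypes`) and the example genes `GENE_A`/`GENE_B` are assumptions made for the example, not the Open Targets Platform schema or the code used inside oncoEnrichR.

```python
# Hypothetical association records; field names are illustrative only.
associations = [
    {"gene": "KRAS", "disease": "lung carcinoma", "score": 0.82,
     "datatypes": {"somatic_mutation", "literature", "known_drug"}},
    {"gene": "GENE_A", "disease": "glioma", "score": 0.03,
     "datatypes": {"literature", "rna_expression"}},
    {"gene": "GENE_B", "disease": "melanoma", "score": 0.40,
     "datatypes": {"literature"}},
]

MIN_SCORE = 0.05      # associations scoring below this are treated as very weak
MIN_DATATYPES = 2     # require support from at least two data types


def keep_association(assoc):
    """Return True if the association passes both filters."""
    return (assoc["score"] >= MIN_SCORE
            and len(assoc["datatypes"]) >= MIN_DATATYPES)


retained = [a for a in associations if keep_association(a)]
print([a["gene"] for a in retained])
```

Only the first record survives: the second fails the score threshold and the third is supported by a single data type.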

Target-drug associations

  • Target-drug associations are primarily harvested from the Open Targets Platform. Through the use of our own phenOncoX R package, a dedicated resource on phenotype ontology mapping in cancer, we ensure that target-drug associations included in oncoEnrichR are exclusively for cancer conditions.

Tumor suppressor/proto-oncogene/cancer driver annotation

  • We assign three confidence levels with respect to genes having oncogenic and/or tumor suppressive roles:

    • Very Strong
    • Strong
      • Curated entry in NCG or CGC, with moderate support (<= 10 citations, > 5 citations) in the biomedical literature (CancerMine), OR
      • Significant support (> 10 citations, <= 20 citations) in the biomedical literature (CancerMine), no entry in CGC/NCG
    • Moderate
      • Curated entry in NCG/CGC, with no or weak support (<= 5 citations) in the biomedical literature (CancerMine), OR
      • Moderate support (<= 10 citations, > 5 citations) in the biomedical literature (CancerMine), no entry in CGC/NCG
    • Literature support for a tumor suppressive role is ignored if the gene has 2.5 times as much literature support for an oncogenic role (and vice versa)
  • Classification of genes as potential cancer drivers requires support from at least two of the following sources:
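
As a rough illustration, the Strong and Moderate criteria and the 2.5-fold rule for conflicting literature support can be written as two small Python functions. The function names and arguments are hypothetical, the 2.5-fold rule is assumed to mean "at least 2.5 times", and combinations not covered by the criteria listed above (including Very Strong, whose criteria are not given here) simply return `None`.

```python
def effective_citations(n_oncogene, n_tsg, ratio=2.5):
    """Ignore literature support for one role when the opposite role
    has at least `ratio` times as many citations (assumed reading)."""
    if n_tsg > 0 and n_oncogene >= ratio * n_tsg:
        n_tsg = 0
    elif n_oncogene > 0 and n_tsg >= ratio * n_oncogene:
        n_oncogene = 0
    return n_oncogene, n_tsg


def confidence_level(n_citations, curated):
    """Map CancerMine citation count and NCG/CGC curation status to a level.

    Only the Strong and Moderate criteria listed above are encoded.
    """
    if curated and 5 < n_citations <= 10:
        return "Strong"
    if not curated and 10 < n_citations <= 20:
        return "Strong"
    if curated and n_citations <= 5:
        return "Moderate"
    if not curated and 5 < n_citations <= 10:
        return "Moderate"
    return None  # not covered by the criteria listed above


# Example: 10 oncogene citations vs. 4 tumor suppressor citations, curated gene
onc, tsg = effective_citations(10, 4)
print(onc, tsg, confidence_level(onc, curated=True))
```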

Gene cancer hallmark associations

Gene copy number aberrations in tumors (somatic amplifications/homozygous deletions)

  • Somatic copy number amplifications and homozygous deletions have been retrieved from The Cancer Genome Atlas (TCGA), focusing on gene-level calls processed with the TCGA copy number variation analysis pipeline. Copy number events included in oncoEnrichR are limited to high-level amplifications (GISTIC2 CNA value: 2) and homozygous deletions (GISTIC2 CNA value: -2).
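
A minimal sketch of this restriction, assuming gene-level calls are available as (gene, sample, GISTIC2 value) tuples. The tuple layout and sample labels are invented for the example.

```python
# Hypothetical gene-level calls: (gene, sample, GISTIC2 CNA value in -2..2)
calls = [
    ("MYC", "s1", 2),
    ("MYC", "s2", 1),
    ("CDKN2A", "s1", -2),
    ("TP53", "s3", -1),
    ("EGFR", "s2", 0),
]

# Only the two extreme GISTIC2 values are kept.
EVENT_TYPE = {2: "amplification", -2: "homozygous deletion"}

events = [(gene, sample, EVENT_TYPE[value])
          for gene, sample, value in calls
          if value in EVENT_TYPE]
print(events)
```

Low-level gains (1) and heterozygous losses (-1) are dropped along with copy-neutral calls (0).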

Recurrent gene mutations in tumors (somatic SNVs/InDels)

  • Recurrent somatic SNVs/InDels (occurring in more than one tumor sample) have been retrieved from The Cancer Genome Atlas (TCGA). SNVs/InDels have been annotated further with information on mutation hotspots in cancer (cancerhotspots.org), protein domains from PFAM, and their predicted status as loss-of-function variants (LOFTEE). Only protein-altering and canonical splice site mutations are included.
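
The recurrence and consequence filters can be sketched as follows. The variant labels, the tuple layout and the exact set of consequence terms treated as "protein-altering or canonical splice site" are assumptions for illustration. The text above does not enumerate them.

```python
from collections import defaultdict

# Assumed consequence terms; illustrative, not the package's actual list.
INCLUDED_CONSEQUENCES = {
    "missense_variant", "stop_gained", "frameshift_variant",
    "inframe_deletion", "inframe_insertion",
    "splice_donor_variant", "splice_acceptor_variant",
}

# Hypothetical observations: (variant label, consequence, tumor sample)
observations = [
    ("GENE_A:var1", "missense_variant", "s1"),
    ("GENE_A:var1", "missense_variant", "s2"),
    ("GENE_B:var2", "missense_variant", "s1"),
    ("GENE_C:var3", "synonymous_variant", "s1"),
    ("GENE_C:var3", "synonymous_variant", "s3"),
]

samples_by_variant = defaultdict(set)
for variant, consequence, sample in observations:
    if consequence in INCLUDED_CONSEQUENCES:
        samples_by_variant[variant].add(sample)

# Recurrent = observed in more than one tumor sample
recurrent = sorted(v for v, samples in samples_by_variant.items()
                   if len(samples) > 1)
print(recurrent)
```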

Gene co-expression in tumors (RNA-seq)

  • Co-expression correlation coefficients between genes in tumors have been pre-calculated, using RNA-seq data available in The Cancer Genome Atlas (TCGA). We calculate separate correlation coefficients for each tumor type, and we furthermore limit pairs of co-expressed genes to those that include either a tumor suppressor, an oncogene, or a predicted cancer driver (using the classification outlined above). Only strong correlation coefficients (Spearman’s rank >= 0.7 or <= -0.7) are included.
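
The pair-level filter can be illustrated as below, assuming the correlation coefficients for one tumor type are already computed. The gene set, the placeholder gene names and the tuple layout are hypothetical.

```python
# Hypothetical set of tumor suppressors / oncogenes / predicted drivers
cancer_genes = {"TP53", "MYC"}

# Hypothetical pre-calculated pairs for one tumor type: (gene_a, gene_b, rho)
pairs = [
    ("TP53", "GENE_A", 0.81),
    ("GENE_B", "GENE_C", 0.92),   # strong, but neither gene is a cancer gene
    ("MYC", "GENE_D", -0.75),
    ("MYC", "GENE_E", 0.42),      # involves a cancer gene, but too weak
]


def keep_pair(gene_a, gene_b, rho, threshold=0.7):
    """Keep strong correlations (|rho| >= threshold) that involve at least
    one tumor suppressor, oncogene, or predicted driver."""
    strong = abs(rho) >= threshold
    involves_cancer_gene = gene_a in cancer_genes or gene_b in cancer_genes
    return strong and involves_cancer_gene


strong_pairs = [p for p in pairs if keep_pair(*p)]
print(strong_pairs)
```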

Molecular gene signatures / pathways

  • A comprehensive set of pathway annotations and molecular gene signatures forms the basis for the functional enrichment module offered by oncoEnrichR. We include a diverse set of molecular signatures from MSigDB, annotations with respect to molecular function, biological processes and cellular components from Gene Ontology (GO), and pathway annotations from KEGG, WikiPathways, NetPath, and Reactome.

Cell/tissue-specific gene expression patterns

  • We provide tissue- and cell-type specific gene expression annotations from the Human Protein Atlas. Tissue-specific patterns are taken from The Tissue Atlas, where tissue specificity is derived from deep sequencing of RNA (RNA-seq) in 50 different normal tissue types. All putative protein-coding genes have been classified with respect to abundance (tissue enriched, tissue enhanced etc.) for different tissues. For cell type specificity, we utilize single cell RNA (scRNA) sequencing datasets aggregated within the Human Protein Atlas. Here, all protein-coding genes are classified based on single cell type-specific expression in 81 different cell types, determining which genes are elevated (cell type enriched, cell type enhanced etc.) in a particular cell type compared to all other cell types.

Gene fitness effects

  • Gene fitness effects are retrieved from an integration of data in DepMap & Project Score. We include data on genes with a statistically significant effect on cell fitness in cancer cell lines (the fitness score is here considered a quantitative measure of the reduction in cell viability elicited by inactivation of a gene via CRISPR/Cas9 targeting).

Subcellular compartment annotation

  • Subcellular localization/compartment annotations have been retrieved from COMPARTMENTS. When using oncoEnrichR, users can configure the stringency of the target compartment annotations retrieved:
    • Confidence score for each target compartment annotation (range from 3 to 5)
    • The type of channels that support annotations (Knowledge, Text mining or Experimental), as well as the required minimum number of channels that should support a given target compartment annotation
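
To make the interplay of these settings concrete, here is a small sketch of such a stringency filter. The function name, its parameters and the record layout are hypothetical and do not reflect the actual oncoEnrichR arguments.

```python
# Hypothetical annotation records; layout is illustrative only.
annotations = [
    {"gene": "GENE_A", "compartment": "Nucleus", "confidence": 5,
     "channels": {"Knowledge", "Experimental"}},
    {"gene": "GENE_A", "compartment": "Cytosol", "confidence": 3,
     "channels": {"Text mining"}},
    {"gene": "GENE_B", "compartment": "Plasma membrane", "confidence": 4,
     "channels": {"Knowledge", "Text mining"}},
]


def filter_annotations(annotations, min_confidence=3,
                       allowed_channels=("Knowledge", "Text mining", "Experimental"),
                       min_channels=1):
    """Keep annotations that reach the confidence score and are supported
    by at least `min_channels` of the allowed channel types."""
    if not 3 <= min_confidence <= 5:
        raise ValueError("min_confidence must be between 3 and 5")
    allowed = set(allowed_channels)
    kept = []
    for ann in annotations:
        supporting = ann["channels"] & allowed
        if ann["confidence"] >= min_confidence and len(supporting) >= min_channels:
            kept.append(ann)
    return kept


strict = filter_annotations(annotations, min_confidence=4,
                            allowed_channels=("Knowledge", "Experimental"))
print([a["compartment"] for a in strict])
```

Raising `min_channels` to 2 with the same channel selection would further restrict the result to annotations backed by both Knowledge and Experimental evidence.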

Protein complexes and protein domains

Protein-protein interactions

  • Protein-protein interactions for members of the target set are explored using data from two different resources:

    1. STRING - a database of known and predicted protein-protein interactions, including direct (physical) and indirect (functional) associations. In oncoEnrichR, the publicly available STRING API is utilized for network retrieval, and users can configure the following:
      • A required minimum interaction score, limiting the interactions retrieved to those with an interaction score greater than or equal to this threshold. Interaction scores are indicators of confidence, i.e. how likely STRING judges an interaction to be true.
      • The type of network to retrieve, either physical or functional
      • The addition of proteins (maximum 50) that are not part of the query set, but that interact most confidently with proteins in the query set
    2. BioGRID - a biomedical interaction repository with data compiled through comprehensive curation efforts
      • A required minimum number of evidence items that support each interaction (i.e. from multiple types of low- or high-throughput experiments, e.g. affinity capture-MS, or affinity capture-Western). Default: 3, maximum: 10
      • The addition of proteins (maximum 50) that are not part of the query set, but that interact most confidently with proteins in the query set
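
For the STRING options in item 1, the sketch below shows how the three settings could map onto a request to the public STRING API `network` method. The helper name and its arguments are hypothetical, the 0-1 score scale on the caller side is an assumption (the STRING API itself expects `required_score` on a 0-1000 scale), and whether oncoEnrichR builds its requests this way is not stated here. No request is sent; the function only assembles the URL.

```python
from urllib.parse import urlencode

STRING_NETWORK_ENDPOINT = "https://string-db.org/api/tsv/network"


def build_string_network_url(genes, min_score=0.9,
                             network_type="functional", add_nodes=0):
    """Assemble a STRING 'network' query URL (hypothetical helper).

    min_score is given on a 0-1 scale here (an assumption) and converted
    to the 0-1000 scale used by STRING's `required_score` parameter.
    """
    if network_type not in ("functional", "physical"):
        raise ValueError("network_type must be 'functional' or 'physical'")
    if not 0 <= add_nodes <= 50:
        raise ValueError("add_nodes must be between 0 and 50")
    if not 0 <= min_score <= 1:
        raise ValueError("min_score must be between 0 and 1")
    params = {
        "identifiers": "\r".join(genes),   # STRING separates identifiers with %0d
        "species": 9606,                   # NCBI taxonomy ID for human
        "required_score": int(round(min_score * 1000)),
        "network_type": network_type,
        "add_nodes": add_nodes,
    }
    return f"{STRING_NETWORK_ENDPOINT}?{urlencode(params)}"


url = build_string_network_url(["TP53", "MDM2"], min_score=0.7,
                               network_type="physical", add_nodes=10)
print(url)
```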

Gene regulatory interactions

  • Regulatory interactions (TF-target) have been retrieved from the DoRothEA resource. Two collections of interactions are checked for overlap with the query set in oncoEnrichR: a global set of regulatory interactions, many of which are inferred from gene expression patterns in normal tissues (GTEx), and a cancer-focused set inferred from gene expression patterns in tumors (TCGA).

Prognostic gene associations

  • Prognostic associations (gene expression versus survival) are collected from The Human Pathology Atlas, which has undertaken correlation analyses between mRNA expression levels of human genes in tumor tissue and the clinical outcome (survival) for ~8,000 cancer patients (underlying data from TCGA). Correlation analyses resulted in more than 10,000 prognostic genes. We show prognostic genes with regard to cancer type and whether they are favorable or unfavorable in terms of clinical outcome. Furthermore, we show prognostic associations established by tcga-survival.com, which relates various types of gene aberrations (expression or methylation levels, CNA/mutation status) in tumor samples to patient survival.

Ligand-receptor interactions

  • Ligand-receptor interactions have been collected from the CellChatDB resource. In the oncoEnrichR module for these interactions, only interactions where both ligand and receptor are found in the query set are shown.

Synthetic lethality interactions

  • Predicted synthetic lethality interactions between human gene paralogs have been collected from De Kegel et al., Cell Systems, 2021. Interactions are shown both where both members are part of the query set and where only a single member is part of the query set.
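
The two display cases can be sketched as a simple partition of paralog pairs by how many members fall in the query set. The function name and the placeholder gene names are hypothetical.

```python
def partition_pairs(pairs, query_set):
    """Split paralog pairs into those with both members in the query set
    and those with exactly one member in it; other pairs are dropped."""
    query = set(query_set)
    both, single = [], []
    for gene_a, gene_b in pairs:
        n_in_query = (gene_a in query) + (gene_b in query)
        if n_in_query == 2:
            both.append((gene_a, gene_b))
        elif n_in_query == 1:
            single.append((gene_a, gene_b))
    return both, single


# Hypothetical predicted synthetic lethal paralog pairs and query set
pairs = [("PAR_A1", "PAR_A2"), ("PAR_B1", "PAR_B2"), ("PAR_C1", "PAR_C2")]
query_set = {"PAR_A1", "PAR_A2", "PAR_B1"}

both_members, single_member = partition_pairs(pairs, query_set)
print(both_members, single_member)
```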