
Output

Interactive HTML report

An interactive and tier-structured HTML report that lists variants in known cancer predisposition genes is provided with the following naming convention:

  • <sample_id>.cpsr.<genome_assembly>.html
    • The sample_id is provided as input by the user, and reflects a unique identifier of the tumor-normal sample pair to be analyzed.

The report is structured in four main sections, described in more detail below:

  1. Settings
    • Lists key configurations provided by the user, including the list of genes that constitute the virtual gene panel in the report
  2. Summary of findings
    • Summarizes the findings through donut charts
      • Number of variants in each of the five variant classification levels
  3. Variant classification
    • For all coding variants in the selected cancer predisposition geneset, interactive variant tables are shown for each level (ClinVar and non-ClinVar (Other) variants combined):
      • Pathogenic
      • Likely Pathogenic
      • Variants of Uncertain Significance (VUS)
      • Likely Benign
      • Benign
    • Biomarkers
      • Reported clinical evidence items from CIViC that overlap with variants in the query set are reported in four distinct tabs (Predictive / Prognostic / Diagnostic / Predisposing)
    • Secondary findings
    • GWAS hits
      • Low-risk variants found in genome-wide association studies of cancer phenotypes (NHGRI-EBI Catalog)
  4. Documentation
    • Introduction
      • Short overview of the predisposition report - aims and contents
    • Annotation resources
      • Underlying tools, databases and annotation sources (with versions)
    • Variant classification
      • Overview of how CPSR performs variant classification of variants not recorded in ClinVar, listing ACMG criteria and associated scores
    • References
      • Supporting scientific literature (interpretation/implementation of ACMG criteria etc.)

Interactive datatables

The interactive datatables contain a number of hyperlinked annotations similar to those defined for the annotated VCF file, including the following:

Annotation Description
SYMBOL Gene symbol (Entrez/NCBI)
PROTEIN_CHANGE Amino acid change (VEP)
GENE_NAME Gene name/description (Entrez/NCBI)
PROTEIN_DOMAIN PFAM protein domain
PROTEIN_FEATURE UniProt feature overlapping variant site
CDS_CHANGE Coding sequence change
CONSEQUENCE VEP consequence (primary transcript)
LOSS_OF_FUNCTION Predicted loss-of-function variant
RMSK_HIT Overlap with repeats as annotated by RepeatMasker
HGVSc HGVS coding sequence name (VEP)
HGVSp HGVS protein sequence name (VEP)
NCBI_REFSEQ Transcript accession ID(s) (NCBI RefSeq)
ONCOGENE Predicted proto-oncogene (CancerMine/NCG)
TUMOR_SUPPRESSOR Predicted tumor suppressor gene (CancerMine/NCG)
PREDICTED_EFFECT Effect predictions (deleterious/benign) from dbNSFP
VEP_ALL_CSQ All VEP transcript block consequences
DBSNP dbSNP rsID
GENOMIC_CHANGE Variant ID
GENOME_VERSION Genome assembly

JSON

A gzipped JSON file that stores the HTML report content is also provided. This file makes it easier to extract particular parts of the report for further analysis.

The JSON contains two main objects, metadata and content: the former holds information about the settings and data versions, while the latter holds the various sections of the report.

The JSON file can be used as input to PCGR to populate a somatic genome report with germline findings.

At present, there is no detailed schema documented for the JSON structure.
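Since no schema is documented, a defensive way to start exploring the file is to load it and check for the two top-level objects described above. The following is a minimal Python sketch using only the standard library; the function name `load_cpsr_json` is hypothetical, and because the exact JSON file name is not given here, it simply takes whatever path CPSR wrote.

```python
import gzip
import json
from typing import Any


def load_cpsr_json(path: str) -> dict[str, Any]:
    """Load a gzipped CPSR JSON report and check its top-level objects.

    Hypothetical helper: the file name and the keys nested inside
    'metadata' and 'content' are not documented, so only the two
    top-level objects are validated.
    """
    # "rt" decompresses and decodes to text in one step
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        report = json.load(handle)
    # The documentation describes two main objects: metadata and content
    missing = {"metadata", "content"} - set(report)
    if missing:
        raise KeyError(f"missing top-level object(s): {sorted(missing)}")
    return report
```

For example, `list(load_cpsr_json(path)["metadata"])` would list the metadata keys; since the nested structure is undocumented, inspect it rather than relying on particular key names.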

Variant call format - VCF

A VCF file containing annotated germline calls (single nucleotide variants and insertions/deletions) is generated with the following naming convention:

  • <sample_id>.cpsr.<genome_assembly>.vcf.gz (.tbi)
    • The sample_id is provided as input by the user, and reflects a unique identifier of the tumor-normal sample pair to be analyzed. Following common standards, the annotated VCF file is compressed with bgzip and indexed with tabix.

Below follows a description of all annotations/tags present in the VCF INFO column after processing with the CPSR annotation pipeline:


VEP consequence annotations
Tag Description
CSQ Complete consequence annotations from VEP. Format (separated by a |): Allele, Consequence, IMPACT, SYMBOL, Gene, Feature_type, Feature, BIOTYPE, EXON, INTRON, HGVSc, HGVSp, cDNA_position, CDS_position, Protein_position, Amino_acids, Codons, Existing_variation, ALLELE_NUM, DISTANCE, STRAND, FLAGS, PICK, VARIANT_CLASS, SYMBOL_SOURCE, HGNC_ID, CANONICAL, MANE, TSL, APPRIS, CCDS, ENSP, SWISSPROT, TREMBL, UNIPARC, RefSeq, DOMAINS, HGVS_OFFSET, AF, AFR_AF, AMR_AF, EAS_AF, EUR_AF, SAS_AF, gnomAD_AF, gnomAD_AFR_AF, gnomAD_AMR_AF, gnomAD_ASJ_AF, gnomAD_EAS_AF, gnomAD_FIN_AF, gnomAD_NFE_AF, gnomAD_OTH_AF, gnomAD_SAS_AF, CLIN_SIG, SOMATIC, PHENO, CHECK_REF, MOTIF_NAME, MOTIF_POS, HIGH_INF_POS, MOTIF_SCORE_CHANGE, TRANSCRIPTION_FACTORS, NearestExonJB
Consequence Consequence type of the variant (Sequence Ontology term) (picked by VEP’s --flag_pick_allele option)
Gene Ensembl stable ID of affected gene (picked by VEP’s --flag_pick_allele option)
Feature_type Type of feature. Currently one of Transcript, RegulatoryFeature, MotifFeature (picked by VEP’s --flag_pick_allele option)
Feature Ensembl stable ID of feature (picked by VEP’s --flag_pick_allele option)
cDNA_position Relative position of base pair in cDNA sequence (picked by VEP’s --flag_pick_allele option)
CDS_position Relative position of base pair in coding sequence (picked by VEP’s --flag_pick_allele option)
CDS_CHANGE Coding, transcript-specific sequence annotation (picked by VEP’s --flag_pick_allele option)
AMINO_ACID_START Protein position indicating absolute start of amino acid altered (fetched from Protein_position)
AMINO_ACID_END Protein position indicating absolute end of amino acid altered (fetched from Protein_position)
Protein_position Relative position of amino acid in protein (picked by VEP’s --flag_pick_allele option)
Amino_acids Reference and variant amino acids; only given if the variant affects the protein-coding sequence (picked by VEP’s --flag_pick_allele option)
Codons The alternative codons with the variant base in upper case (picked by VEP’s --flag_pick_allele option)
IMPACT Impact modifier for the consequence type (picked by VEP’s --flag_pick_allele option)
VARIANT_CLASS Sequence Ontology variant class (picked by VEP’s --flag_pick_allele option)
SYMBOL Gene symbol (picked by VEP’s --flag_pick_allele option)
SYMBOL_ENTREZ Official gene symbol as provided by NCBI’s Entrez gene
SYMBOL_SOURCE The source of the gene symbol (picked by VEP’s --flag_pick_allele option)
STRAND The DNA strand (1 or -1) on which the transcript/feature lies (picked by VEP’s --flag_pick_allele option)
ENSP The Ensembl protein identifier of the affected transcript (picked by VEP’s --flag_pick_allele option)
FLAGS Transcript quality flags: cds_start_NF: CDS 5’ incomplete; cds_end_NF: CDS 3’ incomplete (picked by VEP’s --flag_pick_allele option)
SWISSPROT Best match UniProtKB/Swiss-Prot accession of protein product (picked by VEP’s --flag_pick_allele option)
TREMBL Best match UniProtKB/TrEMBL accession of protein product (picked by VEP’s --flag_pick_allele option)
UNIPARC Best match UniParc accession of protein product (picked by VEP’s --flag_pick_allele option)
HGVSc The HGVS coding sequence name (picked by VEP’s --flag_pick_allele option)
HGVSp The HGVS protein sequence name (picked by VEP’s --flag_pick_allele option)
HGVSp_short The HGVS protein sequence name, short version (picked by VEP’s --flag_pick_allele option)
HGVS_OFFSET Indicates by how many bases the HGVS notations for this variant have been shifted (picked by VEP’s --flag_pick_allele option)
NearestExonJB VEP plugin that finds the nearest exon junction for a coding sequence variant. Format: Ensembl exon identifier+distance to exon boundary+boundary type (start/end)+exon length
MOTIF_NAME The source and identifier of a transcription factor binding profile aligned at this position (picked by VEP’s --flag_pick_allele option)
MOTIF_POS The relative position of the variation in the aligned TFBP (picked by VEP’s --flag_pick_allele option)
HIGH_INF_POS A flag indicating if the variant falls in a high information position of a transcription factor binding profile (TFBP) (picked by VEP’s --flag_pick_allele option)
MOTIF_SCORE_CHANGE The difference in motif score of the reference and variant sequences for the TFBP (picked by VEP’s --flag_pick_allele option)
CELL_TYPE List of cell types and classifications for regulatory feature (picked by VEP’s --flag_pick_allele option)
CANONICAL A flag indicating if the transcript is denoted as the canonical transcript for this gene (picked by VEP’s --flag_pick_allele option)
CCDS The CCDS identifier for this transcript, where applicable (picked by VEP’s --flag_pick_allele option)
INTRON The intron number (out of total number) (picked by VEP’s --flag_pick_allele option)
INTRON_POSITION Relative position of intron variant to nearest exon/intron junction (NearestExonJB VEP plugin)
EXON_POSITION Relative position of exon variant to nearest intron/exon junction (NearestExonJB VEP plugin)
EXON The exon number (out of total number) (picked by VEP’s --flag_pick_allele option)
LAST_EXON Logical indicator for last exon of transcript (picked by VEP’s --flag_pick_allele option)
LAST_INTRON Logical indicator for last intron of transcript (picked by VEP’s --flag_pick_allele option)
DISTANCE Shortest distance from variant to transcript (picked by VEP’s --flag_pick_allele option)
BIOTYPE Biotype of transcript or regulatory feature (picked by VEP’s --flag_pick_allele option)
TSL Transcript support level (picked by VEP’s --flag_pick_allele option)
PUBMED PubMed ID(s) of publications that cite existing variant - VEP
PHENO Indicates if existing variant is associated with a phenotype, disease or trait - VEP
GENE_PHENO Indicates if overlapped gene is associated with a phenotype, disease or trait - VEP
ALLELE_NUM Allele number from input; 0 is reference, 1 is first alternate etc - VEP
REFSEQ_MATCH The RefSeq transcript match status; contains a number of flags indicating whether this RefSeq transcript matches the underlying reference sequence and/or an Ensembl transcript (picked by VEP’s --flag_pick_allele option)
PICK Indicates if this block of consequence data was picked by VEP’s --flag_pick_allele option
VEP_ALL_CSQ All VEP transcript block consequences (Consequence:SYMBOL:Feature_type:Feature:BIOTYPE) - VEP
EXONIC_STATUS Indicates if variant consequence type is ‘exonic’ or ‘nonexonic’. We define ‘exonic’ as any variants with the following consequences: stop_gained / stop_lost, start_lost, frameshift_variant, missense_variant, splice_donor_variant, splice_acceptor_variant, inframe_insertion / inframe_deletion, synonymous_variant, protein_altering
CODING_STATUS Indicates if primary variant consequence type is ‘coding’ or ‘noncoding’. ‘coding’ variants are here defined as those with an ‘exonic’ status, with the exception of synonymous variants
NULL_VARIANT Primary variant consequence type is frameshift or stop_gained/stop_lost
SPLICE_DONOR_RELEVANT Logical indicating if variant is located at a particular location near the splice donor site (+3A/G, +4A or +5G)
REGULATORY_ANNOTATION Comma-separated list of all variant annotations with Feature_type RegulatoryFeature or MotifFeature. Format (separated by a |): <Consequence>, <Feature_type>, <Feature>, <BIOTYPE>, <MOTIF_NAME>, <MOTIF_POS>, <HIGH_INF_POS>, <MOTIF_SCORE_CHANGE>, <TRANSCRIPTION_FACTORS>
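Because CSQ packs many fields into one string, downstream scripts usually split it using the field order that VEP writes into the VCF header, rather than hard-coding the order listed above. Below is a minimal Python sketch using only the standard library. The function names are hypothetical. It assumes that the picked block carries PICK=1, which is how VEP marks the block chosen by --flag_pick_allele, and it reads the bgzipped file sequentially without using the tabix index.

```python
import gzip
import re
from typing import Iterator, Optional


def csq_fields_from_header(header_line: str) -> list[str]:
    """Return the ordered CSQ field names from VEP's ##INFO=<ID=CSQ,...> line."""
    match = re.search(r'Format: ([^"]+)"', header_line)
    if match is None:
        raise ValueError("no 'Format:' list found in the CSQ header line")
    return match.group(1).strip().split("|")


def info_value(info: str, tag: str) -> Optional[str]:
    """Look up one INFO tag; flag-type tags (no '=value') return ''."""
    for entry in info.split(";"):
        key, sep, val = entry.partition("=")
        if key == tag:
            return val if sep else ""
    return None


def parse_csq(csq_value: str, fields: list[str]) -> list[dict[str, str]]:
    """Split a CSQ value into one dict per block (',' between blocks, '|' within)."""
    return [dict(zip(fields, block.split("|"))) for block in csq_value.split(",")]


def iter_picked_csq(vcf_path: str) -> Iterator[tuple[str, int, Optional[dict[str, str]]]]:
    """Yield (CHROM, POS, picked CSQ block or None) for each VCF record.

    bgzip output (BGZF) is a valid multi-member gzip stream, so gzip.open
    can read it sequentially; region queries through the .tbi index need
    a tabix-aware library instead.
    """
    fields: list[str] = []
    with gzip.open(vcf_path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if line.startswith("##INFO=<ID=CSQ,"):
                fields = csq_fields_from_header(line)
                continue
            if line.startswith("#"):
                continue  # other meta-information lines and the column header
            cols = line.rstrip("\n").split("\t")
            csq = info_value(cols[7], "CSQ")
            picked = None
            if csq and fields:
                blocks = parse_csq(csq, fields)
                # PICK is set to 1 on the block chosen by --flag_pick_allele
                picked = next((b for b in blocks if b.get("PICK") == "1"), None)
            yield cols[0], int(cols[1]), picked
```

Reading the field order from the header keeps the parser working if the VEP configuration (and hence the CSQ layout) changes between releases.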


Gene information
Tag Description
ENTREZ_ID Entrez gene identifier
APPRIS Principal isoform flags according to the APPRIS principal isoform database
MANE_SELECT Indicates whether the transcript is the MANE Select or MANE Plus Clinical transcript for the gene (picked by VEP’s --flag_pick_allele_gene option)
UNIPROT_ID UniProt identifier
UNIPROT_ACC UniProt accession(s)
ENSEMBL_GENE_ID Ensembl gene identifier for VEP’s picked transcript (ENSGXXXXXXX)
ENSEMBL_TRANSCRIPT_ID Ensembl transcript identifier for VEP’s picked transcript (ENSTXXXXXX)
ENSEMBL_PROTEIN_ID Ensembl corresponding protein identifier for VEP’s picked transcript
REFSEQ_MRNA Corresponding RefSeq transcript(s) identifier for VEP’s picked transcript (NM_XXXXX)
TRANSCRIPT_MANE_SELECT MANE Select transcript identifier: one high-quality representative transcript per protein-coding gene that is well-supported by experimental data and represents the biology of the gene
TRANSCRIPT_MANE_PLUS_CLINICAL Transcripts chosen to supplement MANE Select when needed for clinical variant reporting
GENCODE_TAG Tag for GENCODE transcript (basic etc.)
GENCODE_TRANSCRIPT_TYPE Type of transcript (protein-coding etc.)
CORUM_ID Associated protein complexes (identifiers) from CORUM
TUMOR_SUPPRESSOR Indicates whether gene is predicted as a tumor suppressor gene, from Network of Cancer Genes (NCG) & the CancerMine text-mining resource
TUMOR_SUPPRESSOR_EVIDENCE Underlying evidence for gene being a tumor suppressor. Format: NCG:<TRUE|FALSE>&CancerMine:<LC|MC|HC>:num_citations
ONCOGENE Indicates whether gene is predicted as an oncogene, from Network of Cancer Genes (NCG) & the CancerMine text-mining resource
ONCOGENE_EVIDENCE Underlying evidence for gene being an oncogene. Format: NCG:<TRUE|FALSE>&CancerMine:<LC|MC|HC>:num_citations
CANCER_SUSCEPTIBILITY_CUI MedGen concept unique identifier (CUI) for cancer phenotype
CANCER_SYNDROME_CUI MedGen concept unique identifier (CUI) for cancer syndrome
CANCER_PREDISPOSITION_SOURCE Data source for susceptibility gene (panel 0: NCGC, CGC_94, TCGA_PANCAN, PANEL_APP, OTHER)
CANCER_PREDISPOSITION_MOI Mode of inheritance for susceptibility gene (AR/AD)
CANCER_PREDISPOSITION_MOD Mechanism of disease for susceptibility gene (LoF/GoF)
PROB_EXAC_LOF_INTOLERANT dbNSFP_gene: the probability of being loss-of-function intolerant (intolerant of both heterozygous and homozygous lof variants) based on ExAC r0.3 data
PROB_EXAC_LOF_INTOLERANT_HOM dbNSFP_gene: the probability of being intolerant of homozygous, but not heterozygous lof variants based on ExAC r0.3 data
PROB_EXAC_LOF_TOLERANT_NULL dbNSFP_gene: the probability of being tolerant of both heterozygous and homozygous lof variants based on ExAC r0.3 data
PROB_EXAC_NONTCGA_LOF_INTOLERANT dbNSFP_gene: the probability of being loss-of-function intolerant (intolerant of both heterozygous and homozygous lof variants) based on ExAC r0.3 nonTCGA subset
PROB_EXAC_NONTCGA_LOF_INTOLERANT_HOM dbNSFP_gene: the probability of being intolerant of homozygous, but not heterozygous lof variants based on ExAC r0.3 nonTCGA subset
PROB_EXAC_NONTCGA_LOF_TOLERANT_NULL dbNSFP_gene: the probability of being tolerant of both heterozygous and homozygous lof variants based on ExAC r0.3 nonTCGA subset
PROB_GNOMAD_LOF_INTOLERANT dbNSFP_gene: the probability of being loss-of-function intolerant (intolerant of both heterozygous and homozygous lof variants) based on gnomAD 2.1 data
PROB_GNOMAD_LOF_INTOLERANT_HOM dbNSFP_gene: the probability of being intolerant of homozygous, but not heterozygous lof variants based on gnomAD 2.1 data
PROB_GNOMAD_LOF_TOLERANT_NULL dbNSFP_gene: the probability of being tolerant of both heterozygous and homozygous lof variants based on gnomAD 2.1 data
PROB_HAPLOINSUFFICIENCY dbNSFP_gene: Estimated probability of haploinsufficiency of the gene (from http://dx.doi.org/10.1371/journal.pgen.1001154)
ESSENTIAL_GENE_CRISPR dbNSFP_gene: Essential (E) or Non-essential phenotype-changing (N) based on large scale CRISPR experiments (from http://dx.doi.org/10.1126/science.aac7041)
ESSENTIAL_GENE_CRISPR2 dbNSFP_gene: Essential (E), context-Specific essential (S), or Non-essential phenotype-changing (N) based on large scale CRISPR experiments (from http://dx.doi.org/10.1016/j.cell.2015.11.015)
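The TUMOR_SUPPRESSOR_EVIDENCE and ONCOGENE_EVIDENCE tags combine two sources in one string. A small Python sketch that splits them according to the documented format, NCG:<TRUE|FALSE>&CancerMine:<LC|MC|HC>:num_citations. The function and output key names are hypothetical, and reading LC/MC/HC as low/medium/high CancerMine confidence is an assumption.

```python
from typing import Any


def parse_role_evidence(value: str) -> dict[str, Any]:
    """Parse a TUMOR_SUPPRESSOR_EVIDENCE / ONCOGENE_EVIDENCE string.

    Assumes the documented layout:
    NCG:<TRUE|FALSE>&CancerMine:<LC|MC|HC>:num_citations
    """
    result: dict[str, Any] = {}
    for part in value.split("&"):
        items = part.split(":")
        if items[0] == "NCG" and len(items) == 2:
            result["ncg"] = items[1] == "TRUE"
        elif items[0] == "CancerMine" and len(items) == 3:
            # Confidence level (assumed low/medium/high) and citation count
            result["cancermine_confidence"] = items[1]
            result["cancermine_citations"] = int(items[2])
        else:
            raise ValueError(f"unexpected evidence component: {part!r}")
    return result
```

Raising on unknown components, rather than skipping them, makes a change in the upstream format visible immediately.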


Variant effect and protein-coding information
Tag Description
MUTATION_HOTSPOT Mutation hotspot codon in cancerhotspots.org. Format: gene_symbol | codon | q-value
MUTATION_HOTSPOT_TRANSCRIPT hotspot-associated transcripts (Ensembl transcript ID)
MUTATION_HOTSPOT_CANCERTYPE hotspot-associated cancer types (from cancerhotspots.org)
UNIPROT_FEATURE Overlapping protein annotations from UniProt KB
PFAM_DOMAIN Pfam domain identifier (from VEP)
EFFECT_PREDICTIONS All predictions of effect of variant on protein function and pre-mRNA splicing from database of non-synonymous functional predictions - dbNSFP v4.2. Predicted effects are provided by different sources/algorithms (separated by &), T = Tolerated, N = Neutral, D = Damaging: 1.SIFT, 2.MutationTaster (data release Nov 2015), 3.MutationAssessor (release 3), 4.FATHMM (v2.3), 5.PROVEAN (v1.1 Jan 2015), 6.FATHMM_MKL, 7.PRIMATEAI, 8.DEOGEN2, 9.DBNSFP_CONSENSUS_RNN (Ensembl/consensus prediction, based on deep learning), 10.SPLICE_SITE_EFFECT_ADA (Ensembl/consensus prediction of splice-altering SNVs, based on adaptive boosting), 11.SPLICE_SITE_EFFECT_RF (Ensembl/consensus prediction of splice-altering SNVs, based on random forest), 12.M-CAP, 13.MutPred, 14.GERP, 15.BayesDel, 16.LIST-S2, 17.ALoFT
DBNSFP_BAYESDEL_ADDAF predicted effect from BayesDel (dbNSFP)
DBNSFP_LIST_S2 predicted effect from LIST-S2 (dbNSFP)
DBNSFP_SIFT predicted effect from SIFT (dbNSFP)
DBNSFP_PROVEAN predicted effect from PROVEAN (dbNSFP)
DBNSFP_MUTATIONTASTER predicted effect from MUTATIONTASTER (dbNSFP)
DBNSFP_MUTATIONASSESSOR predicted effect from MUTATIONASSESSOR (dbNSFP)
DBNSFP_M_CAP predicted effect from M-CAP (dbNSFP)
DBNSFP_ALOFTPRED predicted effect from ALoFT (dbNSFP)
DBNSFP_MUTPRED score from MUTPRED (dbNSFP)
DBNSFP_FATHMM predicted effect from FATHMM (dbNSFP)
DBNSFP_PRIMATEAI predicted effect from PRIMATEAI (dbNSFP)
DBNSFP_DEOGEN2 predicted effect from DEOGEN2 (dbNSFP)
DBNSFP_GERP evolutionary constraint measure from GERP (dbNSFP)
DBNSFP_FATHMM_MKL predicted effect from FATHMM-mkl (dbNSFP)
DBNSFP_META_RNN predicted effect from ensemble prediction (deep learning - dbNSFP)
DBNSFP_SPLICE_SITE_RF predicted effect of splice site disruption, using random forest (dbscSNV)
DBNSFP_SPLICE_SITE_ADA predicted effect of splice site disruption, using boosting (dbscSNV)
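MUTATION_HOTSPOT is documented as gene_symbol | codon | q-value. A minimal Python sketch for splitting it, assuming '|' is the only separator (surrounding whitespace is stripped in case it is present). The class and function names are hypothetical, and the codon notation in the test value is illustrative only.

```python
from typing import NamedTuple


class Hotspot(NamedTuple):
    symbol: str
    codon: str
    q_value: float


def parse_hotspot(value: str) -> Hotspot:
    """Split a MUTATION_HOTSPOT value of the form gene_symbol|codon|q-value."""
    parts = [p.strip() for p in value.split("|")]
    if len(parts) != 3:
        raise ValueError(f"expected 3 '|'-separated fields, got {value!r}")
    symbol, codon, q_value = parts
    return Hotspot(symbol, codon, float(q_value))
```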


Variant frequencies/annotations in germline databases
Tag Description
AFR_AF_GNOMAD African/African-American germline allele frequency (gnomAD release 2.1)
AMR_AF_GNOMAD Latino/Admixed American germline allele frequency (gnomAD release 2.1)
GLOBAL_AF_GNOMAD Adjusted global germline allele frequency (gnomAD release 2.1)
SAS_AF_GNOMAD South Asian germline allele frequency (gnomAD release 2.1)
EAS_AF_GNOMAD East Asian germline allele frequency (gnomAD release 2.1)
FIN_AF_GNOMAD Finnish germline allele frequency (gnomAD release 2.1)
NFE_AF_GNOMAD Non-Finnish European germline allele frequency (gnomAD release 2.1)
OTH_AF_GNOMAD Other germline allele frequency (gnomAD release 2.1)
ASJ_AF_GNOMAD Ashkenazi Jewish germline allele frequency (gnomAD release 2.1)
NON_CANCER_AF_ASJ Alternate allele frequency for samples of Ashkenazi Jewish ancestry in the non-cancer subset (gnomAD 2.1.1)
NON_CANCER_AF_EAS Alternate allele frequency for samples of East Asian ancestry in the non-cancer subset (gnomAD 2.1.1)
NON_CANCER_AF_AFR Alternate allele frequency for samples of African-American/African ancestry in the non-cancer subset (gnomAD 2.1.1)
NON_CANCER_AF_AMR Alternate allele frequency for samples of Latino ancestry in the non-cancer subset (gnomAD 2.1.1)
NON_CANCER_AF_OTH Alternate allele frequency for samples of Other ancestry in the non-cancer subset (gnomAD 2.1.1)
NON_CANCER_AF_NFE Alternate allele frequency for samples of Non-Finnish European ancestry in the non-cancer subset (gnomAD 2.1.1)
NON_CANCER_AF_FIN Alternate allele frequency for samples of Finnish ancestry in the non-cancer subset (gnomAD 2.1.1)
NON_CANCER_AF_SAS Alternate allele frequency for samples of South Asian ancestry in the non-cancer subset (gnomAD 2.1.1)
NON_CANCER_AF_GLOBAL Alternate allele frequency in the non-cancer subset (gnomAD 2.1.1)
NON_CANCER_AC_ASJ Alternate allele count for samples of Ashkenazi Jewish ancestry in the non-cancer subset (gnomAD 2.1.1)
NON_CANCER_AC_EAS Alternate allele count for samples of East Asian ancestry in the non-cancer subset (gnomAD 2.1.1)
NON_CANCER_AC_AFR Alternate allele count for samples of African-American/African ancestry in the non-cancer subset (gnomAD 2.1.1)
NON_CANCER_AC_AMR Alternate allele count for samples of Latino ancestry in the non-cancer subset (gnomAD 2.1.1)
NON_CANCER_AC_OTH Alternate allele count for samples of Other ancestry in the non-cancer subset (gnomAD 2.1.1)
NON_CANCER_AC_NFE Alternate allele count for samples of Non-Finnish European ancestry in the non-cancer subset (gnomAD 2.1.1)
NON_CANCER_AC_FIN Alternate allele count for samples of Finnish ancestry in the non-cancer subset (gnomAD 2.1.1)
NON_CANCER_AC_SAS Alternate allele count for samples of South Asian ancestry in the non-cancer subset (gnomAD 2.1.1)
NON_CANCER_AC_GLOBAL Alternate allele count in the non-cancer subset (gnomAD 2.1.1)
NON_CANCER_AN_ASJ Total number of alleles in samples of Ashkenazi Jewish ancestry in the non-cancer subset (gnomAD 2.1.1)
NON_CANCER_AN_EAS Total number of alleles in samples of East Asian ancestry in the non-cancer subset (gnomAD 2.1.1)
NON_CANCER_AN_AFR Total number of alleles in samples of African-American/African ancestry in the non-cancer subset (gnomAD 2.1.1)
NON_CANCER_AN_AMR Total number of alleles in samples of Latino ancestry in the non-cancer subset (gnomAD 2.1.1)
NON_CANCER_AN_OTH Total number of alleles in samples of Other ancestry in the non-cancer subset (gnomAD 2.1.1)
NON_CANCER_AN_NFE Total number of alleles in samples of Non-Finnish European ancestry in the non-cancer subset (gnomAD 2.1.1)
NON_CANCER_AN_FIN Total number of alleles in samples of Finnish ancestry in the non-cancer subset (gnomAD 2.1.1)
NON_CANCER_AN_SAS Total number of alleles in samples of South Asian ancestry in the non-cancer subset (gnomAD 2.1.1)
NON_CANCER_AN_GLOBAL Total number of alleles in the non-cancer subset (gnomAD 2.1.1)
NON_CANCER_NHOMALT_ASJ Count of homozygous individuals in samples of Ashkenazi Jewish ancestry in the non-cancer subset (gnomAD 2.1.1)
NON_CANCER_NHOMALT_EAS Count of homozygous individuals in samples of East Asian ancestry in the non-cancer subset (gnomAD 2.1.1)
NON_CANCER_NHOMALT_AFR Count of homozygous individuals in samples of African-American/African ancestry in the non-cancer subset (gnomAD 2.1.1)
NON_CANCER_NHOMALT_AMR Count of homozygous individuals in samples of Latino ancestry in the non-cancer subset (gnomAD 2.1.1)
NON_CANCER_NHOMALT_OTH Count of homozygous individuals in samples of Other ancestry in the non-cancer subset (gnomAD 2.1.1)
NON_CANCER_NHOMALT_NFE Count of homozygous individuals in samples of Non-Finnish European ancestry in the non-cancer subset (gnomAD 2.1.1)
NON_CANCER_NHOMALT_FIN Count of homozygous individuals in samples of Finnish ancestry in the non-cancer subset (gnomAD 2.1.1)
NON_CANCER_NHOMALT_SAS Count of homozygous individuals in samples of South Asian ancestry in the non-cancer subset (gnomAD 2.1.1)
NON_CANCER_NHOMALT_GLOBAL Count of homozygous individuals in the non-cancer subset (gnomAD 2.1.1)
AFR_AF_1KG 1000G Project - phase 3 germline allele frequency for samples from AFR (African)
AMR_AF_1KG 1000G Project - phase 3 germline allele frequency for samples from AMR (Ad Mixed American)
EAS_AF_1KG 1000G Project - phase 3 germline allele frequency for samples from EAS (East Asian)
EUR_AF_1KG 1000G Project - phase 3 germline allele frequency for samples from EUR (European)
SAS_AF_1KG 1000G Project - phase 3 germline allele frequency for samples from SAS (South Asian)
GLOBAL_AF_1KG 1000G Project - phase 3 germline allele frequency for all 1000G project samples (global)
DBSNPRSID dbSNP reference ID, as provided by VEP
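Within the gnomAD non-cancer subset, an allele frequency is the alternate allele count divided by the total number of alleles (AF = AC / AN). This is why the AN tags matter: several ACMG frequency criteria in the TSV output also require a minimum AN. A Python sketch that recomputes AF from the NON_CANCER_AC_* and NON_CANCER_AN_* tags of an already parsed INFO dictionary. The function name is hypothetical, and treating '.' as missing follows the VCF convention.

```python
from typing import Mapping, Optional

POPULATIONS = ("ASJ", "EAS", "AFR", "AMR", "OTH", "NFE", "FIN", "SAS", "GLOBAL")


def non_cancer_af(info: Mapping[str, str], pop: str) -> Optional[float]:
    """Recompute a gnomAD non-cancer allele frequency as AC / AN.

    Returns None when AC or AN is absent, '.', or AN is zero.
    """
    if pop not in POPULATIONS:
        raise ValueError(f"unknown population code: {pop!r}")
    ac = info.get(f"NON_CANCER_AC_{pop}")
    an = info.get(f"NON_CANCER_AN_{pop}")
    # '.' is the VCF missing-value marker
    if ac is None or an is None or ac == "." or an == ".":
        return None
    an_int = int(an)
    if an_int == 0:
        return None
    return int(ac) / an_int
```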


Clinical associations
Tag Description
CLINVAR_MSID ClinVar Measure Set/Variant ID
CLINVAR_ALLELE_ID ClinVar allele ID
CLINVAR_PMID Associated PubMed IDs for variant in ClinVar - germline state-of-origin
CLINVAR_HGVSP Protein variant expression using HGVS nomenclature
CLINVAR_PMID_SOMATIC Associated PubMed IDs for variant in ClinVar - somatic state-of-origin
CLINVAR_CONFLICTED Variant has conflicting interpretations
CLINVAR_CLNSIG Clinical significance for variant in ClinVar - germline state-of-origin
CLINVAR_CLASSIFICATION Cleaned clinical significance on a five-level scale
CLINVAR_CLNSIG_SOMATIC Clinical significance for variant in ClinVar - somatic state-of-origin
CLINVAR_MEDGEN_CUI Associated MedGen concept identifiers (CUIs) - germline state-of-origin
CLINVAR_MEDGEN_CUI_SOMATIC Associated MedGen concept identifiers (CUIs) - somatic state-of-origin
CLINVAR_MOLECULAR_EFFECT Variant effect according to ClinVar annotation
CLINVAR_VARIANT_ORIGIN Origin of variant (somatic, germline, de novo etc.) for variant in ClinVar
CLINVAR_REVIEW_STATUS_STARS Rating of the ClinVar variant (0-4 stars) with respect to level of review
GWAS_HIT Variant associated with cancer phenotype from genome-wide association study (NHGRI-EBI GWAS catalog)
OPENTARGETS_DISEASE_ASSOCS Associations between protein targets and disease based on multiple lines of evidence (mutations, affected pathways, GWAS, literature etc.). Format: CUI:EFO_ID:IS_DIRECT:OVERALL_SCORE
OPENTARGETS_TRACTABILITY_COMPOUND Confidence for the existence of a modulator (small molecule) that interacts with the target to elicit a desired biological effect
OPENTARGETS_TRACTABILITY_ANTIBODY Confidence for the existence of a modulator (antibody) that interacts with the target to elicit a desired biological effect
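OPENTARGETS_DISEASE_ASSOCS is documented as CUI:EFO_ID:IS_DIRECT:OVERALL_SCORE per association. A Python sketch that parses it and keeps associations at or above a score cutoff. The '&' separator between associations and the True/False spelling of IS_DIRECT are assumptions to check against real output, and all names here are hypothetical.

```python
from typing import NamedTuple


class DiseaseAssoc(NamedTuple):
    cui: str
    efo_id: str
    is_direct: bool
    score: float


def parse_opentargets(value: str, min_score: float = 0.0) -> list[DiseaseAssoc]:
    """Parse CUI:EFO_ID:IS_DIRECT:OVERALL_SCORE entries, best score first."""
    assocs = []
    for entry in value.split("&"):  # ASSUMPTION: '&' separates associations
        cui, efo_id, is_direct, score = entry.split(":")
        assoc = DiseaseAssoc(cui, efo_id, is_direct.lower() == "true", float(score))
        if assoc.score >= min_score:
            assocs.append(assoc)
    # Highest overall association score first
    return sorted(assocs, key=lambda a: a.score, reverse=True)
```

The identifiers in the test are illustrative values chosen to exercise the parser, not output from a real run.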

Tab-separated values (TSV)

We provide a tab-separated values file with the most important variant/gene annotations. The file has the following naming convention:

  • <sample_id>.cpsr.<genome_assembly>.snvs_indels.tiers.tsv

The SNVs/InDels are organized into different tiers (as defined above for the HTML report).

The following variables are included in the tiered TSV file (VCF tags specified by the user are appended at the end):

Variable Description
1. GENOMIC_CHANGE Identifier for variant at the genome (VCF) level, e.g. 1:g.152382569A>G. Format: <chrom>:g.<position><ref_allele>><alt_allele>
2. VAR_ID Variant identifier
3. GENOTYPE Variant genotype (heterozygous/homozygous)
4. CPSR_CLASSIFICATION_SOURCE ClinVar or Other (i.e. not present in ClinVar)
5. GENOME_VERSION Assembly version, e.g. GRCh37
6. VCF_SAMPLE_ID Sample identifier
7. VARIANT_CLASS Variant type, e.g. SNV/insertion/deletion
8. CODING_STATUS coding/noncoding (wrt. protein alteration and canonical splice site disruption)
9. SYMBOL Gene symbol
10. GENE_NAME Gene description
11. CCDS CCDS identifier
12. ENTREZ_ID Entrez gene identifier
13. UNIPROT_ID UniProt protein identifier
14. ENSEMBL_GENE_ID Ensembl gene identifier
15. ENSEMBL_TRANSCRIPT_ID Ensembl transcript identifier
16. REFSEQ_MRNA RefSeq mRNA identifier
17. ONCOGENE Gene is predicted as an oncogene according to Network of Cancer Genes (NCG) and CancerMine
18. TUMOR_SUPPRESSOR Gene is predicted as a tumor suppressor gene according to Network of Cancer Genes (NCG) and CancerMine
19. CONSEQUENCE Variant consequence
20. VEP_ALL_CSQ All VEP transcript block consequences
21. PROTEIN_CHANGE Protein change - one letter abbreviation (HGVSp)
22. PROTEIN_DOMAIN Protein domain (Pfam)
23. DBSNP dbSNP identifier (rsid)
24. HGVSp The HGVS protein sequence name
25. HGVSc The HGVS coding sequence name
26. LAST_EXON Last exon in gene
27. EXON_POSITION Relative position of exon variant to nearest intron/exon junction (NearestExonJB plugin)
28. INTRON_POSITION Relative position of intron variant to nearest intron/exon junction (NearestExonJB plugin)
29. CDS_CHANGE Coding, transcript-specific sequence annotation
30. MUTATION_HOTSPOT Cancer mutation hotspot (cancerhotspots.org)
31. RMSK_HIT RepeatMasker hit
32. PROTEIN_FEATURE Protein feature (active sites etc.) from UniProt KnowledgeBase
33. EFFECT_PREDICTIONS Functional effect predictions from multiple algorithms (dbNSFP)
34. LOSS_OF_FUNCTION Loss-of-function variant, as predicted from VEP’s LofTee plugin
35. CANCER_PHENOTYPE For variants with a ClinVar classification, indication of cancer-associated disease/phenotype (1) or not (0)
36. CLINVAR_CLASSIFICATION Clinical significance of ClinVar variant (CPSR category)
37. CLINVAR_MSID Measure set identifier of ClinVar variant
38. CLINVAR_VARIANT_ORIGIN Variant origin (somatic/germline) of ClinVar variant
39. CLINVAR_CONFLICTED Indicator of conflicting interpretations
40. CLINVAR_PHENOTYPE Associated phenotype(s) for ClinVar variant
41. CLINVAR_REVIEW_STATUS_STARS Rating of the ClinVar variant (0-4 stars) with respect to level of review
42. DBMTS Variant with potential effect on microRNA target sites (dbMTS). Format: <ensembl_transcript_id>|<microrna_identifier>|<target_prediction_algorithms>|<gain_loss_consensus>. Target prediction algorithms indicate support by different algorithms (separated by ‘&’): TS = TargetScan, M = miRanda, R = RNAhybrid. Gain_loss_consensus indicates whether the variant was predicted to disrupt a binding site (L = Loss) or create a new target site (G = Gain) by the different algorithms
43. miRNA_TARGET_HIT Predicted microRNA target site loss or gain
44. miRNA_TARGET_HIT_PREDICTION Links to miRBase for the hits given in the DBMTS column
45. TF_BINDING_SITE_VARIANT Indicates whether a variant overlaps a critical/non-critical position of a transcription factor binding site (TFBS), as provided by VEP’s --regulatory option (‘Overlap: non-critical motif position’ or ‘Overlap: critical motif position’)
46. TF_BINDING_SITE_VARIANT_INFO Comma-separated list of transcription factor binding sites affected by variant. Format per factor: <TRANSCRIPTION_FACTOR>|<MOTIF_NAME>|<MOTIF_POS>|<MOTIF_SCORE_CHANGE>|<HIGH_INF_POS>. HIGH_INF_POS indicates whether the variant overlapped a critical motif position (Y), or non-critical motif position (N)
47. GERP_SCORE Genomic conservation score (GERP)
48. N_INSILICO_CALLED Number of algorithms with effect prediction (damaging/tolerated) from dbNSFP
49. N_INSILICO_DAMAGING Number of algorithms with damaging prediction from dbNSFP
50. N_INSILICO_TOLERATED Number of algorithms with tolerated prediction from dbNSFP
51. N_INSILICO_SPLICING_NEUTRAL Number of algorithms with splicing neutral prediction from dbscSNV
52. N_INSILICO_SPLICING_AFFECTED Number of algorithms with splicing affected prediction from dbscSNV
53. GLOBAL_AF_GNOMAD Global MAF in gnomAD
54. <CUSTOM_POPULATION_GNOMAD> Population-specific MAF in gnomAD control (non-cancer, population configured by user)
55. ACMG_BA1_AD Very high MAF (> 0.5% in gnomAD non-cancer pop subset) - min AN = 12,000 - Dominant mechanism of disease
56. ACMG_BS1_1_AD High MAF (> 0.1% in gnomAD non-cancer pop subset) - min AN = 12,000 - Dominant mechanism of disease
57. ACMG_BS1_2_AD Somewhat high MAF (> 0.005% in gnomAD non-cancer pop subset) - Dominant mechanism of disease
58. ACMG_BA1_AR Very high MAF (> 1% in gnomAD non-cancer pop subset) - min AN = 12,000 - Recessive mechanism of disease
59. ACMG_BS1_1_AR High MAF (> 0.3% in gnomAD non-cancer pop subset) - min AN = 12,000 - Recessive mechanism of disease
60. ACMG_BS1_2_AR Somewhat high MAF (> 0.005% in gnomAD non-cancer pop subset) - Recessive mechanism of disease
61. ACMG_PM2_1 Allele count within pathogenic range (MAF <= 0.005% in the population-specific non-cancer gnomAD subset)
62. ACMG_PM2_2 Alternate allele absent in the population-specific non-cancer gnomAD subset
63. ACMG_PVS1_1 Null variant (frameshift/nonsense) - predicted as LoF by LOFTEE - within pathogenic range - LoF established for gene
64. ACMG_PVS1_2 Null variant (frameshift/nonsense) - not predicted as LoF by LOFTEE - within pathogenic range - LoF established for gene
65. ACMG_PVS1_3 Null variant (frameshift/nonsense) - predicted as LoF by LOFTEE - within pathogenic range - LoF not established for gene
66. ACMG_PVS1_4 Null variant (frameshift/nonsense) - not predicted as LoF by LOFTEE - within pathogenic range - LoF not established for gene
67. ACMG_PVS1_5 Start (initiator methionine) lost - within pathogenic range - LoF established for gene
68. ACMG_PVS1_6 Start (initiator methionine) lost - within pathogenic range - LoF not established for gene
69. ACMG_PVS1_7 Donor/acceptor variant - predicted as LoF by LOFTEE - within pathogenic range - not last intron - LoF established for gene
70. ACMG_PVS1_8 Donor/acceptor variant - last intron - within pathogenic range - LoF established for gene
71. ACMG_PVS1_9 Donor/acceptor variant - not last intron - within pathogenic range - LoF not established for gene
72. ACMG_PVS1_10 Donor variant located at the +3, +4 or +5 position of the intron - within the pathogenic range (i.e. <9 alleles in ExAC)
73. ACMG_PS1 Same amino acid change as a previously established pathogenic variant (ClinVar) regardless of nucleotide change
74. ACMG_PP2 Missense variant in a gene that has a relatively low rate of benign missense variation (<20%) and where missense variants are a common mechanism of disease (>50% P/LP (ClinVar))
75. ACMG_PM1 Missense variant in a somatic mutation hotspot as determined by cancerhotspots.org
76. ACMG_PM4 Protein length changes due to inframe indels or nonstop variant in non-repetitive regions of genes that harbor variants with a dominant mode of inheritance.
77. ACMG_PPC1 Protein length changes due to inframe indels or nonstop variant in non-repetitive regions of genes that harbor variants with a recessive mode of inheritance.
78. ACMG_PM5 Novel missense change at an amino acid residue where a different missense change determined to be pathogenic has been seen before (ClinVar)
79. ACMG_PP3 Multiple lines (>=5) of computational evidence support a deleterious effect on the gene or gene product (conservation, evolutionary, splicing impact) with maximum two contradictory predictions - from dbNSFP
80. ACMG_BP4 Multiple lines (>=5) of computational evidence support a benign effect on the gene or gene product (conservation, evolutionary, splicing impact) with maximum two contradictory predictions - from dbNSFP
81. ACMG_BMC1 Peptide change is at the same location of a known benign change (ClinVar)
82. ACMG_BSC1 Peptide change is reported as benign (ClinVar)
83. ACMG_BP1 Missense variant in a gene for which primarily truncating variants are known to cause disease (ClinVar)
84. ACMG_BP3 Variants in promoter or untranslated regions
85. ACMG_BP7 Silent/intronic variant outside of the splice site consensus
86. FINAL_CLASSIFICATION Final variant classification based on the combination of CLINVAR_CLASSIFICATION (for ClinVar-classified variants) and CPSR_CLASSIFICATION (for novel variants)
87. CPSR_CLASSIFICATION CPSR tier level (P/LP/VUS/LB/B)
88. CPSR_PATHOGENICITY_SCORE Aggregated CPSR pathogenicity score
89. CPSR_CLASSIFICATION_CODE Combination of CPSR classification codes assigned to the variant (ACMG)
90. CPSR_CLASSIFICATION_DOC Verbal description of CPSR classification codes assigned to the variant (ACMG)
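The frequency-based criteria ACMG_BA1_*, ACMG_BS1_* and ACMG_PM2_* above are threshold rules on the population-specific non-cancer gnomAD frequency. They use different cut-offs for dominant (AD) and recessive (AR) genes and, for the stricter codes, a minimum AN of 12,000. The following Python sketch encodes the thresholds exactly as listed, approximating MAF by the alternate allele frequency AC/AN. It reports every code whose stated condition holds; how CPSR resolves overlapping codes (for example BA1 and BS1 together) is not described here. All names are hypothetical.

```python
from typing import Literal

MIN_AN = 12_000

# (code, AF cut-off as a fraction, requires minimum AN) per mode of inheritance,
# transcribed from the variable list above (0.5% = 0.005, 0.005% = 0.00005)
THRESHOLDS = {
    "AD": [("ACMG_BA1_AD", 0.005, True), ("ACMG_BS1_1_AD", 0.001, True),
           ("ACMG_BS1_2_AD", 0.00005, False)],
    "AR": [("ACMG_BA1_AR", 0.01, True), ("ACMG_BS1_1_AR", 0.003, True),
           ("ACMG_BS1_2_AR", 0.00005, False)],
}


def frequency_codes(ac: int, an: int, moi: Literal["AD", "AR"]) -> list[str]:
    """Return every frequency-based ACMG code whose listed condition is met."""
    if an <= 0:
        return []  # no allele number, no frequency-based evidence
    af = ac / an
    codes = [code for code, cutoff, needs_an in THRESHOLDS[moi]
             if af > cutoff and (not needs_an or an >= MIN_AN)]
    if af <= 0.00005:
        codes.append("ACMG_PM2_1")  # within pathogenic range
    if ac == 0:
        codes.append("ACMG_PM2_2")  # alternate allele absent
    return codes
```

For instance, an allele seen 60 times in 10,000 alleles exceeds the AD BA1 cut-off, but because AN is below 12,000 only ACMG_BS1_2_AD is reported.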

NOTE: The user can append data from other INFO tags in the input VCF to the TSV file (i.e. using the --preserved_info_tags option)
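Because the TSV is plain tab-separated text with a header row, it can be filtered without CPSR-specific tooling. A minimal Python sketch using the standard csv module follows. The function name is hypothetical. The classification strings to keep (for example in FINAL_CLASSIFICATION) are not listed in this section, so check their exact spelling against an actual output file.

```python
import csv
from typing import Iterable


def filter_tsv(path: str, keep: Iterable[str],
               column: str = "FINAL_CLASSIFICATION") -> list[dict[str, str]]:
    """Return the rows of a CPSR tiers TSV whose `column` value is in `keep`."""
    wanted = set(keep)
    with open(path, newline="", encoding="utf-8") as handle:
        # QUOTE_NONE: treat quote characters inside fields as literal text
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        if reader.fieldnames is None or column not in reader.fieldnames:
            raise KeyError(f"column {column!r} not found in {path}")
        return [row for row in reader if row[column] in wanted]
```

Columns appended through --preserved_info_tags show up as extra keys in each row dictionary, so the same function can filter on them by passing a different `column`.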