Output

Interactive HTML report

An interactive and structured Quarto-generated HTML report lists variants in known cancer predisposition genes. It is provided with the following naming convention:

  • <sample_id>.cpsr.<genome_assembly>.html
    • The sample_id is provided as input by the user, and reflects a unique identifier of the sample to be analyzed.
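The same `<sample_id>.cpsr.<genome_assembly>` prefix is shared by the other output files described on this page (VCF, XLSX, TSV). As a minimal sketch, the expected file names for a run could be derived as follows; the function name `cpsr_output_files` is hypothetical and not part of CPSR itself:

```python
def cpsr_output_files(sample_id, genome_assembly):
    """Return the output file names documented on this page for one sample.

    Hypothetical helper: it only applies the documented naming conventions
    and does not check whether the files exist.
    """
    prefix = f"{sample_id}.cpsr.{genome_assembly}"
    return {
        "html": f"{prefix}.html",
        "vcf": f"{prefix}.vcf.gz",
        "vcf_index": f"{prefix}.vcf.gz.tbi",
        "xlsx": f"{prefix}.xlsx",
        "tsv": f"{prefix}.classification.tsv.gz",
    }


# Illustrative sample identifier and assembly
files = cpsr_output_files("sampleA", "grch38")
print(files["html"])
```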

The report is structured in multiple sections, described briefly below:

  1. Settings
    • Lists key configurations provided by the user, including the list of genes that constitute the virtual gene panel in the report
  2. Summary of findings
    • Summarizes the main findings in the sample through value boxes
  3. Variant classification
    • For all coding variants in the selected cancer predisposition geneset, interactive variant tables are shown for each level of clinical significance (ClinVar and non-ClinVar (Other) variants combined):
      • Pathogenic
      • Likely Pathogenic
      • Variants of Uncertain Significance (VUS)
      • Likely Benign
      • Benign
  4. Genomic biomarkers
    • Clinical evidence items from CIViC that match variants in the query set are reported in four distinct tabs (Predictive / Prognostic / Diagnostic / Predisposing)
  5. Secondary findings
  6. GWAS hits
    • Status of relatively common, low-risk variants found in genome-wide association studies of cancer phenotypes (NHGRI-EBI Catalog)
  7. Documentation
    • Introduction
      • Short overview of the CPSR variant report - aims and contents
    • Annotation resources
      • Information on annotation sources utilized by CPSR, including versions and licensing requirements
    • Variant classification
      • Overview of how CPSR performs variant classification of variants not recorded in ClinVar, listing ACMG criteria and associated scores
  8. References
    • Supporting scientific literature (knowledge resources, guideline references, etc.)



Variant call format - VCF

A VCF file containing annotated, germline calls (single nucleotide variants and insertions/deletions) is generated with the following naming convention:

  • <sample_id>.cpsr.<genome_assembly>.vcf.gz (.tbi)
    • The sample_id is provided as input by the user, and reflects a unique identifier of the sample to be analyzed. Following common standards, the annotated VCF file is compressed with bgzip and indexed with tabix. Below follows a description of all annotations/tags present in the VCF INFO column after processing with the CPSR annotation pipeline:


VEP consequence annotations
Tag Description
CSQ Complete consequence annotations from VEP. Format (separated by a |): Allele, Consequence, IMPACT, SYMBOL, Gene, Feature_type, Feature, BIOTYPE, EXON, INTRON, HGVSc, HGVSp, cDNA_position, CDS_position, Protein_position, Amino_acids, Codons, Existing_variation, ALLELE_NUM, DISTANCE, STRAND, FLAGS, PICK, VARIANT_CLASS, SYMBOL_SOURCE, HGNC_ID, CANONICAL, MANE_SELECT, MANE_PLUS_CLINICAL, TSL, APPRIS, CCDS, ENSP, SWISSPROT, TREMBL, UNIPARC, RefSeq, DOMAINS, HGVS_OFFSET, gnomADe_AF, gnomADe_AFR_AF, gnomADe_AMR_AF, gnomADe_ASJ_AF, gnomADe_EAS_AF, gnomADe_FIN_AF, gnomADe_NFE_AF, gnomADe_OTH_AF, gnomADe_SAS_AF, CLIN_SIG, SOMATIC, PHENO, CHECK_REF, MOTIF_NAME, MOTIF_POS, HIGH_INF_POS, MOTIF_SCORE_CHANGE, TRANSCRIPTION_FACTORS, NearestExonJB
Consequence Consequence type of the variant (picked by VEP’s --flag_pick_allele option)
Gene Ensembl stable ID of affected gene (picked by VEP’s --flag_pick_allele option)
Feature_type Type of feature. Currently one of Transcript, RegulatoryFeature, MotifFeature (picked by VEP’s --flag_pick_allele option)
Feature Ensembl stable ID of feature (picked by VEP’s --flag_pick_allele option)
cDNA_position Relative position of base pair in cDNA sequence (picked by VEP’s --flag_pick_allele option)
CDS_position Relative position of base pair in coding sequence (picked by VEP’s --flag_pick_allele option)
CDS_RELATIVE_POSITION Ratio of variant coding position to length of coding sequence
CDS_CHANGE Coding, transcript-specific sequence annotation (picked by VEP’s --flag_pick_allele option)
ALTERATION HGVSp/HGVSc identifier
AMINO_ACID_START Protein position indicating absolute start of amino acid altered (fetched from Protein_position)
AMINO_ACID_END Protein position indicating absolute end of amino acid altered (fetched from Protein_position)
Protein_position Relative position of amino acid in protein (picked by VEP’s --flag_pick_allele option)
Amino_acids Only given if the variant affects the protein-coding sequence (picked by VEP’s --flag_pick_allele option)
Codons The alternative codons with the variant base in upper case (picked by VEP’s --flag_pick_allele option)
IMPACT Impact modifier for the consequence type (picked by VEP’s --flag_pick_allele option)
VARIANT_CLASS Sequence Ontology variant class (picked by VEP’s --flag_pick_allele option)
SYMBOL Gene symbol (picked by VEP’s --flag_pick_allele option)
SYMBOL_SOURCE The source of the gene symbol (picked by VEP’s --flag_pick_allele option)
STRAND The DNA strand (1 or -1) on which the transcript/feature lies (picked by VEP’s --flag_pick_allele option)
ENSP The Ensembl protein identifier of the affected transcript (picked by VEP’s --flag_pick_allele option)
FLAGS Transcript quality flags: cds_start_NF: CDS 5’ incomplete, cds_end_NF: CDS 3’ incomplete (picked by VEP’s --flag_pick_allele option)
SWISSPROT Best match UniProtKB/Swiss-Prot accession of protein product (picked by VEP’s --flag_pick_allele option)
TREMBL Best match UniProtKB/TrEMBL accession of protein product (picked by VEP’s --flag_pick_allele option)
UNIPARC Best match UniParc accession of protein product (picked by VEP’s --flag_pick_allele option)
HGVSc The HGVS coding sequence name (picked by VEP’s --flag_pick_allele option)
HGVSc_RefSeq The HGVS coding sequence name using RefSeq transcript identifiers (MANE Select) (picked by VEP’s --flag_pick_allele option)
HGVSp The HGVS protein sequence name (picked by VEP’s --flag_pick_allele option)
HGVSp_short The HGVS protein sequence name, short version (picked by VEP’s --flag_pick_allele option)
HGVS_OFFSET Indicates by how many bases the HGVS notations for this variant have been shifted (picked by VEP’s --flag_pick_allele option)
NearestExonJB VEP plugin that finds the nearest exon junction for a coding sequence variant. Format: Ensembl exon identifier + distance to exon boundary + boundary type (start/end) + exon length
MOTIF_NAME The source and identifier of a transcription factor binding profile aligned at this position (picked by VEP’s --flag_pick_allele option)
MOTIF_POS The relative position of the variation in the aligned TFBP (picked by VEP’s --flag_pick_allele option)
HIGH_INF_POS A flag indicating if the variant falls in a high information position of a transcription factor binding profile (TFBP) (picked by VEP’s --flag_pick_allele option)
MOTIF_SCORE_CHANGE The difference in motif score of the reference and variant sequences for the TFBP (picked by VEP’s --flag_pick_allele option)
CELL_TYPE List of cell types and classifications for regulatory feature (picked by VEP’s --flag_pick_allele option)
CANONICAL A flag indicating if the transcript is denoted as the canonical transcript for this gene (picked by VEP’s --flag_pick_allele option)
CCDS The CCDS identifier for this transcript, where applicable (picked by VEP’s --flag_pick_allele option)
INTRON The intron number (out of total number) (picked by VEP’s --flag_pick_allele option)
INTRON_POSITION Relative position of intron variant to nearest exon/intron junction (NearestExonJB VEP plugin)
EXON_POSITION Relative position of exon variant to nearest intron/exon junction (NearestExonJB VEP plugin)
EXON The exon number (out of total number) (picked by VEP’s --flag_pick_allele option)
EXON_AFFECTED The exon affected by the variant (picked by VEP’s --flag_pick_allele option)
LAST_EXON Logical indicator for last exon of transcript (picked by VEP’s --flag_pick_allele option)
LAST_INTRON Logical indicator for last intron of transcript (picked by VEP’s --flag_pick_allele option)
DISTANCE Shortest distance from variant to transcript (picked by VEP’s --flag_pick_allele option)
BIOTYPE Biotype of transcript or regulatory feature (picked by VEP’s --flag_pick_allele option)
TSL Transcript support level (picked by VEP’s --flag_pick_allele option)
PUBMED PubMed ID(s) of publications that cite existing variant - VEP
PHENO Indicates if existing variant is associated with a phenotype, disease or trait - VEP
GENE_PHENO Indicates if overlapped gene is associated with a phenotype, disease or trait - VEP
ALLELE_NUM Allele number from input; 0 is reference, 1 is first alternate etc - VEP
REFSEQ_MATCH The RefSeq transcript match status; contains a number of flags indicating whether this RefSeq transcript matches the underlying reference sequence and/or an Ensembl transcript (picked by VEP’s --flag_pick_allele option)
PICK Indicates if this block of consequence data was picked by VEP’s --flag_pick_allele option
VEP_ALL_CSQ All VEP transcript block consequences (Consequence:SYMBOL:Feature_type:Feature:BIOTYPE) - VEP
EXONIC_STATUS Indicates if variant consequence type is ‘exonic’ or ‘nonexonic’. We define ‘exonic’ as any variants with the following consequences: stop_gained / stop_lost, start_lost, frameshift_variant, missense_variant, splice_donor_variant, splice_acceptor_variant, inframe_insertion / inframe_deletion, synonymous_variant, protein_altering
CODING_STATUS Indicates if primary variant consequence type is ‘coding’ or ‘noncoding’. ‘coding’ variants are here defined as those with an ‘exonic’ status, with the exception of synonymous variants
NULL_VARIANT Primary variant consequence type is frameshift or stop_gained/stop_lost
LOSS_OF_FUNCTION Loss-of-function variant
LOF_FILTER Loss-of-function filter
SPLICE_DONOR_RELEVANT Logical indicating if variant is located at a particular location near the splice donor site (+3A/G, +4A or +5G)
BIOMARKER_MATCH Variant matches with germline biomarker evidence in CIViC/CGI. Format: <db_source>|<db_variant_id>|<db_evidence_id>:<tumor_site>:<clinical_significance>:<evidence_level>:<evidence_type>:<germline_somatic>|<matching_type>. Multiple evidence items are separated by ‘&’. Example: civic|174|EID445:Colon/Rectum:Sensitivity/Response:D:Predictive:Germline&EID446:Colon/Rectum:Sensitivity/Response:D:Predictive:Germline|by_gene_mut. Matching type can be any of by_genomic_coord, by_hgvsp_principal, by_hgvsc_principal, by_hgvsp_nonprincipal, by_hgvsc_nonprincipal, by_codon_principal, by_exon_mut_principal, by_gene_mut_lof, by_gene_mut
REGULATORY_ANNOTATION Comma-separated list of all variant annotations of Feature_type, RegulatoryFeature, and MotifFeature. Format (separated by a |): <Consequence>, <Feature_type>, <Feature>, <BIOTYPE>, <MOTIF_NAME>, <MOTIF_POS>, <HIGH_INF_POS>, <MOTIF_SCORE_CHANGE>, <TRANSCRIPTION_FACTORS>
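The CSQ tag packs one pipe-separated record per transcript block, with multiple blocks separated by commas (standard VEP behaviour). The following is a minimal sketch of how such a value could be unpacked and the picked block located. The function names, the shortened field list, and the toy CSQ value are illustrative assumptions; in practice the full field order should be taken from the CSQ description in the VCF header.

```python
def parse_csq(csq_value, field_names):
    """Split a CSQ value into one dict per transcript block.

    field_names must list the CSQ fields in the order declared in the
    VCF header (supplied by the caller in this sketch).
    """
    blocks = []
    for block in csq_value.split(","):
        values = block.split("|")
        blocks.append(dict(zip(field_names, values)))
    return blocks


def picked_block(blocks):
    """Return the first block flagged with PICK == "1", or None."""
    for block in blocks:
        if block.get("PICK") == "1":
            return block
    return None


# Shortened, illustrative field list and value (not a full CPSR CSQ record)
FIELDS = ["Allele", "Consequence", "IMPACT", "SYMBOL", "PICK"]
value = "A|missense_variant|MODERATE|BRCA2|1,A|intron_variant|MODIFIER|BRCA2|"

blocks = parse_csq(value, FIELDS)
print(picked_block(blocks)["Consequence"])
```

Most of the individual tags in the table above (SYMBOL, IMPACT, HGVSc, and so on) are already extracted from the picked block by CPSR, so parsing CSQ directly is mainly useful when non-picked transcript blocks are of interest.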


Gene information
Tag Description
ENTREZGENE Entrez gene identifier
APPRIS Principal isoform flags according to the APPRIS principal isoform database
MANE_SELECT Indicating if the transcript is the MANE Select for the gene (picked by VEP’s --flag_pick_allele_gene option)
MANE_PLUS_CLINICAL Indicating if the transcript is MANE Plus Clinical, as required for clinical variant reporting (picked by VEP’s --flag_pick_allele_gene option)
UNIPROT_ID UniProt identifier
UNIPROT_ACC UniProt accession(s)
ENSEMBL_GENE_ID Ensembl gene identifier for VEP’s picked transcript (ENSGXXXXXXX)
ENSEMBL_TRANSCRIPT_ID Ensembl transcript identifier for VEP’s picked transcript (ENSTXXXXXX)
ENSEMBL_PROTEIN_ID Ensembl corresponding protein identifier for VEP’s picked transcript
REFSEQ_TRANSCRIPT_ID Corresponding RefSeq transcript(s) identifier for VEP’s picked transcript (NM_XXXXX)
REFSEQ_PROTEIN_ID RefSeq protein/peptide identifier for VEP’s picked transcript (NP_XXXXXX)
MANE_SELECT2 MANE Select transcript identifier: one high-quality representative transcript per protein-coding gene that is well-supported by experimental data and represents the biology of the gene - provided through BioMart
MANE_PLUS_CLINICAL2 transcripts chosen to supplement MANE Select when needed for clinical variant reporting - provided through BioMart
GENCODE_TAG tag for GENCODE transcript (basic etc)
GENCODE_TRANSCRIPT_TYPE type of transcript (protein-coding etc.)
TSG Indicates whether gene is predicted as a tumor suppressor gene, from Network of Cancer Genes (NCG) & the CancerMine text-mining resource
TSG_SUPPORT Underlying evidence for gene being a tumor suppressor. Format: CGC_TIER<1/2>&NCG&CancerMine:num_citations
ONCOGENE Indicates whether gene is predicted as an oncogene, from Network of Cancer Genes (NCG) & the CancerMine text-mining resource
ONCOGENE_SUPPORT Underlying evidence for gene being an oncogene. Format: CGC_TIER<1/2>&NCG&CancerMine:num_citations
CPG_SOURCE Cancer predisposition gene source (panel 0: TCGA, CGC, PANEL_APP, OTHER)
CGC_GERMLINE Member of Cancer Gene Census - germline set
CGC_SOMATIC Member of Cancer Gene Census - somatic set
CGC_TIER Cancer Gene Census tier (1/2)
NCG_DRIVER Cancer driver gene prediction by Network of Cancer Genes (NCG)
INTOGEN_DRIVER Indicates whether gene is predicted as cancer driver from IntOGen’s cancer driver prediction algorithm
PROB_EXAC_LOF_INTOLERANT dbNSFP_gene: the probability of being loss-of-function intolerant (intolerant of both heterozygous and homozygous lof variants) based on ExAC r0.3 data
PROB_EXAC_LOF_INTOLERANT_HOM dbNSFP_gene: the probability of being intolerant of homozygous, but not heterozygous lof variants based on ExAC r0.3 data
PROB_EXAC_LOF_TOLERANT_NULL dbNSFP_gene: the probability of being tolerant of both heterozygous and homozygous lof variants based on ExAC r0.3 data
PROB_EXAC_NONTCGA_LOF_INTOLERANT dbNSFP_gene: the probability of being loss-of-function intolerant (intolerant of both heterozygous and homozygous lof variants) based on ExAC r0.3 nonTCGA subset
PROB_EXAC_NONTCGA_LOF_INTOLERANT_HOM dbNSFP_gene: the probability of being intolerant of homozygous, but not heterozygous lof variants based on ExAC r0.3 nonTCGA subset
PROB_EXAC_NONTCGA_LOF_TOLERANT_NULL dbNSFP_gene: the probability of being tolerant of both heterozygous and homozygous lof variants based on ExAC r0.3 nonTCGA subset
PROB_GNOMAD_LOF_INTOLERANT dbNSFP_gene: the probability of being loss-of-function intolerant (intolerant of both heterozygous and homozygous lof variants) based on gnomAD 2.1 data
PROB_GNOMAD_LOF_INTOLERANT_HOM dbNSFP_gene: the probability of being intolerant of homozygous, but not heterozygous lof variants based on gnomAD 2.1 data
PROB_GNOMAD_LOF_TOLERANT_NULL dbNSFP_gene: the probability of being tolerant of both heterozygous and homozygous lof variants based on gnomAD 2.1 data
PROB_HAPLOINSUFFICIENCY dbNSFP_gene: Estimated probability of haploinsufficiency of the gene (from http://dx.doi.org/10.1371/journal.pgen.1001154)
ESSENTIAL_GENE_CRISPR dbNSFP_gene: Essential (E) or Non-essential phenotype-changing (N) based on large scale CRISPR experiments (from http://dx.doi.org/10.1126/science.aac7041)
ESSENTIAL_GENE_CRISPR2 dbNSFP_gene: Essential (E), context-Specific essential (S), or Non-essential phenotype-changing (N) based on large scale CRISPR experiments (from http://dx.doi.org/10.1016/j.cell.2015.11.015)


Variant effect and protein-coding information
Tag Description
MUTATION_HOTSPOT mutation hotspot codon in cancerhotspots.org. Format: GeneSymbol|Entrez_ID|CodonRefAA|Alt_AA|Q-value
MUTATION_HOTSPOT_MATCH Type of hotspot match (by_hgvsp_principal, by_hgvsc_principal, by_hgvsp_nonprincipal, by_hgvsc_nonprincipal, by_codon_principal, by_codon_nonprincipal)
MUTATION_HOTSPOT_CANCERTYPE hotspot-associated cancer types (from cancerhotspots.org)
PFAM_DOMAIN Pfam domain identifier (from VEP)
EFFECT_PREDICTIONS All predictions of effect of variant on protein function and pre-mRNA splicing from database of non-synonymous functional predictions - dbNSFP v4.2. Predicted effects are provided by different sources/algorithms (separated by &), T = Tolerated, N = Neutral, D = Damaging: 1. SIFT, 2. MutationTaster (data release Nov 2015), 3. MutationAssessor (release 3), 4. FATHMM (v2.3), 5. PROVEAN (v1.1 Jan 2015), 6. FATHMM_MKL, 7. PRIMATEAI, 8. DEOGEN2, 9. DBNSFP_CONSENSUS_RNN (ensemble/consensus prediction, based on deep learning), 10. SPLICE_SITE_EFFECT_ADA (ensemble/consensus prediction of splice-altering SNVs, based on adaptive boosting), 11. SPLICE_SITE_EFFECT_RF (ensemble/consensus prediction of splice-altering SNVs, based on random forest), 12. M-CAP, 13. MutPred, 14. GERP, 15. BayesDel, 16. LIST-S2, 17. ALoFT, 18. AlphaMissense, 19. ESM1b, 20. PHACTboost, 21. MutFormer
DBNSFP_BAYESDEL_ADDAF predicted effect from BayesDel (dbNSFP)
DBNSFP_LIST_S2 predicted effect from LIST-S2 (dbNSFP)
DBNSFP_SIFT predicted effect from SIFT (dbNSFP)
DBNSFP_PROVEAN predicted effect from PROVEAN (dbNSFP)
DBNSFP_MUTATIONTASTER predicted effect from MUTATIONTASTER (dbNSFP)
DBNSFP_MUTATIONASSESSOR predicted effect from MUTATIONASSESSOR (dbNSFP)
DBNSFP_M_CAP predicted effect from M-CAP (dbNSFP)
DBNSFP_ALOFTPRED predicted effect from ALoFT (dbNSFP)
DBNSFP_MUTPRED score from MUTPRED (dbNSFP)
DBNSFP_FATHMM predicted effect from FATHMM (dbNSFP)
DBNSFP_PRIMATEAI predicted effect from PRIMATEAI (dbNSFP)
DBNSFP_DEOGEN2 predicted effect from DEOGEN2 (dbNSFP)
DBNSFP_PHACTBOOST predicted effect from PHACTboost (dbNSFP)
DBNSFP_ALPHA_MISSENSE predicted effect from AlphaMissense (dbNSFP)
DBNSFP_MUTFORMER predicted effect from MutFormer (dbNSFP)
DBNSFP_ESM1B predicted effect from ESM1b (dbNSFP)
DBNSFP_GERP evolutionary constraint measure from GERP (dbNSFP)
DBNSFP_FATHMM_MKL predicted effect from FATHMM-mkl (dbNSFP)
DBNSFP_META_RNN predicted effect from ensemble prediction (deep learning - dbNSFP)
DBNSFP_SPLICE_SITE_RF predicted effect of splice site disruption, using random forest (dbscSNV)
DBNSFP_SPLICE_SITE_ADA predicted effect of splice site disruption, using boosting (dbscSNV)


Variant allele frequencies/annotations in germline databases
Tag Description
gnomADe_AFR_AF African/African-American germline allele frequency (gnomAD release 2.1)
gnomADe_AMR_AF Latino/Admixed American germline allele frequency (gnomAD release 2.1)
gnomADe_AF Adjusted global germline allele frequency (gnomAD release 2.1)
gnomADe_SAS_AF South Asian germline allele frequency (gnomAD release 2.1)
gnomADe_EAS_AF East Asian germline allele frequency (gnomAD release 2.1)
gnomADe_FIN_AF Finnish germline allele frequency (gnomAD release 2.1)
gnomADe_NFE_AF Non-Finnish European germline allele frequency (gnomAD release 2.1)
gnomADe_OTH_AF Other germline allele frequency (gnomAD release 2.1)
gnomADe_ASJ_AF Ashkenazi Jewish allele frequency (gnomAD release 2.1)
gnomADe_non_cancer_ASJ_AF Alternate allele frequency for samples of Ashkenazi Jewish ancestry in the non-cancer subset (gnomAD 2.1.1)
gnomADe_non_cancer_EAS_AF Alternate allele frequency for samples of East Asian ancestry in the non-cancer subset (gnomAD 2.1.1)
gnomADe_non_cancer_AFR_AF Alternate allele frequency for samples of African-American/African ancestry in the non-cancer subset (gnomAD 2.1.1)
gnomADe_non_cancer_AMR_AF Alternate allele frequency for samples of Latino ancestry in the non-cancer subset (gnomAD 2.1.1)
gnomADe_non_cancer_OTH_AF Alternate allele frequency for samples of Other ancestry in the non-cancer subset (gnomAD 2.1.1)
gnomADe_non_cancer_NFE_AF Alternate allele frequency for samples of Non-Finnish European ancestry in the non-cancer subset (gnomAD 2.1.1)
gnomADe_non_cancer_FIN_AF Alternate allele frequency for samples of Finnish ancestry in the non-cancer subset (gnomAD 2.1.1)
gnomADe_non_cancer_SAS_AF Alternate allele frequency for samples of South Asian ancestry in the non-cancer subset (gnomAD 2.1.1)
gnomADe_non_cancer_AF Alternate allele frequency in the non-cancer subset (gnomAD 2.1.1)
gnomADe_non_cancer_ASJ_AC Alternate allele count for samples of Ashkenazi Jewish ancestry in the non-cancer subset (gnomAD 2.1.1)
gnomADe_non_cancer_EAS_AC Alternate allele count for samples of East Asian ancestry in the non-cancer subset (gnomAD 2.1.1)
gnomADe_non_cancer_AFR_AC Alternate allele count for samples of African-American/African ancestry in the non-cancer subset (gnomAD 2.1.1)
gnomADe_non_cancer_AMR_AC Alternate allele count for samples of Latino ancestry in the non-cancer subset (gnomAD 2.1.1)
gnomADe_non_cancer_OTH_AC Alternate allele count for samples of Other ancestry in the non-cancer subset (gnomAD 2.1.1)
gnomADe_non_cancer_NFE_AC Alternate allele count for samples of Non-Finnish European ancestry in the non-cancer subset (gnomAD 2.1.1)
gnomADe_non_cancer_FIN_AC Alternate allele count for samples of Finnish ancestry in the non-cancer subset (gnomAD 2.1.1)
gnomADe_non_cancer_SAS_AC Alternate allele count for samples of South Asian ancestry in the non-cancer subset (gnomAD 2.1.1)
gnomADe_non_cancer_AC Alternate allele count in the non-cancer subset (gnomAD 2.1.1)
gnomADe_non_cancer_ASJ_AN Total number of alleles in samples of Ashkenazi Jewish ancestry in the non-cancer subset (gnomAD 2.1.1)
gnomADe_non_cancer_EAS_AN Total number of alleles in samples of East Asian ancestry in the non-cancer subset (gnomAD 2.1.1)
gnomADe_non_cancer_AFR_AN Total number of alleles in samples of African-American/African ancestry in the non-cancer subset (gnomAD 2.1.1)
gnomADe_non_cancer_AMR_AN Total number of alleles in samples of Latino ancestry in the non-cancer subset (gnomAD 2.1.1)
gnomADe_non_cancer_OTH_AN Total number of alleles in samples of Other ancestry in the non-cancer subset (gnomAD 2.1.1)
gnomADe_non_cancer_NFE_AN Total number of alleles in samples of Non-Finnish European ancestry in the non-cancer subset (gnomAD 2.1.1)
gnomADe_non_cancer_FIN_AN Total number of alleles in samples of Finnish ancestry in the non-cancer subset (gnomAD 2.1.1)
gnomADe_non_cancer_SAS_AN Total number of alleles in samples of South Asian ancestry in the non-cancer subset (gnomAD 2.1.1)
gnomADe_non_cancer_AN Total number of alleles in the non-cancer subset (gnomAD 2.1.1)
gnomADe_non_cancer_ASJ_NHOMALT Count of homozygous individuals in samples of Ashkenazi Jewish ancestry in the non-cancer subset (gnomAD 2.1.1)
gnomADe_non_cancer_EAS_NHOMALT Count of homozygous individuals in samples of East Asian ancestry in the non-cancer subset (gnomAD 2.1.1)
gnomADe_non_cancer_AFR_NHOMALT Count of homozygous individuals in samples of African-American/African ancestry in the non-cancer subset (gnomAD 2.1.1)
gnomADe_non_cancer_AMR_NHOMALT Count of homozygous individuals in samples of Latino ancestry in the non-cancer subset (gnomAD 2.1.1)
gnomADe_non_cancer_OTH_NHOMALT Count of homozygous individuals in samples of Other ancestry in the non-cancer subset (gnomAD 2.1.1)
gnomADe_non_cancer_NFE_NHOMALT Count of homozygous individuals in samples of Non-Finnish European ancestry in the non-cancer subset (gnomAD 2.1.1)
gnomADe_non_cancer_FIN_NHOMALT Count of homozygous individuals in samples of Finnish ancestry in the non-cancer subset (gnomAD 2.1.1)
gnomADe_non_cancer_SAS_NHOMALT Count of homozygous individuals in samples of South Asian ancestry in the non-cancer subset (gnomAD 2.1.1)
gnomADe_non_cancer_NHOMALT Count of homozygous individuals in samples in the non-cancer subset (gnomAD 2.1.1)
DBSNP_RSID dbSNP reference ID, as provided by VEP
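For the non-cancer subset, the AF, AC, and AN tags are related by the usual definition of an allele frequency, AF = AC / AN. A small sketch of that arithmetic, assuming the counts have already been read from the INFO column (the function name and the example numbers are illustrative, not taken from gnomAD):

```python
def allele_frequency(ac, an):
    """Return AC / AN, or None when no alleles were genotyped (AN == 0)."""
    if an <= 0:
        return None
    return ac / an


# Illustrative counts, e.g. from gnomADe_non_cancer_NFE_AC / _NFE_AN
af = allele_frequency(3, 10000)
print(af)
```

Returning `None` rather than `0.0` for `AN == 0` keeps "variant site not covered in this population" distinguishable from "variant not observed".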


Clinical associations
Tag Description
CLINVAR_MSID ClinVar Measure Set/Variant ID
CLINVAR_ALLELE_ID ClinVar allele ID
CLINVAR_PMID Associated Pubmed IDs for variant in ClinVar - germline state-of-origin
CLINVAR_HGVSP Protein variant expression using HGVS nomenclature - ClinVar
CLINVAR_PMID_SOMATIC Associated Pubmed IDs for variant in ClinVar - somatic state-of-origin
CLINVAR_CONFLICTED ClinVar variant has conflicting interpretations
CLINVAR_CLNSIG Clinical significance for variant in ClinVar - germline state-of-origin
CLINVAR_CLASSIFICATION Clean clinical significance on a five-level scheme - ClinVar
CLINVAR_CLNSIG_SOMATIC Clinical significance for variant in ClinVar - somatic state-of-origin
CLINVAR_MEDGEN_CUI Associated MedGen concept identifiers (CUIs) - germline state-of-origin
CLINVAR_MEDGEN_CUI_SOMATIC Associated MedGen concept identifiers (CUIs) - somatic state-of-origin
CLINVAR_MOLECULAR_EFFECT Variant effect according to ClinVar annotation
CLINVAR_VARIANT_ORIGIN Origin of variant (somatic, germline, de novo etc.) for variant in ClinVar
CLINVAR_REVIEW_STATUS_STARS Rating of the ClinVar variant (0-4 stars) with respect to level of review
GWAS_HIT variant associated with cancer phenotype from genome-wide association study (NHGRI-EBI GWAS catalog)


Variant/genotype information
Tag Description
GENOTYPE Variant genotype (het/hom_ref/hom_alt)
DP_CONTROL Sequencing depth at variant site (‘DP’)



Excel workbook - XLSX

We provide an Excel workbook with four sheets that list the main findings and annotations of the predisposition analysis. The file has the following naming convention:

  • <sample_id>.cpsr.<genome_assembly>.xlsx

The Excel workbook is populated with the following sheets (provided that data is available):

  • VIRTUAL_PANEL - details on the chosen virtual gene panel
  • CLASSIFICATION - variant classifications and corresponding gene annotations
  • BIOMARKER_EVIDENCE - matches of variants with genomic biomarkers
  • SECONDARY_FINDINGS - potential secondary findings



Tab-separated values - TSV

We provide a compressed tab-separated values file with variant classifications and the most essential variant/gene annotations. The file has the following naming convention:

  • <sample_id>.cpsr.<genome_assembly>.classification.tsv.gz

The SNVs/InDels are classified according to clinical significance (pathogenicity) (as defined above for the HTML report).

The following variables are included in the tiered TSV file (VCF tags in the query VCF potentially retained by the user will be appended):

Variable Description
1. SAMPLE_ID Sample identifier
2. GENOMIC_CHANGE Identifier for variant at the genome (VCF) level, e.g. 1:g.152382569A>G. Format: <chrom>:g.<position><ref_allele>><alt_allele>
3. VAR_ID Variant identifier - chrom_pos_ref_alt
4. GENOME_VERSION Assembly version, e.g. grch37/grch38
5. GENOTYPE Variant genotype (het/hom_ref/hom_alt)
6. DP_CONTROL Sequencing depth at variant site (‘DP’)
7. CPSR_CLASSIFICATION_SOURCE ClinVar or CPSR_ACMG (the latter meaning variant not recorded in ClinVar, classified by CPSR)
8. VARIANT_CLASS Variant type, e.g. SNV/insertion/deletion
9. CODING_STATUS coding/noncoding (wrt. protein alteration and canonical splice site disruption)
10. SYMBOL Gene symbol
11. GENENAME Gene description
12. CCDS CCDS identifier
13. ENTREZGENE Entrez gene identifier
14. UNIPROT_ID UniProt protein identifier
15. ENSEMBL_GENE_ID Ensembl gene identifier
16. ENSEMBL_TRANSCRIPT_ID Ensembl transcript identifier
17. REFSEQ_TRANSCRIPT_ID RefSeq mRNA identifier
18. ONCOGENE Gene is predicted as an oncogene according to Network of Cancer Genes (NCG)/Cancer Gene Census (CGC) and CancerMine
19. TUMOR_SUPPRESSOR Gene is predicted as a tumor suppressor gene according to Network of Cancer Genes (NCG)/Cancer Gene Census (CGC) and CancerMine
20. CONSEQUENCE Variant consequence
21. ALTERATION Molecular alteration (HGVSp or HGVSc, depending on consequence)
22. PROTEIN_CHANGE Protein change - one letter abbreviation (HGVSp)
23. PFAM_DOMAIN Protein domain (Pfam identifier)
24. PFAM_DOMAIN_NAME Protein domain name (Pfam)
25. HGVSp The HGVS protein sequence name
26. HGVSc The HGVS coding sequence name
27. HGVSc_RefSeq The HGVS coding sequence name (RefSeq - MANE Select)
28. CDS_CHANGE Coding, transcript-specific sequence annotation
29. LAST_EXON Last exon in gene
30. EXON Exon of variant/total number of exons in transcript (from VEP)
31. EXON_AFFECTED Transcript exon of variant (from VEP)
32. EXON_POSITION Relative position of exon variant to nearest intron/exon junction (NearestExonJB plugin)
33. INTRON_POSITION Relative position of intron variant to nearest intron/exon junction (NearestExonJB plugin)
34. VEP_ALL_CSQ All VEP transcript block consequences
35. CANCER_PHENOTYPE For variants with a ClinVar classification, indication of cancer-associated disease/phenotype (1) or not (0)
36. MUTATION_HOTSPOT Cancer mutation hotspot (cancerhotspots.org)
37. RMSK_HIT RepeatMasker hit
38. EFFECT_PREDICTIONS Functional effect predictions from multiple algorithms (dbNSFP)
39. LOSS_OF_FUNCTION Loss-of-function variant
40. LOF_FILTER Loss-of-function filter
41. NULL_VARIANT Frameshift or stop-gain variant
42. DBMTS variant with potential effect on microRNA target sites (dbMTS). Format: <ensembl_transcript_id>|<microrna_identifier>|<target_prediction_algorithms>|<gain_loss_consensus>. Target prediction algorithms indicate support by different algorithms (separated by ‘&’), TS = TargetScan, M = miRanda, R = RNAhybrid. Gain_loss_consensus indicate whether the variant was predicted to disrupt a binding site (L = Loss), or create a new target site (G = gain) by the different algorithms
43. REGULATORY_ANNOTATION Overlap of variant with regulatory elements (VEP)
44. TF_BINDING_SITE_VARIANT Indicates whether a variant overlaps a critical/non-critical position of a transcription factor binding site (TFBS) - as provided by VEP’s --regulatory option (‘Overlap: non-critical motif position’ or ‘Overlap: critical motif position’)
45. TF_BINDING_SITE_VARIANT_INFO Comma-separated list of transcription factor binding sites affected by variant. Format per factor: <TRANSCRIPTION_FACTOR>|<MOTIF_NAME>|<MOTIF_POS>|<MOTIF_SCORE_CHANGE>|<HIGH_INF_POS>. HIGH_INF_POS indicates whether the variant overlapped a critical motif position (Y), or non-critical motif position (N)
46. GERP_SCORE Genomic conservation score (GERP)
47. DBSNP_RSID dbSNP identifier (rsid)
48. CLINVAR_CLASSIFICATION clinical significance of ClinVar-recorded variant
49. CLINVAR_MSID measureset identifier of ClinVar variant
50. CLINVAR_VARIANT_ORIGIN variant origin (somatic/germline) of ClinVar variant
51. CLINVAR_CONFLICTED indicator of conflicting interpretations
52. CLINVAR_PHENOTYPE associated phenotype(s) for ClinVar variant
53. CLINVAR_REVIEW_STATUS_STARS Review confidence - number of gold stars
54. N_INSILICO_CALLED Number of algorithms with effect prediction (damaging/tolerated) from dbNSFP
55. N_INSILICO_DAMAGING Number of algorithms with damaging prediction from dbNSFP
56. N_INSILICO_TOLERATED Number of algorithms with tolerated prediction from dbNSFP
57. N_INSILICO_SPLICING_NEUTRAL Number of algorithms with splicing neutral prediction from dbscSNV
58. N_INSILICO_SPLICING_AFFECTED Number of algorithms with splicing affected prediction from dbscSNV
59. gnomADe_AF Global MAF in gnomAD (exome samples)
60. FINAL_CLASSIFICATION Final variant classification, using either CLINVAR_CLASSIFICATION if variant is ClinVar-classified, or CPSR_CLASSIFICATION for novel variants
61. CPSR_CLASSIFICATION Variant clinical significance by CPSR’s classification algorithm (P/LP/VUS/LB/B)
62. CPSR_PATHOGENICITY_SCORE Aggregated pathogenicity score by CPSR’s algorithm
63. CPSR_CLASSIFICATION_CODE Combination of CPSR classification codes assigned to the variant (ACMG)
64. CPSR_CLASSIFICATION_DOC Descriptions of CPSR classification codes assigned to the variant (ACMG)
65. <CUSTOM_POPULATION_GNOMAD> Population specific MAF in gnomAD control (non-cancer, population configured by user)
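The GENOMIC_CHANGE format (`<chrom>:g.<position><ref_allele>><alt_allele>`) can be split back into its components with a regular expression. The sketch below makes two assumptions that go beyond the table: alleles consist only of the characters A, C, G, T, and N, and VAR_ID joins the same four components with underscores, as its description `chrom_pos_ref_alt` suggests. The function names are hypothetical.

```python
import re

# Assumed allele alphabet: A, C, G, T, N
GENOMIC_CHANGE_RE = re.compile(
    r"^(?P<chrom>[^:]+):g\.(?P<pos>\d+)(?P<ref>[ACGTN]+)>(?P<alt>[ACGTN]+)$"
)


def parse_genomic_change(value):
    """Split a GENOMIC_CHANGE string into chrom, pos, ref, and alt."""
    match = GENOMIC_CHANGE_RE.match(value)
    if match is None:
        raise ValueError(f"Unrecognised GENOMIC_CHANGE value: {value!r}")
    return {
        "chrom": match["chrom"],
        "pos": int(match["pos"]),
        "ref": match["ref"],
        "alt": match["alt"],
    }


def to_var_id(value):
    """Build an underscore-joined identifier (assumed VAR_ID layout)."""
    parts = parse_genomic_change(value)
    return f"{parts['chrom']}_{parts['pos']}_{parts['ref']}_{parts['alt']}"


# Example value taken from the GENOMIC_CHANGE description above
print(parse_genomic_change("1:g.152382569A>G"))
print(to_var_id("1:g.152382569A>G"))
```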

NOTE: The user can append data from other INFO tags in the input VCF to the TSV file (i.e. using the --retained_info_tags option)
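Because the TSV file is gzip-compressed and tab-separated, it can be read with the Python standard library alone. The following is a rough sketch, assuming that the first line of the file is the column header and that the caller knows which FINAL_CLASSIFICATION labels to look for; the exact label strings are not specified here, so the ones in the usage example are placeholders.

```python
import csv
import gzip


def read_classifications(path):
    """Read a gzip-compressed TSV file into a list of row dicts.

    Assumes the first line holds the column names.
    """
    with gzip.open(path, "rt", newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def select_by_class(rows, wanted):
    """Keep rows whose FINAL_CLASSIFICATION is in the wanted set."""
    return [row for row in rows if row.get("FINAL_CLASSIFICATION") in wanted]
```

A possible call, with a placeholder file name and placeholder labels:

```python
rows = read_classifications("sampleA.cpsr.grch38.classification.tsv.gz")
for row in select_by_class(rows, {"Pathogenic", "Likely_Pathogenic"}):
    print(row["SYMBOL"], row["GENOMIC_CHANGE"], row["FINAL_CLASSIFICATION"])
```

Any columns added through retained INFO tags simply appear as extra keys in each row dict.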



Biomarker annotations

The interactive HTML report (section Genomic biomarkers) and the Excel workbook (sheet BIOMARKER_EVIDENCE) contain information on matches between potentially pathogenic/likely pathogenic sample variants and reported biomarkers, the latter referring to clinical evidence items that relate genomic aberrations to prognosis, diagnosis, or sensitivity/resistance to particular treatments. All biomarker annotations are prefixed with BM_, and the following is provided per evidence item:

Variable Description
1. BM_CANCER_TYPE Annotated cancer type for biomarker - from CIViC
2. BM_DISEASE_ONTOLOGY_ID Disease ontology id for cancer type - from CIViC
3. BM_PRIMARY_SITE Primary tumor type of cancer type - mapped with phenOncoX
4. BM_CLINICAL_SIGNIFICANCE Clinical significance of biomarker (drug sensitivity, drug resistance, poor outcome etc.) - from CIViC
5. BM_THERAPEUTIC_CONTEXT Cancer drugs associated with biomarker (for biomarkers related to drug sensitivity/resistance) - from CIViC
6. BM_CITATION Reference/source for biomarker - i.e. publication or guidelines - from CIViC
7. BM_RATING Rating of biomarker - from CIViC
8. BM_MOLECULAR_PROFILE_NAME Associated name of molecular profile - i.e. “BRCA mutation” - from CIViC
9. BM_EVIDENCE_TYPE Biomarker type - Predictive, Diagnostic, Prognostic, Predisposing - from CIViC
10. BM_EVIDENCE_LEVEL Strength of evidence for the given biomarker - A to D - from CIViC
11. BM_EVIDENCE_DIRECTION Direction of biomarker evidence, i.e. Supports or Does Not Support - from CIViC
12. BM_EVIDENCE_DESCRIPTION Description of biomarker - from CIViC
13. BM_SOURCE_DB Biomarker source database - CIViC
14. BM_EVIDENCE_ID Evidence identifier - from CIViC
15. BM_VARIANT_ORIGIN Origin of biomarker variant - germline
16. BM_MATCH Match between sample variant and biomarker - by_genomic_coord, by_hgvsp_principal, by_gene_mut_lof etc.
17. BM_RESOLUTION Highest resolution of mapping between sample variant and biomarker - genomic, hgvsp, codon, gene
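The per-evidence fields listed above are also encoded compactly in the BIOMARKER_MATCH tag of the annotated VCF: a pipe-separated entry whose third element holds one or more ‘&’-separated evidence items, each with colon-separated fields. A minimal sketch of unpacking one such entry, using the example value from the VCF tag description; the function name and the dictionary keys are illustrative choices, and the sketch assumes that no field contains a literal `|`, `&`, or `:`.

```python
# Key names chosen for illustration; the order follows the documented format
EVIDENCE_FIELDS = (
    "evidence_id",
    "tumor_site",
    "clinical_significance",
    "evidence_level",
    "evidence_type",
    "variant_origin",
)


def parse_biomarker_match(entry):
    """Unpack one BIOMARKER_MATCH entry into source, variant id, evidence items."""
    db_source, db_variant_id, evidence, matching_type = entry.split("|")
    items = [
        dict(zip(EVIDENCE_FIELDS, item.split(":")))
        for item in evidence.split("&")
    ]
    return {
        "db_source": db_source,
        "db_variant_id": db_variant_id,
        "evidence_items": items,
        "matching_type": matching_type,
    }


example = (
    "civic|174|"
    "EID445:Colon/Rectum:Sensitivity/Response:D:Predictive:Germline&"
    "EID446:Colon/Rectum:Sensitivity/Response:D:Predictive:Germline"
    "|by_gene_mut"
)
match = parse_biomarker_match(example)
print(match["matching_type"], [e["evidence_id"] for e in match["evidence_items"]])
```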