Output files • CPSR

Output

Interactive HTML report

An interactive and structured quarto-generated HTML report, lists variants in known cancer predisposition genes and is provided with the following naming convention:

<sample_id>.cpsr.<genome_assembly>.html
- The sample_id is provided as input by the user, and reflects a unique identifier of the sample to be analyzed.

The report is structured in multiple sections, described briefly below:

Settings
- Lists key configurations provided by user, including the list of genes that constitute the virtual gene panel in the report
Summary of findings
- Summarizes the main findings in the sample through value boxes
Variant classification
- For all coding variants in the selected cancer predisposition geneset, interactive variant tables are shown for each level of clinical significance (ClinVar and non-ClinVar (Other) variants combined):
  - Pathogenic
  - Likely Pathogenic
  - Variants of Uncertain Significance (VUS)
  - Likely Benign
  - Benign
Genomic biomarkers
- Reported clinical evidence items from CIViC that match with variants in the query set are reported in four distinct tabs (Predictive / Prognostic / Diagnostic / Predisposing)
- Pharmacogenetic findings (DPYD, TPMT, NUDT15)
Secondary findings
- Pathogenic variants in the ACMG-recommended list of genes for report of secondary/incidental findings
GWAS hits
- Status of relatively common, low-risk variants found in genome-wide association studies of cancer phenotypes (NHGRI-EBI Catalog)
Documentation
- Introduction
  - Short overview of the CPSR variant report - aims and contents
- Annotation resources
  - Information on annotation sources utilized by CPSR, including versions and licensing requirements
- Variant classification
  - Overview of how CPSR performs variant annotation and classification of variants not recorded in ClinVar, listing ACMG criteria and associated scores, calibration of classification thresholds etc.
References
- Supporting scientific literature - knowledge resources, guideline references etc.

Variant call format - VCF

A VCF file containing annotated, germline variant calls (single nucleotide variants and insertions/deletions) is generated with the following naming convention:

<sample_id>.cpsr.<genome_assembly>.vcf.gz (.tbi)
- The sample_id is provided as input by the user, and reflects a unique identifier of the sample to be analyzed. Following common standards, the annotated VCF file is compressed with bgzip and indexed with tabix. Below follows a description of all annotations/tags present in the VCF INFO column after processing with the CPSR annotation pipeline:

VEP consequence annotations

Tag	Description
`CSQ`	Complete consequence annotations from VEP. Format (separated by a `\|`): `Allele`, `Consequence`, `IMPACT`, `SYMBOL`, `Gene`, `Feature_type`, `Feature`, `BIOTYPE`, `EXON`, `INTRON`, `HGVSc`, `HGVSp`, `cDNA_position`, `CDS_position`, `Protein_position`, `Amino_acids`, `Codons`, `Existing_variation`, `ALLELE_NUM`, `DISTANCE`, `STRAND`, `FLAGS`, `PICK`, `VARIANT_CLASS`, `SYMBOL_SOURCE`, `HGNC_ID`, `CANONICAL`, `MANE_SELECT`, `MANE_PLUS_CLINICAL`, `TSL`, `APPRIS`, `CCDS`, `ENSP`, `SWISSPROT`, `TREMBL`, `UNIPARC`, `UNIPROT_ISOFORM`, `RefSeq`, `DOMAINS`, `HGVS_OFFSET`, `gnomADe_AF`, `gnomADe_AFR_AF`, `gnomADe_AMR_AF`, `gnomADe_ASJ_AF`, `gnomADe_EAS_AF`, `gnomADe_FIN_AF`, `gnomADe_NFE_AF`, `gnomADe_OTH_AF`, `gnomADe_SAS_AF`, `CLIN_SIG`, `SOMATIC`, `PHENO`, `CHECK_REF`, `MOTIF_NAME`, `MOTIF_POS`, `HIGH_INF_POS`, `MOTIF_SCORE_CHANGE`, `TRANSCRIPTION_FACTORS`, `NearestExonJB`
`Consequence`	Impact modifier for the consequence type (picked by VEP’s `--flag_pick_allele` option)
`Gene`	Ensembl stable ID of affected gene (picked by VEP’s `--flag_pick_allele` option)
`Feature_type`	Type of feature. Currently one of `Transcript`, `RegulatoryFeature`, `MotifFeature` (picked by VEP’s `--flag_pick_allele` option)
`Feature`	Ensembl stable ID of feature (picked by VEP’s `--flag_pick_allele` option)
`cDNA_position`	Relative position of base pair in cDNA sequence (picked by VEP’s `--flag_pick_allele` option)
`CDS_position`	Relative position of base pair in coding sequence (picked by VEP’s `--flag_pick_allele` option)
`CDS_RELATIVE_POSITION`	Ratio of variant coding position to length of coding sequence
`CDS_CHANGE`	Coding, transcript-specific sequence annotation (picked by VEP’s `--flag_pick_allele` option)
`ALTERATION`	HGVSp/HGVSc identifier
`AMINO_ACID_START`	Protein position indicating absolute start of amino acid altered (fetched from `Protein_position`)
`AMINO_ACID_END`	Protein position indicating absolute end of amino acid altered (fetched from `Protein_position`)
`Protein_position`	Relative position of amino acid in protein (picked by VEP’s `--flag_pick_allele` option)
`Amino_acids`	Only given if the variant affects the protein-coding sequence (picked by VEP’s `--flag_pick_allele` option)
`GRANTHAM_DISTANCE`	Grantham distance between the reference and variant amino acids
`Codons`	The alternative codons with the variant base in upper case (picked by VEP’s `--flag_pick_allele` option)
`IMPACT`	Impact modifier for the consequence type (picked by VEP’s `--flag_pick_allele` option)
`VARIANT_CLASS`	Sequence Ontology variant class (picked by VEP’s `--flag_pick_allele` option)
`SYMBOL`	Gene symbol (picked by VEP’s `--flag_pick_allele` option)
`SYMBOL_SOURCE`	The source of the gene symbol (picked by VEP’s `--flag_pick_allele` option)
`STRAND`	The DNA strand (1 or -1) on which the transcript/feature lies (picked by VEP’s `--flag_pick_allele` option)
`ENSP`	The Ensembl protein identifier of the affected transcript (picked by VEP’s `--flag_pick_allele` option)
`FLAGS`	Transcript quality flags: `cds_start_NF`: CDS 5’, incomplete `cds_end_NF`: CDS 3’ incomplete (picked by VEP’s `--flag_pick_allele` option)
`SWISSPROT`	Best match UniProtKB/Swiss-Prot accession of protein product (picked by VEP’s `--flag_pick_allele` option)
`TREMBL`	Best match UniProtKB/TrEMBL accession of protein product (picked by VEP’s `--flag_pick_allele` option)
`UNIPARC`	Best match UniParc accession of protein product (picked by VEP’s `--flag_pick_allele` option)
`UNIPROT_ISOFORM`	Best match UniProtKB isoform accession of protein product (picked by VEP’s `--flag_pick_allele` option)
`HGVSc`	The HGVS coding sequence name (picked by VEP’s `--flag_pick_allele` option)
`HGVSc_RefSeq`	The HGVSc coding sequence name using RefSeq transcript identifiers (MANE select) - picked by VEP’s `--flag_pick_allele` option)
`HGVSp`	The HGVS protein sequence name (picked by VEP’s `--flag_pick_allele` option)
`HGVSp_short`	The HGVS protein sequence name, short version (picked by VEP’s `--flag_pick_allele` option)
`HGVS_OFFSET`	Indicates by how many bases the HGVS notations for this variant have been shifted (picked by VEP’s `--flag_pick_allele` option)
`NearestExonJB`	VEP plugin that finds nearest exon junction for a coding sequence variant. Format: `Ensembl exon identifier+distanceto exon boundary+boundary type(start/end)+exon length`
`MOTIF_NAME`	The source and identifier of a transcription factor binding profile aligned at this position (picked by VEP’s `--flag_pick_allele` option)
`MOTIF_POS`	The relative position of the variation in the aligned TFBP (picked by VEP’s `--flag_pick_allele` option)
`HIGH_INF_POS`	A flag indicating if the variant falls in a high information position of a transcription factor binding profile (TFBP) (picked by VEP’s `--flag_pick_allele` option)
`MOTIF_SCORE_CHANGE`	The difference in motif score of the reference and variant sequences for the TFBP (picked by VEP’s `--flag_pick_allele` option)
`CELL_TYPE`	List of cell types and classifications for regulatory feature (picked by VEP’s `--flag_pick_allele` option)
`CANONICAL`	A flag indicating if the transcript is denoted as the canonical transcript for this gene (picked by VEP’s `--flag_pick_allele` option)
`CCDS`	The CCDS identifier for this transcript, where applicable (picked by VEP’s `--flag_pick_allele` option)
`INTRON`	The intron number (out of total number) (picked by VEP’s `--flag_pick_allele` option)
`INTRON_POSITION`	Relative position of intron variant to nearest exon/intron junction (NearestExonJB VEP plugin)
`EXON_POSITION`	Relative position of exon variant to nearest intron/exon junction (NearestExonJB VEP plugin)
`EXON`	The exon number (out of total number) (picked by VEP’s `--flag_pick_allele` option)
`EXON_AFFECTED`	The exon affected by the variant (picked by VEP’s `--flag_pick_allele` option)
`LAST_EXON`	Logical indicator for last exon of transcript (picked by VEP’s `--flag_pick_allele` option)
`LAST_INTRON`	Logical indicator for last intron of transcript (picked by VEP’s `--flag_pick_allele` option)
`INTRON_POSITION`	Relative position of intron variant to nearest exon/intron junction (NearestExonJB VEP plugin)
`EXON_POSITION`	Relative position of exon variant to nearest intron/exon junction (NearestExonJB VEP plugin)
`DISTANCE`	Shortest distance from variant to transcript (picked by VEP’s `--flag_pick_allele` option)
`BIOTYPE`	Biotype of transcript or regulatory feature (picked by VEP’s `--flag_pick_allele` option)
`TSL`	Transcript support level (picked by VEP’s `--flag_pick_allele` option)>
`PUBMED`	PubMed ID(s) of publications that cite existing variant - VEP
`PHENO`	Indicates if existing variant is associated with a phenotype, disease or trait - VEP
`GENE_PHENO`	Indicates if overlapped gene is associated with a phenotype, disease or trait - VEP
`ALLELE_NUM`	Allele number from input; 0 is reference, 1 is first alternate etc - VEP
`REFSEQ_MATCH`	The RefSeq transcript match status; contains a number of flags indicating whether this RefSeq transcript matches the underlying reference sequence and/or an Ensembl transcript (picked by VEP’s `--flag_pick_allele` option)
`PICK`	Indicates if this block of consequence data was picked by VEP’s `--flag_pick_allele` option
`VEP_ALL_CSQ`	All VEP transcript block consequences (`Consequence:SYMBOL:Feature_type:Feature:BIOTYPE`) - VEP
`EXONIC_STATUS`	Indicates if variant consequence type is ‘exonic’ or ‘nonexonic’. We define ‘exonic’ as any variants with the following consequence types: `stop_gained / stop_lost`, `start_lost`, `frameshift_variant`, `missense_variant`, `splice_donor_variant`, `splice_acceptor_variant`, `inframe_insertion / inframe_deletion`, `synonymous_variant`, `protein_altering`
`CODING_STATUS`	Indicates if primary variant consequence type is ‘coding’ or ‘noncoding’. ‘coding’ variants are here defined as those consequence types with an ‘exonic’ status, with the exception of synonymous variants. All other consequence types are considered ‘noncoding’
`NULL_VARIANT`	Primary variant consequence type is `frameshift` or `stop_gained`
`LOSS_OF_FUNCTION`	Loss-of-function variant - primary variant consequence being either `stop_gained / stop_lost`, `start_lost`, `frameshift_variant`, `splice_donor_variant`, or `splice_acceptor_variant`
`LOF_FILTER`	Loss-of-function filter - exceptions to putative LOF variants - GC to GT at splice donor sites or truncations within the last 5% of coding sequence
`SPLICE_DONOR_RELEVANT`	Logical indicating if variant is located at a particular location near the splice donor site (`+3A/G`, `+4A` or `+5G`)
`BIOMARKER_MATCH`	Variant matches with germline biomarker evidence in CIViC/CGI. Format: `<db_source>\|<db_variant_id>\|<db_evidence_id>:<tumor_site>:<clinical_significance>:<evidence_level>:<evidence_type><germline_somatic>\|<matching_type>`. Multiple evidence items are separated by ‘&’. Example: civic\|174\|EID445:Colon/Rectum:Sensitivity/Response:D:Predictive:Germline&EID446:Colon/Rectum:Sensitivity/Response:D:Predictive:Germline\|by_gene_mut. Matching type can be any of `by_genomic_coord`, `by_hgvsp_principal`, `by_hgvsc_principal`, `by_hgvsp_nonprincipal`, `by_hgvsc_nonprincipal`, `by_codon_principal`, `by_exon_mut_principal`, `by_gene_mut_lof`, `by_gene_mut`
`REGULATORY_ANNOTATION`	Comma-separated list of all variant annotations of `Feature_type`, `RegulatoryFeature`, and `MotifFeature`. Format (separated by a `\|`): `<Consequence>`, `<Feature_type>`, `<Feature>`, `<BIOTYPE>`, `<MOTIF_NAME>`, `<MOTIF_POS>`, `<HIGH_INF_POS>`, `<MOTIF_SCORE_CHANGE>`, `<TRANSCRIPTION_FACTORS>`

Gene information

Tag	Description
`ENTREZGENE`	Entrez gene identifier
`APPRIS`	Principal isoform flags according to the APPRIS principal isoform database
`MANE_SELECT`	Indicating if the transcript is the MANE Select for the gene (picked by VEP’s `--flag_pick_allele_gene` option)
`MANE_PLUS_CLINICAL`	Indicating if the transcript is MANE Plus Clinical, as required for clinical variant reporting (picked by VEP’s `--flag_pick_allele_gene` option)
`UNIPROT_ID`	UniProt identifier
`UNIPROT_ACC`	UniProt accession(s)
`ENSEMBL_GENE_ID`	Ensembl gene identifier for VEP’s picked transcript (ENSGXXXXXXX)
`ENSEMBL_TRANSCRIPT_ID`	Ensembl transcript identifier for VEP’s picked transcript (ENSTXXXXXX)
`ENSEMBL_PROTEIN_ID`	Ensembl corresponding protein identifier for VEP’s picked transcript
`REFSEQ_TRANSCRIPT_ID`	Corresponding RefSeq transcript(s) identifier for VEP’s picked transcript (NM_XXXXX)
`REFSEQ_PROTEIN_ID`	RefSeq protein/peptide identifier for VEP’s picked transcript (NP_XXXXXX)
`MANE_SELECT2`	MANE select transcript identifer: one high-quality representative transcript per protein-coding gene that is well-supported by experimental data and represents the biology of the gene - provided through BioMart
`MANE_PLUS_CLINICAL2`	transcripts chosen to supplement MANE Select when needed for clinical variant reporting - provided through BioMart
`GENCODE_TAG`	tag for GENCODE transcript (basic etc)
`GENCODE_TRANSCRIPT_TYPE`	type of transcript (protein-coding etc.)
`TSG`	Indicates whether gene is predicted as a tumor suppressor gene, from Network of Cancer Genes (NCG) & the CancerMine text-mining resource
`TSG_SUPPORT`	Underlying evidence for gene being a tumor suppressor. Format: `CGC_TIER<1/2>&NCG&CancerMine:num_citations"`
`ONCOGENE`	Indicates whether gene is predicted as an oncogene, from Network of Cancer Genes (NCG) & the CancerMine text-mining resource
`ONCOGENE_SUPPORT`	Underlying evidence for gene being an oncogene. Format: `CGC_TIER<1/2>&NCG&CancerMine:num_citations"`
`CPG_SOURCE`	Cancer predisposition gene source (panel 0: `TCGA`, `CGC`, `PANEL_APP`, `OTHER`)
`CGC_GERMLINE`	Member of Cancer Gene Census - germline set
`CGC_SOMATIC`	Member of Cancer Gene Census - somatic set
`CGC_TIER`	Cancer Gene Census tier (1/2)
`NCG_DRIVER`	Cancer driver gene prediction by Network of Cancer Genes (NCG)
`INTOGEN_DRIVER`	Indicates whether gene is predicted as cancer driver from IntOGen’s cancer driver prediction algorithm
`PROB_EXAC_LOF_INTOLERANT`	`dbNSFP_gene`: the probability of being loss-of-function intolerant (intolerant of both heterozygous and homozygous lof variants) based on ExAC r0.3 data
`PROB_EXAC_LOF_INTOLERANT_HOM`	`dbNSFP_gene`: the probability of being intolerant of homozygous, but not heterozygous lof variants based on ExAC r0.3 data
`PROB_EXAC_LOF_TOLERANT_NULL`	`dbNSFP_gene`: the probability of being tolerant of both heterozygous and homozygous lof variants based on ExAC r0.3 data
`PROB_EXAC_NONTCGA_LOF_INTOLERANT`	`dbNSFP_gene`: the probability of being loss-of-function intolerant (intolerant of both heterozygous and homozygous lof variants) based on ExAC r0.3 nonTCGA subset
`PROB_EXAC_NONTCGA_LOF_INTOLERANT_HOM`	`dbNSFP_gene`: the probability of being intolerant of homozygous, but not heterozygous lof variants based on ExAC r0.3 nonTCGA subset
`PROB_EXAC_NONTCGA_LOF_TOLERANT_NULL`	`dbNSFP_gene`: the probability of being tolerant of both heterozygous and homozygous lof variants based on ExAC r0.3 nonTCGA subset
`PROB_GNOMAD_LOF_INTOLERANT`	`dbNSFP_gene`: the probability of being loss-of-function intolerant (intolerant of both heterozygous and homozygous lof variants based on gnomAD 2.1 data
`PROB_GNOMAD_LOF_INTOLERANT_HOM`	`dbNSFP_gene`: the probability of being intolerant of homozygous, but not heterozygous lof variants based on gnomAD 2.1 data
`PROB_GNOMAD_LOF_TOLERANT_NULL`	`dbNSFP_gene`: the probability of being tolerant of both heterozygous and homozygous lof variants based on gnomAD 2.1 data
`PROB_HAPLOINSUFFICIENCY`	`dbNSFP_gene`: Estimated probability of haploinsufficiency of the gene (from http://dx.doi.org/10.1371/journal.pgen.1001154)
`ESSENTIAL_GENE_CRISPR`	`dbNSFP_gene`: Essential (`E`) or Non-essential phenotype-changing (`N`) based on large scale CRISPR experiments (from http://dx.doi.org/10.1126/science.aac7041)
`ESSENTIAL_GENE_CRISPR2`	`dbNSFP_gene`: Essential (`E`), context-Specific essential (`S`), or Non-essential phenotype-changing (`N`) based on large scale CRISPR experiments (from http://dx.doi.org/10.1016/j.cell.2015.11.015)

Variant effect and protein-coding information

Tag	Description
`MUTATION_HOTSPOT`	mutation hotspot codon in cancerhotspots.org. Format: `GeneSymbol\|Entrez_ID\|CodonRefAA\|Alt_AA\|Q-value`
`MUTATION_HOTSPOT_MATCH`	Type of hotspot match (by_hgvsp_principal, by_hgvsc_principal, by_hgvsp_nonprincipal, by_hgvsc_nonprincipal, by_codon_principal, by_codon_nonprincipal)
`MUTATION_HOTSPOT_CANCERTYPE`	hotspot-associated cancer types (from cancerhotspots.org)
`PFAM_DOMAIN`	Pfam domain identifier (from VEP)
`SPLICE_EFFECT`	Effect of splicing, from MutSpliceDB and/or MaxEntScan. Format:
MES
`EFFECT_PREDICTIONS`	Insilico predictions variant effect on protein function and pre-mRNA splicing from database of non-synonymous functional predictions - dbNSFP v5.0. Predicted effects are provided by different sources/algorithms (separated by `&`), `T` = Tolerated, `N` = Neutral, `D` = Damaging
`DBNSFP_BAYESDEL_ADDAF`	predicted effect from BayesDel (dbNSFP)
`DBNSFP_LIST_S2`	predicted effect from LIST-S2 (dbNSFP)
`DBNSFP_SIFT`	predicted effect from SIFT (dbNSFP)
`DBNSFP_POLYPHEN2_HVAR`	predicted effect from PolyPhen2 (dbNSFP)
`DBNSFP_PROVEAN`	predicted effect from PROVEAN (dbNSFP)
`DBNSFP_MUTATIONTASTER`	predicted effect from MUTATIONTASTER (dbNSFP)
`DBNSFP_MUTATIONASSESSOR`	predicted effect from MUTATIONASSESSOR (dbNSFP)
`DBNSFP_M_CAP`	predicted effect from M-CAP (dbNSFP)
`DBNSFP_ALOFT`	predicted effect from ALoFT (dbNSFP)
`DBNSFP_MUTPRED`	score from MUTPRED (dbNSFP)
`DBNSFP_CLINPRED`	predicted effect from ClinPred (dbNSFP)
`DBNSFP_FATHMM`	predicted effect from FATHMM-XF (dbNSFP)
`DBNSFP_PRIMATEAI`	predicted effect from PRIMATEAI (dbNSFP)
`DBNSFP_DEOGEN2`	predicted effect from DEOGEN2 (dbNSFP)
`DBNSFP_PHACTBOOST`	predicted effect from PHACTboost (dbNSFP)
`DBNSFP_ALPHA_MISSENSE`	predicted effect from AlphaMissense (dbNSFP)
`DBNSFP_MUTFORMER`	predicted effect from MutFormer (dbNSFP)
`DBNSFP_ESM1B`	predicted effect from ESM1b (dbNSFP)
`DBNSFP_GERP`	evolutionary constraint measure from GERP (dbNSFP)
`DBNSFP_CADD`	Combined Annotation Dependent Depletion (CADD) score (dbNSFP)
`DBNSFP_VEST4`	VEST4 score (dbNSFP)
`DBNSFP_FATHMM_XF`	predicted effect from FATHMM-XF (dbNSFP)
`DBNSFP_META_RNN`	predicted effect from ensemble prediction (deep learning - dbNSFP)
`DBNSFP_SPLICE_SITE_RF`	predicted effect of splice site disruption, using random forest (dbscSNV)
`DBNSFP_SPLICE_SITE_ADA`	predicted effect of splice site disruption, using boosting (dbscSNV)

Variant allele frequencies/annotations in germline databases

Tag	Description
`gnomADe_AFR_AF`	African/American germline allele frequency (gnomAD release 4.1)
`gnomADe_AMR_AF`	American germline allele frequency (gnomAD release 4.1)
`gnomADe_AF`	Adjusted global germline allele frequency (gnomAD release 4.1)
`gnomADe_SAS_AF`	South Asian germline allele frequency (gnomAD release 4.1)
`gnomADe_EAS_AF`	East Asian germline allele frequency (gnomAD release 4.1)
`gnomADe_FIN_AF`	Finnish germline allele frequency (gnomAD release 4.1)
`gnomADe_NFE_AF`	Non-Finnish European germline allele frequency (gnomAD release 4.1)
`gnomADe_OTH_AF`	Other germline allele frequency (gnomAD release 4.1)
`gnomADe_ASJ_AF`	Ashkenazi Jewish allele frequency (gnomAD release 4.1)
`gnomADe_non_cancer_ASJ_AF`	Alternate allele frequency for samples of Ashkenazi Jewish ancestry in the non-cancer subset (gnomAD 2.1.1)
`gnomADe_non_cancer_EAS_AF`	Alternate allele frequency for samples of East Asian ancestry in the non-cancer subset (gnomAD 2.1.1)
`gnomADe_non_cancer_AFR_AF`	Alternate allele frequency for samples of African-American/African ancestry in the non-cancer subset (gnomAD 2.1.1)
`gnomADe_non_cancer_AMR_AF`	Alternate allele frequency for samples of Latino ancestry in the non-cancer subset (gnomAD 2.1.1)
`gnomADe_non_cancer_OTH_AF`	Alternate allele frequency for samples of Other ancestry in the non-cancer subset (gnomAD 2.1.1)
`gnomADe_non_cancer_NFE_AF`	Alternate allele frequency for samples of Non-Finnish European ancestry in the non-cancer subset (gnomAD 2.1.1)
`gnomADe_non_cancer_FIN_AF`	Alternate allele frequency for samples of Finnish ancestry in the non-cancer subset (gnomAD 2.1.1)
`gnomADe_non_cancer_SAS_AF`	Alternate allele frequency for samples of South Asian ancestry in the non-cancer subset (gnomAD 2.1.1)
`gnomADe_non_cancer_AF`	Alternate allele frequency in the non-cancer subset (gnomAD 2.1.1)
`gnomADe_non_cancer_ASJ_AC`	Alternate allele count for samples of Ashkenazi Jewish ancestry in the non-cancer subset (gnomAD 2.1.1)
`gnomADe_non_cancer_EAS_AC`	Alternate allele count for samples of East Asian ancestry in the non-cancer subset (gnomAD 2.1.1)
`gnomADe_non_cancer_AFR_AC`	Alternate allele count for samples of African-American/African ancestry in the non-cancer subset (gnomAD 2.1.1)
`gnomADe_non_cancer_AMR_AC`	Alternate allele count for samples of Latino ancestry in the non-cancer subset (gnomAD 2.1.1)
`gnomADe_non_cancer_OTH_AC`	Alternate allele count for samples of Other ancestry in the non-cancer subset (gnomAD 2.1.1)
`gnomADe_non_cancer_NFE_AC`	Alternate allele frequency for samples of Non-Finnish European ancestry in the non-cancer subset (gnomAD 2.1.1)
`gnomADe_non_cancer_FIN_AC`	Alternate allele count for samples of Finnish ancestry in the non-cancer subset (gnomAD 2.1.1)
`gnomADe_non_cancer_SAS_AC`	Alternate allele count for samples of South Asian ancestry in the non-cancer subset (gnomAD 2.1.1)
`gnomADe_non_cancer_AC`	Alternate allele count in the non-cancer subset (gnomAD 2.1.1)
`gnomADe_non_cancer_ASJ_AN`	Total number of alleles in samples of Ashkenazi Jewish ancestry in the non-cancer subset (gnomAD 2.1.1)
`gnomADe_non_cancer_EAS_AN`	Total number of alleles in samples of East Asian ancestry in the non-cancer subset (gnomAD 2.1.1)
`gnomADe_non_cancer_AFR_AN`	Total number of alleles in samples of African-American/African ancestry in the non-cancer subset (gnomAD 2.1.1)
`gnomADe_non_cancer_AMR_AN`	Total number of alleles in samples of Latino ancestry in the non-cancer subset (gnomAD 2.1.1)
`gnomADe_non_cancer_OTH_AN`	Total number of alleles in samples of Other ancestry in the non-cancer subset (gnomAD 2.1.1)
`gnomADe_non_cancer_NFE_AN`	Total number of alleles in samples of Non-Finnish European ancestry in the non-cancer subset (gnomAD 2.1.1)
`gnomADe_non_cancer_FIN_AN`	Total number of alleles in samples of Finnish ancestry in the non-cancer subset (gnomAD 2.1.1)
`gnomADe_non_cancer_SAS_AN`	Total number of alleles in samples of South Asian ancestry in the non-cancer subset (gnomAD 2.1.1)
`gnomADe_non_cancer_AN`	Total number of alleles in the non-cancer subset (gnomAD 2.1.1)
`gnomADe_non_cancer_ASJ_NHOMALT`	Count of homozygous individuals in samples of Ashkenazi Jewish ancestry in the non-cancer subset (gnomAD 2.1.1)
`gnomADe_non_cancer_EAS_NHOMALT`	Count of homozygous individuals in samples of East Asian ancestry in the non-cancer subset (gnomAD 2.1.1)
`gnomADe_non_cancer_AFR_NHOMALT`	Count of homozygous individuals in samples of African-American/African ancestry in the non-cancer subset (gnomAD 2.1.1)
`gnomADe_non_cancer_AMR_NHOMALT`	Count of homozygous individuals in samples of Latino ancestry in the non-cancer subset (gnomAD 2.1.1)
`gnomADe_non_cancer_OTH_NHOMALT`	Count of homozygous individuals in samples of Other ancestry in the non-cancer subset (gnomAD 2.1.1)
`gnomADe_non_cancer_NFE_NHOMALT`	Count of homozygous individuals in samples of Non-Finnish European ancestry in the non-cancer subset (gnomAD 2.1.1)
`gnomADe_non_cancer_FIN_NHOMALT`	Count of homozygous individuals in samples of Finnish ancestry in the non-cancer subset (gnomAD 2.1.1)
`gnomADe_non_cancer_SAS_NHOMALT`	Count of homozygous individuals in samples of South Asian ancestry in the non-cancer subset (gnomAD 2.1.1)
`gnomADe_non_cancer_NHOMALT`	Count of homozygous individuals in samples in the non-cancer subset (gnomAD 2.1.1)
`DBSNP_RSID`	dbSNP reference ID, as provided by VEP

Clinical associations

Tag	Description
`CLINVAR_MSID`	ClinVar Measure Set/Variant ID
`CLINVAR_ALLELE_ID`	ClinVar allele ID
`CLINVAR_PMID`	Associated Pubmed IDs for variant in ClinVar - germline state-of-origin
`CLINVAR_HGVSP`	Protein variant expression using HGVS nomenclature - ClinVar
`CLINVAR_PMID_SOMATIC`	Associated Pubmed IDs for variant in ClinVar - somatic state-of-origin
`CLINVAR_CONFLICTED`	ClinVar variant has conflicting interpretations
`CLINVAR_CLNSIG`	Clinical significance for variant in ClinVar - germline state-of-origin
`CLINVAR_CLASSIFICATION`	Clean clinical significance on a five-level scheme - ClinVar
`CLINVAR_CLNSIG_SOMATIC`	Clinical significance for variant in ClinVar - somatic state-of-origin
`CLINVAR_MEDGEN_CUI`	Associated MedGen concept identifiers (CUIs) - germline state-of-origin
`CLINVAR_MEDGEN_CUI_SOMATIC`	Associated MedGen concept identifiers (CUIs) - somatic state-of-origin
`CLINVAR_MOLECULAR_EFFECT`	Variant effect according to ClinVar annotation
`CLINVAR_VARIANT_ORIGIN`	Origin of variant (somatic, germline, de novo etc.) for variant in ClinVar
`CLINVAR_REVIEW_STATUS_STARS`	Rating of the ClinVar variant (0-4 stars) with respect to level of review
`GWAS_HIT`	variant associated with cancer phenotype from genome-wide association study (NHGRI-EBI GWAS catalog)

Variant/genotype information

Tag	Description
`GENOTYPE`	Variant genotype (het/hom_ref/hom_alt)
`DP_CONTROL`	Sequencing depth at variant site (‘DP’)

Excel workbook - XLSX

We provide an Excel workbook with four sheets that lists main findings and annotations of the predisposition analysis. The file has the following naming convention:

<sample_id>.cpsr.<genome_assembly>.xlsx

The Excel workbook is populated with the following sheets (the last three sheets will only be present if any data is found):

VIRTUAL_PANEL - details on the the chosen virtual gene panel
CLASSIFICATION - variant classifications and corresponding gene annotations
BIOMARKER_EVIDENCE - matches of variants with genomic biomarkers
SECONDARY_FINDINGS - secondary findings (ACMG recommendations)
PHARMACOGENETIC_FINDINGS - drug toxicity findings (DPYD, TPMT, NUDT15)

Tab-separated values - TSV

Variant classification

We provide a compressed tab-separated values file with variant classifications and the most essential variant/gene annotations. The file has the following naming convention:

<sample_id>.cpsr.<genome_assembly>.classification.tsv.gz

The SNVs/InDels are classified according to clinical significance (pathogenicity) (as defined above for the HTML report).

The following variables are included in the tiered TSV file (VCF tags in the query VCF potentially retained by the user will be appended):

Variable	Description
1. `SAMPLE_ID`	Sample identifier
2. `GENOMIC_CHANGE`	Identifier for variant at the genome (VCF) level, e.g. `1:g.152382569A>G`. Format: `<chrom>:g.<position><ref_allele>><alt_allele>`
3. `VAR_ID`	Variant identifier - chrom_pos_ref_alt
4. `GENOME_VERSION`	Assembly version, e.g. grch37/grch38
5. `GENOTYPE`	Variant genotype (het/hom_ref/hom_alt)
6. `DP_CONTROL`	Sequencing depth at variant site (‘DP’)
7. `CPSR_CLASSIFICATION_SOURCE`	ClinVar or CPSR_ACMG (the latter meaning variant not recorded in ClinVar, classified by CPSR)
8. `VARIANT_CLASS`	Variant type, e.g. SNV/insertion/deletion
9. `CODING_STATUS`	coding/noncoding (wrt. protein alteration and canonical splice site disruption)
10. `SYMBOL`	Gene symbol
11. `GENENAME`	Gene description
12. `CCDS`	CCDS identifier
13. `ENTREZGENE`	Entrez gene identifier
14. `UNIPROT_ID`	UniProt protein identifier
15. `ENSEMBL_GENE_ID`	Ensembl gene identifier
16. `ENSEMBL_TRANSCRIPT_ID`	Ensembl transcript identifier
17. `REFSEQ_TRANSCRIPT_ID`	RefSeq mRNA identifier
18. `ONCOGENE`	Gene is predicted as an oncogene according to Network of Cancer Genes (NCG)/Cancer Gene Census (CGC) and CancerMine
19. `TUMOR_SUPPRESSOR`	Gene is predicted as a tumor suppressor gene according to Network of Cancer Genes (NCG)/Cancer Gene Census (CGC) and CancerMine
20. `CPG_MOD`	Gene - cancer predisposition mechanism of disease (e.g. LoF)
21. `CPG_MOI`	Gene - cancer predisposition mode of inheritance
22. `CONSEQUENCE`	Variant consequence
23. `ALTERATION`	Molecular alteration (HGVSp or HGVSc pending on consequence)
24. `PROTEIN_CHANGE`	Protein change - one letter abbreviation (HGVSp)
25. `PFAM_DOMAIN`	Protein domain (Pfam identifier)
26. `PFAM_DOMAIN_NAME`	Protein domain name (Pfam)
27. `HGVSp`	The HGVS protein sequence name
28. `GRANTHAM_DISTANCE`	Grantham distance for amino acid change (Grantham score)
29. `HGVSc`	The HGVS coding sequence name
30. `HGVSc_RefSeq`	The HGVS coding sequence name (RefSeq - MANE Select)
31. `CDS_CHANGE`	Coding, transcript-specific sequence annotation
32. `LAST_EXON`	Last exon in gene
33. `EXON`	Exon of variant/total number of exons in transcript (from VEP)
34. `EXON_AFFECTED`	Transcript exon of variant (from VEP)
35. `EXON_POSITION`	Relative position of exon variant to nearest intron/exon junction (NearestExonJB plugin)
36. `INTRON_POSITION`	Relative position of intron variant to nearest intron/exon junction (NearestExonJB plugin)
37. `VEP_ALL_CSQ`	All VEP transcript block consequences
38. `CANCER_PHENOTYPE`	For variants with a ClinVar classification, indication of cancer-associated disease/phenotype (1) or not (0)
39. `MUTATION_HOTSPOT`	Cancer mutation hotspot (cancerhotspots.org)
40. `RMSK_HIT`	RepeatMasker hit
41. `EFFECT_PREDICTIONS`	Functional effect predictions from multiple algorithms (dbNSFP)
42. `SPLICE_EFFECT`	Splice effect annotations from MutSpliceDB and MaxEntScan (see details above)
43. `LOSS_OF_FUNCTION`	Loss-of-function variant
44. `LOF_FILTER`	Loss-of-function filter
45. `NULL_VARIANT`	Frameshift or stop-gain variant
46. `DBMTS`	variant with potential effect on microRNA target sites (dbMTS). Format: `<ensembl_transcript_id>\|<microrna_identifier>\|<target_prediction_algorithms>\|<gain_loss_consensus>`. Target prediction algorithms indicate support by different algorithms (separated by ‘&’), `TS` = TargetScan, `M` = miRanda, `R` = RNAhybrid. Gain_loss_consensus indicate whether the variant was predicted to disrupt a binding site (`L` = Loss), or create a new target site (`G` = gain) by the different algorithms
47. `REGULATORY_ANNOTATION`	Overlap of variant with regulatory elements (VEP)
48. `TF_BINDING_SITE_VARIANT`	Indicates whether a variant overlaps a critical/non-critical position of a transcription factor binding site (TFBS) - as provided by VEP’s–regulatory option (‘Overlap: non-critical motif position’ or ‘Overlap: critical motif position’)
49. `TF_BINDING_SITE_VARIANT_INFO`	Comma-separated list of transcription factor binding sites affected by variant. Format per factor: `<TRANSCRIPTION_FACTOR>\|<MOTIF_NAME>\|<MOTIF_POS>\|<MOTIF_SCORE_CHANGE>\|<HIGH_INF_POS>`. HIGH_INF_POS indicates whether the variant overlapped a critical motif position (`Y`), or non-critical motif position (`N`)
50. `GERP_SCORE`	Genomic conservation score (GERP)
51. `DBSNP_RSID`	dbSNP identifier (rsid)
52. `CLINVAR_CLASSIFICATION`	clinical significance of ClinVar-recorded variant
53. `CLINVAR_MSID`	Measureset identifier of ClinVar variant
54. `CLINVAR_VARIANT_ORIGIN`	Variant origin (somatic/germline) of ClinVar variant
55. `CLINVAR_CONFLICTED`	Indicator of conflicting interpretations
56. `CLINVAR_PHENOTYPE`	Associated phenotype(s) for ClinVar variant
57. `CLINVAR_REVIEW_STATUS_STARS`	Review confidence - number of gold stars
58. `N_INSILICO_CALLED`	Number of algorithms with effect prediction (damaging/tolerated) from dbNSFP
59. `N_INSILICO_DAMAGING`	Number of algorithms with damaging prediction from dbNSFP
60. `N_INSILICO_TOLERATED`	Number of algorithms with tolerated prediction from dbNSFP
61. `N_INSILICO_SPLICING_NEUTRAL`	Number of algorithms with splicing neutral prediction from dbscSNV
62. `N_INSILICO_SPLICING_AFFECTED`	Number of algorithms with splicing affected prediction from dbscSNV
63. `gnomADe_AF`	Global MAF in gnomAD (exome samples)
64. `FINAL_CLASSIFICATION`	Final variant classification, using either `CLINVAR_CLASSIFICATION` if variant is ClinVar-classified, or `CPSR_CLASSIFICATION` for novel variants. Note: ClinVar-classified
variants annotated with Drug Response or Risk Factor clinical significance are provided a `VUS` final classification
65. `CPSR_CLASSIFICATION`	Variant clinical significance by CPSR’s classification algorithm (P/LP/VUS/LB/B)
66. `CPSR_PATHOGENICITY_SCORE`	Aggregated pathogenicity score by CPSR’s algorithm
67. `CPSR_CLASSIFICATION_CODE`	Combination of CPSR classification codes assigned to the variant (ACMG)
68. `<CUSTOM_POPULATION_GNOMAD`>	Population specific MAF in gnomAD control (non-cancer, population configured by user)

NOTE: The user has the possibility to append the TSV file with data from other INFO tags in the input VCF (i.e. using the –retained_info_tags option)

Biomarker evidence

We provide a compressed tab-separated values file with variants implicated as germline biomarkers. The file has the following naming convention:

<sample_id>.cpsr.<genome_assembly>.biomarker_evidence.tsv.gz

Pharmacogenetic findings

We provide a compressed tab-separated values file with variants implicated with drug toxicity/dosage effects of cancer chemotherapies. The file has the following naming convention:

<sample_id>.cpsr.<genome_assembly>.pgx_findings.tsv.gz

Biomarker annotations

The TSV biomarker evidence output, the interactive HTML report (section Genomic biomarkers), and the Excel workbook (sheet BIOMARKER_EVIDENCE), contains information on matches between potential pathogenic/likely pathogenic sample variants and reported biomarkers, the latter referring to clinical evidence items that relate genomic genomic aberrations to prognosis, diagnosis or sensitivity/resistance to particular treatments. All biomarker annotations are prefixed with BM_, and the following is provided per evidence item:

Variable	Description
1. `BM_CANCER_TYPE`	Annotated cancer type for biomarker - from CIViC
2. `BM_DISEASE_ONTOLOGY_ID`	Disease ontology id for cancer type - from CIViC
3. `BM_PRIMARY_SITE`	Primary tumor type of cancer type - mapped with phenOncoX
4. `BM_CLINICAL_SIGNIFICANCE`	Clinical significance of biomarker (drug sensitivity, drug resistance, poor outcome etc.) - from CIViC
5. `BM_THERAPEUTIC_CONTEXT`	Cancer drugs associated with biomarker (for biomarkers related to drug sensitivity/resistance) - from CIViC
6. `BM_CITATION`	Reference/source for biomarker - i.e. publication or guidelines - from CIViC
7. `BM_RATING`	Rating of biomarker - from CIViC
8. `BM_MOLECULAR_PROFILE_NAME`	Associated name of molecular profile - i.e. “BRCA mutation” - from CIViC
9. `BM_EVIDENCE_TYPE`	Biomarker type - Predictive, Diagnostic, Prognostic, Predisposing - from CIViC
10. `BM_EVIDENCE_LEVEL`	Strength of evidence for the given biomarker - A to D - from CIViC
11. `BM_EVIDENCE_DIRECTION`	Direction of biomarker evidence, i.e. Supports or Does Not Support - from CIViC
12. `BM_EVIDENCE_DESCRIPTION`	Description of biomarker - from CIViC
13. `BM_SOURCE_DB`	Biomarker source database - CIViC
14. `BM_EVIDENCE_ID`	Evidence identifier - from CIViC
15. `BM_VARIANT_ORIGIN`	Origin of biomarker variant - germline
16. `BM_MATCH`	Match between sample variant and biomarker - by_genomic_coord, by_hgvsp_principal, by_gene_mut_lof etc.
17. `BM_RESOLUTION`	Highest resolution of mapping between sample variant and biomarker - genomic, hgvsp, codon, gene