The CPSR software comes with a range of options to configure the report and analysis.
Key settings
Below, we will outline some key settings that are important for the generation of variant reports with CPSR.
Virtual gene panel
The user can flexibly choose the set of cancer predisposition genes for which input variants should be subject to classification and reporting. The input VCF with germline variants may thus come from any sequencing assay (whole-genome, whole-exome or targeted assay), yet the results shown in the report will always be restricted by the choice of a virtual gene panel:
--panel_id <panel_number>
The user can choose from a range of pre-defined gene panels, selected from the following list of panel identifiers:
- 0: Exploratory panel - all cancer predisposition genes
- 1: Adult solid tumours cancer susceptibility
- 2: Adult solid tumours for rare disease
- 3: Bladder cancer pertinent cancer susceptibility
- 4: Brain cancer pertinent cancer susceptibility
- 5: Breast cancer pertinent cancer susceptibility
- 6: Childhood solid tumours cancer susceptibility
- 7: Colorectal cancer pertinent cancer susceptibility
- 8: Endometrial cancer pertinent cancer susceptibility
- 9: Familial Tumours Syndromes of the central & peripheral Nervous system
- 10: Familial breast cancer
- 11: Familial melanoma
- 12: Familial prostate cancer
- 13: Familial rhabdomyosarcoma
- 14: GI tract tumours
- 15: Genodermatoses with malignancies
- 16: Haematological malignancies cancer susceptibility
- 17: Haematological malignancies for rare disease
- 18: Head and neck cancer pertinent cancer susceptibility
- 19: Inherited MMR deficiency (Lynch syndrome)
- 20: Inherited non-medullary thyroid cancer
- 21: Inherited ovarian cancer (without breast cancer)
- 22: Inherited pancreatic cancer
- 23: Inherited polyposis and early onset colorectal cancer
- 24: Inherited predisposition to acute myeloid leukaemia (AML)
- 25: Inherited susceptibility to acute lymphoblastoid leukaemia (ALL)
- 26: Inherited predisposition to GIST
- 27: Inherited renal cancer
- 28: Inherited phaeochromocytoma and paraganglioma
- 29: Melanoma pertinent cancer susceptibility
- 30: Multiple endocrine tumours
- 31: Multiple monogenic benign skin tumours
- 32: Neuroendocrine cancer pertinent cancer susceptibility
- 33: Neurofibromatosis Type 1
- 34: Ovarian cancer pertinent cancer susceptibility
- 35: Parathyroid Cancer
- 36: Prostate cancer pertinent cancer susceptibility
- 37: Renal cancer pertinent cancer susceptibility
- 38: Rhabdoid tumour predisposition
- 39: Sarcoma cancer susceptibility
- 40: Sarcoma susceptibility
- 41: Thyroid cancer pertinent cancer susceptibility
- 42: Tumour predisposition - childhood onset
- 43: Upper gastrointestinal cancer pertinent cancer susceptibility
- 44: DNA repair genes pertinent cancer susceptibility
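The option reference further down describes `--panel_id` as a comma-separated string of identifiers, so several predefined panels can be combined in one run. The sketch below assembles such a string and checks that every identifier lies in the documented range 0-44. The helper name `join_panel_ids` and the particular selection of panels are illustrative assumptions, not part of CPSR.

```shell
# Hypothetical helper: validate panel identifiers and join them with commas
join_panel_ids() {
  joined=""
  for id in "$@"; do
    case "$id" in
      ''|*[!0-9]*) echo "not a panel identifier: $id" >&2; return 1 ;;
    esac
    # Only identifiers 0-44 are documented
    [ "$id" -le 44 ] || { echo "unknown panel identifier: $id" >&2; return 1; }
    joined="${joined:+$joined,}$id"
  done
  printf '%s\n' "$joined"
}

# Illustrative selection: 5 = breast cancer, 10 = familial breast cancer,
# 34 = ovarian cancer
panel_arg=$(join_panel_ids 5 10 34)
echo "--panel_id $panel_arg"
```

The printed fragment can be appended to a `cpsr` command line as-is.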
Custom-made virtual gene panels
CPSR allows users to create custom virtual gene panels for reporting. Any set of genes found in the CPSR superpanel (panel 0) can be used to design a custom virtual gene panel. In practice, users create a simple one-column text file with Ensembl gene identifiers and provide a name for the custom panel:
--custom_list <custom_list_tsv>
--custom_list_name <custom_list_name>
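As a minimal sketch of the custom-panel input, the following writes a one-column file of Ensembl gene identifiers and picks a panel name. The file name, the panel name, the choice of genes (BRCA1, BRCA2, TP53) and the absence of a header line are all illustrative assumptions. Only genes that are part of panel 0 can be used.

```shell
# Hypothetical file and panel names
custom_list="custom_panel.txt"
custom_list_name="HBOC_core"   # single word, no whitespace

# One Ensembl gene identifier per line (BRCA1, BRCA2, TP53)
printf '%s\n' ENSG00000012048 ENSG00000139618 ENSG00000141510 > "$custom_list"

# Sanity check: flag any line that does not look like an Ensembl gene identifier
if grep -Evq '^ENSG[0-9]{11}$' "$custom_list"; then
  echo "unexpected line(s) in $custom_list" >&2
else
  echo "--custom_list $custom_list --custom_list_name $custom_list_name"
fi
```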
ACMG/AMP variant classification
The ACMG/AMP classification criteria rely heavily upon variant population frequencies from gnomAD. The user can specify the preferred population when it comes to variant allele frequencies used by CPSR:
--pop_gnomad <pop_code>
There is also an option to set an upper minor allele frequency (MAF) limit, based on the gnomAD global population, for variants to be included in the report. This excludes the potentially large number of common, benign variants that can be found in the input VCF:
--maf_upper_threshold <maf_threshold>
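To make the two frequency-related options concrete, the sketch below validates a population code against the choices listed in the option reference further down (afr, amr, eas, sas, asj, nfe, fin, global) and checks that the threshold is a proportion. The helper name `frequency_opts`, the example values (`nfe`, `0.05`) and the accepted range (greater than 0, at most 1) are assumptions made for illustration; CPSR may enforce different limits.

```shell
# Hypothetical helper: validate and print the frequency-related options
frequency_opts() {
  pop="$1"
  maf="$2"
  case "$pop" in
    afr|amr|eas|sas|asj|nfe|fin|global) ;;
    *) echo "unsupported gnomAD population: $pop" >&2; return 1 ;;
  esac
  # awk performs the floating-point comparison that plain sh lacks
  if ! awk -v m="$maf" 'BEGIN { exit !(m ~ /^[0-9]*\.?[0-9]+$/ && m + 0 > 0 && m + 0 <= 1) }'; then
    echo "threshold must be a number in (0, 1]: $maf" >&2
    return 1
  fi
  printf '%s\n' "--pop_gnomad $pop --maf_upper_threshold $maf"
}

frequency_opts nfe 0.05
```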
By default, CPSR provides two types of variant classifications:
- Existing variant classifications - as provided by ClinVar
- CPSR-derived classifications of novel variants (not found in ClinVar)
The user may opt to also show CPSR’s own classification of variants that already have classifications in ClinVar (i.e. the first set above):
--classify_all
In the final classification, note that CPSR’s classification will not overrule the classification reported by ClinVar, which takes precedence for variants with a record there.
By default, CPSR does not report variants in the input sample that are found in cancer predisposition genes but are not associated with cancer-relevant diseases or syndromes. The user may choose to show all reported ClinVar variants in cancer predisposition genes through a dedicated option:
--clinvar_report_noncancer
Optional report contents
CPSR allows users to report recommended incidental findings, and also the genotypes of reported cancer risk loci from genome-wide association studies (GWAS):
--secondary_findings
--gwas_findings
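The classification and report-content options above are plain switches: they take no value and are off unless given. One way to derive them from yes/no settings, for instance in a wrapper script, is sketched below. The helper name `flag_if` and the chosen settings are hypothetical.

```shell
# Hypothetical helper: print the flag only when the setting is "yes"
flag_if() {
  if [ "$1" = "yes" ]; then printf '%s ' "$2"; fi
}

# Illustrative settings for one run
report_flags=$(
  flag_if yes --secondary_findings
  flag_if no  --gwas_findings
  flag_if yes --classify_all
  flag_if no  --clinvar_report_noncancer
)
report_flags=${report_flags% }   # drop the trailing space
echo "$report_flags"
```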
All options
A cancer predisposition report is generated by running the cpsr command, which takes the following arguments and options:
usage: cpsr -h [options]
--input_vcf <INPUT_VCF>
--vep_dir <VEP_DIR>
--refdata_dir <REFDATA_DIR>
--output_dir <OUTPUT_DIR>
--genome_assembly <GENOME_ASSEMBLY>
--sample_id <SAMPLE_ID>
Cancer Predisposition Sequencing Reporter - report of clinically significant cancer-predisposing germline variants
Required arguments:
--input_vcf INPUT_VCF
VCF input file with germline query variants (SNVs/InDels).
--vep_dir VEP_DIR Directory of VEP cache, e.g. $HOME/.vep
--refdata_dir REFDATA_DIR
Directory that contains the PCGR/CPSR reference data, e.g. ~/pcgr-data-1.4.1.9019
--output_dir OUTPUT_DIR
Output directory
--genome_assembly {grch37,grch38}
Genome assembly build: grch37 or grch38
--sample_id SAMPLE_ID
Sample identifier - prefix for output files
Panel options:
--panel_id VIRTUAL_PANEL_ID
Comma-separated string with identifier(s) of predefined virtual cancer predisposition gene panels,
choose any combination of the following identifiers (GEP = Genomics England PanelApp):
0 = CPSR exploratory cancer predisposition panel (PanelApp genes / TCGA's germline study / Cancer Gene Census / Other)
1 = Adult solid tumours cancer susceptibility (GEP)
2 = Adult solid tumours for rare disease (GEP)
3 = Bladder cancer pertinent cancer susceptibility (GEP)
4 = Brain cancer pertinent cancer susceptibility (GEP)
5 = Breast cancer pertinent cancer susceptibility (GEP)
6 = Childhood solid tumours cancer susceptibility (GEP)
7 = Colorectal cancer pertinent cancer susceptibility (GEP)
8 = Endometrial cancer pertinent cancer susceptibility (GEP)
9 = Familial Tumours Syndromes of the central & peripheral Nervous system (GEP)
10 = Familial breast cancer (GEP)
11 = Familial melanoma (GEP)
12 = Familial prostate cancer (GEP)
13 = Familial rhabdomyosarcoma (GEP)
14 = GI tract tumours (GEP)
15 = Genodermatoses with malignancies (GEP)
16 = Haematological malignancies cancer susceptibility (GEP)
17 = Haematological malignancies for rare disease (GEP)
18 = Head and neck cancer pertinent cancer susceptibility (GEP)
19 = Inherited MMR deficiency (Lynch Syndrome) - GEP
20 = Inherited non-medullary thyroid cancer (GEP)
21 = Inherited ovarian cancer (without breast cancer) (GEP)
22 = Inherited pancreatic cancer (GEP)
23 = Inherited polyposis and early onset colorectal cancer (GEP)
24 = Inherited predisposition to acute myeloid leukaemia (AML) (GEP)
25 = Inherited susceptibility to acute lymphoblastoid leukaemia (ALL) (GEP)
26 = Inherited predisposition to GIST (GEP)
27 = Inherited renal cancer (GEP)
28 = Inherited phaeochromocytoma and paraganglioma (GEP)
29 = Melanoma pertinent cancer susceptibility (GEP)
30 = Multiple endocrine tumours (GEP)
31 = Multiple monogenic benign skin tumours (GEP)
32 = Neuroendocrine cancer pertinent cancer susceptibility (GEP)
33 = Neurofibromatosis Type 1 (GEP)
34 = Ovarian cancer pertinent cancer susceptibility (GEP)
35 = Parathyroid Cancer (GEP)
36 = Prostate cancer pertinent cancer susceptibility (GEP)
37 = Renal cancer pertinent cancer susceptibility (GEP)
38 = Rhabdoid tumour predisposition (GEP)
39 = Sarcoma cancer susceptibility (GEP)
40 = Sarcoma susceptibility (GEP)
41 = Thyroid cancer pertinent cancer susceptibility (GEP)
42 = Tumour predisposition - childhood onset (GEP)
43 = Upper gastrointestinal cancer pertinent cancer susceptibility (GEP)
44 = DNA repair genes pertinent cancer susceptibility (GEP)
--custom_list CUSTOM_LIST
Provide custom list of genes from virtual panel 0 (single-column .txt/.tsv file with Ensembl gene identifiers),
alternative to predefined panels provided with --panel_id)
--custom_list_name CUSTOM_LIST_NAME
Set name for custom made panel/list (single word - no whitespace), will be displayed in the report
--diagnostic_grade_only
For panel_id's 1-44 (Genomics England PanelApp) - consider genes with a GREEN status only, default: False
Variant classification options:
--secondary_findings Include variants found in ACMG-recommended list for secondary findings (v3.2), default: False
--gwas_findings Report overlap with low to moderate cancer risk variants (tag SNPs) identified from genome-wide association studies, default: False
--pop_gnomad {afr,amr,eas,sas,asj,nfe,fin,global}
Population source in gnomAD (non-cancer subset) used for variant frequency assessment (ACMG classification), default: nfe
--maf_upper_threshold MAF_UPPER_THRESHOLD
Upper MAF limit (gnomAD global population frequency) for variants to be included in the report, default: 0.9
--classify_all Provide CPSR variant classifications (TIER 1-5) also for variants with existing ClinVar classifications in output TSV, default: False
--clinvar_report_noncancer
Report also ClinVar-classified variants attributed to phenotypes/conditions NOT directly related to tumor development, default: False
VEP options:
--vep_n_forks VEP_N_FORKS
Number of forks (option '--fork' in VEP), default: 4
--vep_buffer_size VEP_BUFFER_SIZE
Variant buffer size (variants read into memory simultaneously, option '--buffer_size' in VEP)
- set lower to reduce memory usage, default: 500
--vep_gencode_basic Consider basic GENCODE transcript set only with Variant Effect Predictor (VEP) (option '--gencode_basic' in VEP).
--vep_pick_order VEP_PICK_ORDER
Comma-separated string of ordered transcript properties for primary variant pick
( option '--pick_order' in VEP), default: mane_select,mane_plus_clinical,canonical,biotype,ccds,rank,tsl,appris,length
--vep_no_intergenic Skip intergenic variants during processing (option '--no_intergenic' in VEP), default: False
vcfanno options:
--vcfanno_n_proc VCFANNO_N_PROC
Number of vcfanno processes (option '-p' in vcfanno), default: 4
Other options:
--force_overwrite By default, the script will fail with an error if any output file already exists.
You can force the overwrite of existing result files by using this flag, default: False
--version show program's version number and exit
--no_reporting Run functional variant annotation on VCF through VEP/vcfanno, omit classification/report generation (STEP 4), default: False
--no_html Do not generate HTML report (default: False)
--retained_info_tags RETAINED_INFO_TAGS
Comma-separated string of VCF INFO tags from query VCF that should be kept in CPSR output TSV
--ignore_noncoding Ignore non-coding (i.e. non protein-altering) variants in report, default: False
--debug Print full commands to log
--pcgrr_conda PCGRR_CONDA
pcgrr conda env name (default: pcgrr)
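Because a run needs a VCF file, two directories and a valid assembly name, a quick pre-flight check can catch a mistyped path before annotation starts. The helper below is a hypothetical sketch and not part of CPSR. It only tests that the required inputs exist and that the assembly is one of the two documented choices. The paths in the example call are placeholders.

```shell
# Hypothetical helper: verify the required inputs before launching cpsr
cpsr_preflight() {
  vcf="$1"; vep_dir="$2"; refdata_dir="$3"; assembly="$4"
  [ -f "$vcf" ] || { echo "missing VCF: $vcf" >&2; return 1; }
  [ -d "$vep_dir" ] || { echo "missing VEP cache directory: $vep_dir" >&2; return 1; }
  [ -d "$refdata_dir" ] || { echo "missing reference data directory: $refdata_dir" >&2; return 1; }
  case "$assembly" in
    grch37|grch38) ;;
    *) echo "genome assembly must be grch37 or grch38: $assembly" >&2; return 1 ;;
  esac
}

# Example call with placeholder paths; prints a message either way
if cpsr_preflight "$HOME/sample.vcf.gz" "$HOME/.vep" "$HOME/pcgr_ref_data" grch38; then
  echo "inputs look complete"
else
  echo "fix the inputs above before running cpsr"
fi
```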
Example run
The cpsr R package comes with a test VCF file (calls from the GRCh37 human genome assembly) that can be used to test the CPSR workflow. Please note that these are artificial germline calls, not originating from an actual patient.
Report generation with the example VCF, using the Adult solid tumours cancer susceptibility panel (panel 1) as the virtual gene panel, can be performed through the following command:
(base) $ conda activate pcgr
(pcgr) $ cpsr \
--input_vcf ~/cpsr-2.0.0/inst/examples/example.vcf.gz \
--vep_dir ~/.vep \
--refdata_dir ~/pcgr_ref_data \
--output_dir ~/cpsr-2.0.0/ \
--genome_assembly grch37 \
--panel_id 1 \
--sample_id example \
--secondary_findings \
--classify_all \
--maf_upper_threshold 0.2 \
--force_overwrite
Note that the example command refers to the PCGR data bundle directory (refdata_dir), which contains the data necessary for both PCGR and CPSR.
This command will produce the following output files in the output folder:
- example.cpsr.grch37.vcf.gz (.tbi) - Bgzipped VCF file with various variant annotations appended by CPSR
- example.cpsr.grch37.pass.vcf.gz (.tbi) - Bgzipped VCF file with various variant annotations appended by CPSR (PASS variants only)
- example.cpsr.grch37.conf.yaml - CPSR configuration file - output from pre-reporting annotation (Python) workflow
- example.cpsr.grch37.pass.tsv.gz - Compressed TSV file (generated with vcf2tsvpy) of VCF content with various annotations appended by CPSR
- example.cpsr.grch37.xlsx - An Excel workbook that contains:
  - i) information on virtual gene panel interrogated for variants
  - ii) classification of clinical significance for variants overlapping with cancer predisposition genes
  - iii) match of variants with existing biomarkers (if any found)
  - iv) secondary findings (if any found)
- example.cpsr.grch37.html - Interactive HTML report with clinically relevant variants in cancer predisposition genes
- example.cpsr.grch37.snvs_indels.classification.tsv.gz - TSV file with key annotations of germline SNVs/InDels classified according to clinical significance
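After a run, the file list above can be turned into a quick completeness check. The helper below is a hypothetical sketch: it builds the expected file names from the sample identifier and genome assembly and prints those that are absent. It assumes that the "(.tbi)" entries are index files named `<file>.vcf.gz.tbi` and that the run used the output directory from the example command.

```shell
# Hypothetical helper: list expected CPSR output files that are missing
cpsr_missing_outputs() {
  out_dir="$1"; prefix="$2.cpsr.$3"
  for suffix in vcf.gz vcf.gz.tbi pass.vcf.gz pass.vcf.gz.tbi conf.yaml \
                pass.tsv.gz xlsx html snvs_indels.classification.tsv.gz; do
    [ -e "$out_dir/$prefix.$suffix" ] || echo "$prefix.$suffix"
  done
}

# Check the output directory used in the example run
cpsr_missing_outputs "$HOME/cpsr-2.0.0" example grch37

# Peek at the first lines of the classification TSV, if it exists
tsv="$HOME/cpsr-2.0.0/example.cpsr.grch37.snvs_indels.classification.tsv.gz"
if [ -f "$tsv" ]; then
  gzip -dc "$tsv" | head -n 5
fi
```

An empty result from `cpsr_missing_outputs` means every listed file is present. Some files may be legitimately absent when options such as `--no_html` or `--no_reporting` are used.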