Changelog • PCGR

v2.2.3

Date: 2025-07-18
- update Ensembl VEP v113.4 (see bioconda issue)

v2.2.2

Date: 2025-07-15
- fix bug in Grantham distance score evaluation (issue271)
- add support for heterozygous deletions in CNA interpretation - variant classes are now homdel, hetdel, and gain
- fix missing pagination of some data tables with actionable markers
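A minimal sketch of how allele-specific segments (chrom, start, end, n_major, n_minor) could be mapped to the three CNA variant classes. The thresholds are assumptions and do not reproduce PCGR's actual code. Homdel is taken as a total copy number of 0. Hetdel is taken as a total copy number of 1, which assumes a diploid baseline. Gain is taken as a total copy number of at least n_copy_gain; the v2.0.0 notes give that argument a default of 6.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    chrom: str
    start: int
    end: int
    n_major: int  # copies of the major allele
    n_minor: int  # copies of the minor allele


def classify_cna(seg: Segment, n_copy_gain: int = 6) -> Optional[str]:
    """Return 'homdel', 'hetdel', 'gain', or None (not reported).

    Thresholds are illustrative assumptions, not PCGR's exact rules.
    """
    total = seg.n_major + seg.n_minor
    if total == 0:
        return "homdel"  # no copies left
    if total == 1:
        return "hetdel"  # assumption: one remaining copy in a diploid genome
    if total >= n_copy_gain:
        return "gain"
    return None


segments = [
    Segment("chr9", 21_900_000, 22_000_000, 0, 0),
    Segment("chr17", 7_600_000, 7_700_000, 1, 0),
    Segment("chr8", 127_700_000, 127_800_000, 5, 2),
    Segment("chr1", 1_000_000, 2_000_000, 1, 1),
]
print([classify_cna(s) for s in segments])  # ['homdel', 'hetdel', 'gain', None]
```

In tumors with a high ploidy, "hetdel" would more naturally be defined relative to the ploidy than by a fixed total of 1.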

v2.2.1

Date: 2025-03-23
- fix bug in CPSR for ClinVar variants with non-standard significance levels

v2.2.0

Date: 2025-03-22
Major data updates
- ClinVar (2025-03)
- dbNSFP (v5.0)
- CIViC (2025-03-13)
- GENCODE v47 (VEP v113)
- PanelApp (2025-02)
- UniProt KB (2025-01)
- Cancer Gene Census (v101)
Cleaned and improved oncogenicity classification of variants
- Added Grantham distance for amino acid alterations, which is now taken into account in the classification
- only consider the LP variant classification when the oncogenicity score = 4 and either at least three oncogenic criteria match or ONCG_OS1 matches
Multiple cosmetic changes to HTML report - e.g. collapsed call-outs
Added oncogenicity documentation in variant tables of HTML report (indicates which criteria were matched for a given variant)
Removed VEP_ALL_CSQ from variant tables in HTML report to reduce file size - still available in TSV output
Added MutSpliceDB splice site effects (used e.g. for loss-of-function annotation)
Added MaxEntScan plugin in VEP for splice site disruption prediction (used e.g. for loss-of-function annotation)
Added more prediction algorithms from dbNSFP
Fixed bug that caused crash for missing values in DP/AF values of input VCF
Fixed erroneous re-formatting of MAF integer columns to floats
Fixed cases for which similar Entrez gene identifiers mapped to different gene names

v2.1.2

Date: 2024-10-21
Also highlight VUS variants (ClinVar-classified) in the variant oncogenicity section of the HTML report

v2.1.1

Date: 2024-10-11
Fix bug in parsing of relative CDS position (issue252)
Move some code in the qmd templates to pcgrr functions

v2.1.0

Date: 2024-09-27
Major data updates
- ClinVar (2024-09)
- dbNSFP (v4.8)
- NCI Thesaurus v24.07e
- CIViC (2024-09-18)
Reduction (~15%) in overall data bundle size - removed unused data files (e.g. expression counts)
Fixed bug in MAF output for tumor-only runs (issue250); also ensure that non-exonic variants are excluded when --exclude_nonexonic is used
Fixed bug in annotation of splice site mutation hotspots (e.g. MET exon 14 skipping)
Highlighted variants with known pathogenic/likely pathogenic clinical significance in ClinVar (regardless of phenotype and variant origin) in the variant oncogenicity section of the HTML report
Created interactive visualization support for allele-specific copy number data (HTML report)
Slight change to the default transcript consequence pick order in VEP, based on observations of prioritized transcripts (mane_select > mane_plus_clinical > canonical > biotype > ccds > rank > tsl > appris > length)
Pulled in known oncogenic variants from ClinVar (assessed through ClinGen/CGC/VICC SOP, oncogenic/likely oncogenic) into the variant oncogenicity assessment algorithm
Added option --no_html to disable HTML report generation
Added option --input_cpsr - re-offering the possibility to integrate CPSR-classified germline variants in the PCGR HTML report
Added HGVSc_RefSeq as output column in TSV/HTML - using MANE Select RefSeq transcript identifiers (works primarily for grch38)
Pulled in coding sequence start annotation for protein-coding transcripts from GENCODE, enabling a more useful annotation of promoter variants (e.g. TERT)
Created new column ALTERATION in variant tables of HTML report, based on HGVSp, with HGVSc used when HGVSp is not available
New output file for tumor-only runs: a TSV file with the complete set of calls, both filtered and unfiltered
Re-processed all RNA-seq reference cohorts (TCGA, DepMap, TreeHouse), ensuring that all cohorts are using the same unit (log2(TPM+0.001))
Separated outlier gene expression results into separate tabs in the HTML report, added them to Excel workbook output
Added section on kataegis events in the HTML report
Fixed bug in plotting of reference TMB distributions for different TMB algorithms (--tmb_display option)
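The kataegis section and the rainfall plot both rest on inter-mutation distances. Below is a minimal sketch. It assumes the commonly cited definition of a kataegis focus: six or more consecutive mutations whose mean inter-mutation distance is at most 1,000 bp. That choice is an assumption, and PCGR's detection method may differ.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def intermutation_distances(positions: List[int]) -> List[int]:
    """Distances between consecutive mutations on one chromosome (rainfall plot y-axis)."""
    pos = sorted(positions)
    return [b - a for a, b in zip(pos, pos[1:])]


def find_kataegis(
    mutations: List[Tuple[str, int]],
    min_mutations: int = 6,
    max_mean_imd: float = 1000.0,
) -> List[Tuple[str, int, int, int]]:
    """Return (chrom, first_pos, last_pos, n_mutations) for candidate kataegis foci."""
    by_chrom: Dict[str, List[int]] = defaultdict(list)
    for chrom, pos in mutations:
        by_chrom[chrom].append(pos)

    foci = []
    for chrom, positions in by_chrom.items():
        pos = sorted(positions)
        i = 0
        while i + min_mutations <= len(pos):
            j = i + min_mutations - 1
            # mean distance over pos[i..j] telescopes to (pos[j] - pos[i]) / (j - i)
            if (pos[j] - pos[i]) / (j - i) <= max_mean_imd:
                # extend the focus while the mean distance stays within the cutoff
                while j + 1 < len(pos) and (pos[j + 1] - pos[i]) / (j + 1 - i) <= max_mean_imd:
                    j += 1
                foci.append((chrom, pos[i], pos[j], j - i + 1))
                i = j + 1
            else:
                i += 1
    return foci


muts = [("chr3", p) for p in range(10_000, 11_201, 200)] + [("chr3", 500_000), ("chr5", 2_000_000)]
print(intermutation_distances([300, 100, 200]))  # [100, 100]
print(find_kataegis(muts))  # [('chr3', 10000, 11200, 7)]
```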

v2.0.3

Date: 2024-08-01
Ensure correct propagation of purity/ploidy in output report
Ensure that MAF output is properly filtered for tumor-only runs
Ensure proper copying of quarto templates (abandon file.copy), both for PCGR and CPSR

v2.0.2

Date: 2024-07-16
Ensure correct reference to variant actionability guidelines - AMP/ASCO/CAP (not ACMG/AMP), both in code and in docs (thanks to HomoPolyethylen for pointing this out)
Fixed bug causing missing assignment of tier 3 variants (AMP/ASCO/CAP)
Ensure non-exonic biomarker variants (e.g. TERT) are written to the Excel sheet
Specify (in value boxes and plots) that MSI classification is based on coding variants

v2.0.1

Date: 2024-07-07
Fixed bug for chrM variants in input - not properly annotated by VEP, and not correctly processed in pcgrr. Any mitochondrial variants found in input VCF are now removed during VCF pre-processing.

v2.0.0

Date: 2024-06-26
Major data updates
- ClinVar (2024-06)
- NCI Thesaurus v24.05d
- Open Targets Platform v2024.06
- CIViC (2024-06-21)
- CGI Cancer Biomarkers database (2022/10/17)
- GENCODE v46/v19 (GRCh38/GRCh37)
- Cancer Gene Census
- CancerMine v50 (2023-03)
- Pfam v35.0 (November 2021)
- Disease Ontology/EFO
- UniProt KB v2024_03
Major software updates
- Ensembl VEP v112
Diff between v2.0.0 and v1.4.1

Added/changed

New report generation framework - quarto
- multiple options related to RMarkdown output are now deprecated
Re-organized data bundle structure
- Users need to download an assembly-specific VEP cache separately from the Ensembl VEP website, and provide its path to the new required argument --vep_dir in the pcgr command
Re-engineered data bundle generation pipeline
Improved data bundle documentation
- An HTML report with an overview of the contents of the data bundle is shipped with the reference data itself, also available here (grch38).
Singularity/Apptainer support
Moved more of the code base to initial Python workflow steps (biomarker matching, CNA segment annotation, RNA expression analysis, oncogenicity classification)
Variants are now classified with respect to both oncogenicity and actionability, and the previous global tier classification (tier 1-5) is thus deprecated
New copy number input format - allele-specific (chrom, start, end, n_major, n_minor)
- New argument n_copy_gain - minimum total copy number for segments to be considered gains/amplifications (default: 6)
RNA-bulk expression input permitted in the pcgr command
- --input_rna_expression - accepts a TSV file with gene expression values
- --expression_sim - boolean flag to enable expression similarity analysis
- --expression_sim_db - comma-separated string of databases used in RNA expression similarity analysis (default: tcga,depmap,treehouse)
TMB calculations can be adjusted using several parameters:
- --tmb_display - Type of TMB measure to show in report (coding_and_silent, coding_non_silent, missense_only)
- --tmb_dp_min - Minimum depth for a position to be considered for TMB calculation (default: 0) - requires allelic support information from VCF
- --tmb_af_min - Minimum allele frequency for a position to be considered for TMB calculation (default: 0) - requires allelic support information from VCF
A multi-sheet Excel workbook with analysis results is provided, suitable e.g. for aggregation of results across samples
Argument name changes to pcgr:
- --pcgr_dir renamed to --refdata_dir
- --clinvar_ignore_noncancer renamed to --clinvar_report_noncancer, meaning that variants found in ClinVar, yet attributed to non-cancer related phenotypes, are now excluded from reporting by default
- --vep_gencode_all renamed to --vep_gencode_basic, meaning that the gene variant annotation is now using all GENCODE transcripts by default, not only the basic set
- --preserved_info_tags renamed to --retained_info_tags
- --basic renamed to --no_reporting
- --target_size_mb renamed to --effective_target_size_mb
LOFTEE plugin in VEP removed as loss-of-function variant classifier (due to a low level of maintenance and outdated dependency requirements). For now, a simplified LoF annotation is used as a replacement, looking primarily at CSQ types (stop_gained, frameshift_variant, splice_acceptor_variant, splice_donor_variant). Furthermore, frameshift/stop-gain variants found within the last 5% of the coding sequence length are deemed non-LoF, as are splice donor variants that do not disrupt the canonical site (GC>GT). A more advanced LoF annotation is planned for a future release.
Biomarkers are matched much more comprehensively than in previous versions, matching at the genomic level, codon, exon, amino acid and gene level (both principal and non-principal transcript matches)
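The simplified LoF rules described above can be sketched as a predicate over the consequence terms of one transcript. Two inputs are assumptions. The relative CDS position is taken to be the variant's CDS position divided by the CDS length. The "GC>GT" rule is read as a donor change from GC to GT and is passed in as a precomputed boolean. This is not PCGR's implementation.

```python
from typing import Iterable, Optional

# Consequence types named in the v2.0.0 notes as the basis for LoF annotation
LOF_CONSEQUENCES = {
    "stop_gained",
    "frameshift_variant",
    "splice_acceptor_variant",
    "splice_donor_variant",
}
TRUNCATING = {"stop_gained", "frameshift_variant"}


def is_lof(
    consequences: Iterable[str],
    relative_cds_position: Optional[float] = None,
    gc_to_gt_donor: bool = False,
) -> bool:
    """Simplified loss-of-function call for one transcript consequence.

    relative_cds_position: variant CDS position divided by CDS length (0-1).
    gc_to_gt_donor: True if a donor variant changes GC to GT (hypothetical input).
    """
    late = relative_cds_position is not None and relative_cds_position > 0.95
    for term in set(consequences) & LOF_CONSEQUENCES:
        if term in TRUNCATING and late:
            continue  # truncation in the last 5% of the CDS: deemed non-LoF
        if term == "splice_donor_variant" and gc_to_gt_donor:
            continue  # canonical donor site not disrupted: deemed non-LoF
        return True
    return False


# VEP joins multiple consequence terms for one transcript with "&"
print(is_lof("stop_gained&splice_region_variant".split("&"), relative_cds_position=0.40))  # True
print(is_lof(["frameshift_variant"], relative_cds_position=0.98))  # False
```

A variant is called LoF if any of its qualifying terms survives the exceptions. For example, a late stop-gain that also hits a splice acceptor still counts.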

Removed

Options for configuring RMarkdown output, i.e. --report_theme, --report_nonfloating_toc
--cpsr_report and --include_trials, which could add associated pathogenic germline variants (from CPSR) and potential clinical trial opportunities to the report, are currently on hold until a forthcoming release
--no_vcf_validate - VCF validation is simplified, not relying on vcf-validator anymore
Options to filter tumor-only calls using 1000 Genomes Project database, i.e. --maf_onekg_eur, --maf_onekg_amr, --maf_onekg_eas, --maf_onekg_afr, --maf_onekg_sas, --maf_onekg_global
--cell_line
--logr_gain and --logr_homdel

v1.4.1

Date: 2023-03-14
Diff between v1.4.1 and v1.4.0

Changes

Skip favicon for Rmarkdown by @sigven in pr210
Use biocondarised vcf2tsvpy by @pdiakumis in pr211

v1.4.0

Date: 2023-03-08
Diff between v1.4.0 and v1.3.0

Changes

Pick trans consequence patch by @sigven in pr206
Use VEP vcf as input to vcf2maf.pl by @sigven in pr207
Better handling of pcgr/pcgrr conda env custom naming by @pdiakumis in pr208
Update report style by @sigven in pr209

v1.3.0

Date: 2023-02-28
Diff between v1.3.0 and v1.2.0

Changes

pcgr_summarise.py: prioritize protein-coding BIOTYPE csq (pr201)
cpsr.py: expose --pcgrr_conda option to flexibly activate pcgrr env by a non-default pcgrr name
docs: update input.Rmd, running.Rmd
cpsr_validate_input.py: refactor for efficient custom gene egrep
code reformat via autopep8 for annoutils.py, pcgr_vcfanno.py
GitHub Actions:
- bump docker actions setup-buildx-action (v1 to v2), build-push-action (v2 to v4)
- use miniforge-variant instead of mamba-version: "*"
- replace ::set-output, since it is deprecated

v1.2.0

Date: 2022-11-11
Diff between v1.2.0 and v1.1.0

Changes

Keep only autosomal, X, Y, M/MT chromosomes
Import bcftools as dependency

v1.1.0

Date: 2022-10-28
Diff between v1.1.0 and v1.0.3

Changes

Remove Docker command wrappers (note: this does not remove the Docker functionality from PCGR; instead it removes the legacy wrappers that were created in the original PCGR version). This along with a lot of other general changes are summarised in pr193. Of note:
- --no_docker and --docker_uid CLI arguments are now obsolete.
- --version CLI argument added for pcgr/cpsr.py
- declutter repetitive log messages
- refactor pcgr/cpsr.py script
Update documentation and declutter logging; refactor dict creation (pr192).
Minor refactor (pr194):
- switch to using Python’s native os.remove and os.rename for glob cleanup
- keep decompressed VCF only if --vcf2maf option is specified. The vcf2maf tool does not support compressed VCFs - see issue235.
Fix for CLI argument --cna_overlap_pct (pr196).

New Contributors

@niklasmueboe (pr196).

v1.0.3

Date: 2022-05-24

Fixed

Bug in clinical trials sorting, pr191

v1.0.2

Date: 2022-03-30

Fixed

JSON output for CPSR, issue44

v1.0.1

Date: 2022-03-09

Fixed

Writing to JSON crashed when the input VCF was huge (variants in the order of millions). If the raw input set (VCF) contains > 500,000 variants, this set is, prior to reporting, reduced by
- A. exclusion of intergenic and intronic variants, and
- B. exclusion of upstream_gene/downstream_gene variants (if the variant set is still above 500,000 after step A)
Bug in signature analysis (issue187) for cases where the input variant set fits to > 18 different aetiologies.
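The two-step reduction of large inputs can be sketched as follows. It assumes one consequence term per variant, stored under a hypothetical "consequence" key. PCGR's actual data structures differ.

```python
from typing import Dict, List

MAX_VARIANTS = 500_000
STEP_A_EXCLUDE = {"intergenic_variant", "intron_variant"}
STEP_B_EXCLUDE = {"upstream_gene_variant", "downstream_gene_variant"}


def reduce_for_reporting(
    variants: List[Dict[str, str]], limit: int = MAX_VARIANTS
) -> List[Dict[str, str]]:
    """Shrink a large variant set before reporting, in two steps."""
    if len(variants) <= limit:
        return variants
    # Step A: drop intergenic and intronic variants
    reduced = [v for v in variants if v["consequence"] not in STEP_A_EXCLUDE]
    if len(reduced) <= limit:
        return reduced
    # Step B: only if still above the limit, drop upstream/downstream variants
    return [v for v in reduced if v["consequence"] not in STEP_B_EXCLUDE]


calls = [{"consequence": c} for c in (
    "intergenic_variant", "intron_variant", "upstream_gene_variant",
    "missense_variant", "downstream_gene_variant",
)]
print(len(reduce_for_reporting(calls, limit=4)))  # 3 (step A suffices)
print(len(reduce_for_reporting(calls, limit=2)))  # 1 (steps A and B)
```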

v1.0.0

Date: 2022-02-25
Data updates: ClinVar, GWAS catalog, GENCODE, CIViC, CancerMine, KEGG, ChEMBL, Open Targets Platform, Disease Ontology, Experimental Factor Ontology

Added

Command-line options
- VEP options
  - --vep_gencode_all - use all GENCODE transcripts during VEP annotation (not only the basic GENCODE set)
  - --prevalence_reference_signatures - set minimum prevalence (percent) for selection of reference signatures included in refitting procedure for a given tumor type

Changed

Complete restructure of Python and R components. Installation now relies on two separate conda packages, pcgr (Python component) and pcgrr (R component). Direct Docker support remains, with the Dockerfile simplified to rely exclusively on the installation of the above Conda packages.

Removed

VCF validation step. Feedback from users suggested that Ensembl’s vcf-validator was often too stringent, so its use has been deprecated. The --no_vcf_validate option remains for backwards compatibility.

v0.9.2

Date: 2021-06-30
Data updates: ClinVar, GWAS catalog, CIViC, CancerMine, dbNSFP, KEGG, ChEMBL, Disease Ontology/EFO, Open Targets Platform, UniProt KB, GENCODE
Software upgrades: R v4.1, Bioconductor v3.13, VEP (104) ++

Changed

TOML-based configuration for PCGR has been abandoned; all options to PCGR are now configured through command-line parameters
- NOTE: We recommend turning on --show_noncoding and --vcf2maf (previously turned on by default in TOML). For tumor-only runs, we recommend including --exclude_dbsnp_nonsomatic and --exclude_nonexonic

Added

Command-line options
- Previously set in TOML file
  - Allelic support
    - --tumor_dp_tag
    - --tumor_af_tag
    - --control_dp_tag
    - --control_af_tag
    - --call_conf_tag
  - Tumor-only options
    - --maf_onekg_eur
    - --maf_onekg_amr
    - --maf_onekg_afr
    - --maf_onekg_eas
    - --maf_onekg_sas
    - --maf_onekg_global
    - --maf_gnomad_nfe
    - --maf_gnomad_asj
    - --maf_gnomad_fin
    - --maf_gnomad_oth
    - --maf_gnomad_amr
    - --maf_gnomad_afr
    - --maf_gnomad_eas
    - --maf_gnomad_sas
    - --maf_gnomad_global
    - --exclude_pon
    - --exclude_likely_het_germline
    - --exclude_likely_hom_germline
    - --exclude_dbsnp_nonsomatic
    - --exclude_nonexonic
  - --report_theme
  - --preserved_info_tags (previously custom_tags (TOML))
  - --show_noncoding (previously list_noncoding (TOML))
  - --vcfanno_n_proc (previously n_vcfanno_proc (TOML))
  - --vep_n_forks (previously n_vep_forks (TOML))
  - --vep_pick_order
  - --vep_no_intergenic (previously vep_skip_intergenic (TOML))
  - --vcf2maf
- New options
  - --report_nonfloating_toc (NEW) - add the TOC at the top of the HTML report, not floating at the left of the document
  - --cpsr_report (NEW) - add a dedicated section in PCGR with main germline findings from CPSR analysis - (use the gzipped JSON output from CPSR as input)
  - --vep_regulatory (NEW) - append regulatory annotations to variants (TF binding sites etc.)
  - --include_artefact_signatures (NEW) - include sequencing artefacts in the reference collection of mutational signatures (COSMIC v3.2)

Fixed

Bug in writing (large) report contents to JSON (issue #118)
Bug (typo) in merge of clinical evidence items from different sources (CIVIC + CGI) (issue #126)
Bug in value box for number of (high-confident) kataegis events - rmarkdown (issue #122)
Bug in value box for tumor purity/ploidy - rmarkdown (issue #129)

Removed

Command-line options
- --conf - TOML-based configuration file

v0.9.1

Date: 2020-11-30
Data updates:
- ClinVar
- GWAS catalog
- CIViC
- CancerMine
- dbNSFP
- KEGG
- ChEMBL/DGIdb
- Disease Ontology, Experimental Factor Ontology

Added

Added possibility to configure the algorithm for TMB calculation through the optional argument tmb_algorithm: all coding variants (all_coding) or non-synonymous variants only (nonsyn)
R code subject to static analysis with lintr
Improved Conda recipe (i.e. meta.yaml) with version pinning of all package dependencies
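The two tmb_algorithm settings differ only in which consequence types are counted per megabase of target. Below is a minimal sketch. The consequence groupings are assumptions, and in particular it is assumed that canonical splice site variants count as non-synonymous; PCGR's exact definitions may differ. The example uses 34 Mb, the default target size documented for v0.8.0.

```python
from typing import Iterable

# Assumed consequence groupings; PCGR's exact definitions may differ
NONSYNONYMOUS = {
    "missense_variant", "stop_gained", "stop_lost", "start_lost",
    "frameshift_variant", "inframe_insertion", "inframe_deletion",
    "protein_altering_variant", "splice_donor_variant", "splice_acceptor_variant",
}
SYNONYMOUS = {"synonymous_variant"}


def tmb(consequences: Iterable[str], target_size_mb: float,
        algorithm: str = "all_coding") -> float:
    """Mutations per megabase for the 'all_coding' or 'nonsyn' algorithm."""
    if algorithm == "all_coding":
        counted = NONSYNONYMOUS | SYNONYMOUS
    elif algorithm == "nonsyn":
        counted = NONSYNONYMOUS
    else:
        raise ValueError(f"unknown tmb_algorithm: {algorithm}")
    n = sum(1 for c in consequences if c in counted)
    return n / target_size_mb


calls = ["missense_variant"] * 30 + ["synonymous_variant"] * 4 + ["intron_variant"] * 10
print(tmb(calls, 34.0))                      # 1.0
print(round(tmb(calls, 34.0, "nonsyn"), 3))  # 0.882
```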

Changed

Removed DisGeNET annotations from output (associations from Open Targets Platform serve same purpose)
Version pinning of software dependencies in Dockerfile:
- All R packages necessary for PCGR are installed using the renv framework, ensuring improved versioning and reproducibility
- Other tools/utilities and Python libraries that have been version pinned:
  - bedtools, samtools, numpy, cython, scipy, cyvcf2, toml, pandas

v0.9.0rc

Date: 2020-09-24
Data updates: ClinVar, GWAS catalog, GENCODE, CIViC, CancerMine, UniProt KB, dbNSFP, Pfam, KEGG, Open Targets Platform
Software updates: VEP 101

Fixed

An extra comma was mistakenly present in the template for tier 2 variants, issue #96
Missing protein domain annotations for grch38, issue #116

Changed

All arguments to pcgr.py are now non-positional
Arguments to pcgr.py are divided into two groups: required and optional
Options allelic_support:tumor_dp_min, allelic_support:tumor_af_min, allelic_support:control_dp_min, allelic_support:control_af_max in PCGR configuration file are now optional arguments --tumor_dp_min, --tumor_af_min, --control_dp_min, --control_af_max in cpsr.py
Option mutational_burden:mutational_burden in PCGR configuration file is now optional argument --estimate_tmb in pcgr.py
Option msi:msi in PCGR configuration file is now optional argument --estimate_msi_status in pcgr.py
Option mutational_signatures:mutational_signatures in PCGR configuration file is now optional argument --estimate_signatures in pcgr.py
Options mutational_signatures:mutsignatures_signature_limit, mutational_signatures:mutsignatures_normalization, mutational_signatures:mutsignatures_mutation_limit, mutational_signatures:mutsignatures_cutoff are removed (used for deconstructSigs analysis, which is no longer in use)
Optional argument --cna_overlap_pct in pcgr.py replaces cna:cna_overlap_pct in PCGR configuration file
Optional argument --logr_gain in pcgr.py replaces cna:logr_gain in PCGR configuration file
Optional argument --logr_homdel in pcgr.py replaces cna:logr_homdel in PCGR configuration file
Removed mutational_burden:tmb_low_limit and mutational_burden:tmb_intermediate_limit - TMB is no longer interpreted in the context of thresholds
Classifications of genes as tumor suppressors/oncogenes are now based on a combination of CancerMine citation count and presence in Network of Cancer Genes
Settings section of report is now divided into three:
- Metadata - sample and sequencing assay
- Report configuration

Added

Optional argument --include_trials in pcgr.py - includes a section with annotated clinical trials for the tumor type in question
Optional argument --assay in pcgr.py - designates type of sequencing assay
Optional argument --cell_line in pcgr.py - designates runs of tumor cell lines (only for display, not used to configure any analysis)
Optional argument --min_mutations_signatures in pcgr.py - minimum number of required mutations for mutational signature analysis with MutationalPatterns
Optional argument --all_reference_signatures in pcgr.py - considers all reference signatures during fitting of mutational profile to known signatures
Optional argument --estimate_signatures now also includes detection of potential kataegis events (WGS/WES assays only), and rainfall plot in the flexdashboard output
The user can now distinguish (through color codes) whether a biomarker has been mapped exactly (nucleotide change) or at a regional level (codon/exon)
All variant-associated biomarkers (regardless of assignment to TIER 1/2) are now found in a new section (SNVs/InDels)
For copy number amplifications, other putative drug targets in cancer are listed in a new section
Detailed documentation of report contents has been added to the Documentation section
References are updated and all are now provided with a DOI
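Reporting whether a biomarker was matched exactly (nucleotide change) or regionally (codon/exon) amounts to choosing the most specific matching level. Below is a minimal sketch of that idea. The data classes, the "chrom_pos_ref_alt" key format and the level names are hypothetical and do not reflect PCGR's matching code.

```python
from dataclasses import dataclass
from typing import List, Optional

LEVEL_ORDER = ["genomic", "codon", "exon", "gene"]  # most to least specific


@dataclass(frozen=True)
class Variant:
    gene: str
    genomic_key: str  # hypothetical "chrom_pos_ref_alt" key
    codon: Optional[int]
    exon: Optional[int]


@dataclass(frozen=True)
class Biomarker:
    gene: str
    level: str  # one of LEVEL_ORDER
    genomic_key: Optional[str] = None
    codon: Optional[int] = None
    exon: Optional[int] = None


def matches(v: Variant, b: Biomarker) -> bool:
    if v.gene != b.gene:
        return False
    if b.level == "genomic":
        return v.genomic_key == b.genomic_key
    if b.level == "codon":
        return v.codon is not None and v.codon == b.codon
    if b.level == "exon":
        return v.exon is not None and v.exon == b.exon
    return b.level == "gene"


def best_match(v: Variant, biomarkers: List[Biomarker]) -> Optional[Biomarker]:
    """Return the most specific matching biomarker, or None."""
    hits = [b for b in biomarkers if matches(v, b)]
    return min(hits, key=lambda b: LEVEL_ORDER.index(b.level)) if hits else None


# Illustrative BRAF V600E call (GRCh38 coordinates, exon 15, codon 600)
braf_v600e = Variant("BRAF", "7_140753336_A_T", codon=600, exon=15)
db = [
    Biomarker("BRAF", "gene"),
    Biomarker("BRAF", "exon", exon=15),
    Biomarker("BRAF", "codon", codon=600),
]
print(best_match(braf_v600e, db).level)  # codon
```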

v0.8.4

Date: 2019-11-18
Data updates: ClinVar, CIViC, CancerMine, UniProt KB
Software updates: VEP 98.3

v0.8.3

Date: 2019-10-14
Data updates: ClinVar, GWAS catalog, GENCODE, CIViC, CancerMine
Software updates: VEP 98.2, vcf2tsv

Fixed

Further improved mapping between Ensembl transcripts and UniProt accessions (using also RefSeq accessions where available)

Added

Possibility to filter evidence items by RATING in interactive data tables

Changed

Option target_size_mb in pcgr.py replaces target_size_mb in configuration file, more convenient in terms of configuring runs
Option tumor_type in pcgr.py replaces tumor_type in configuration file

v0.8.2

Date: 2019-09-29
Data updates: ClinVar, GWAS catalog, GENCODE, DiseaseOntology, CIViC, CancerMine, UniProt KB
Software updates: VEP 97.3, vcfanno 0.3.2, LOFTEE (VEP plugin) 1.0.3

Fixed

Bug in concatenation of clinical evidence items from different sources (CIVIC + CBMDB) (issues #83,#87)
Silent variants that coincide with biomarkers reported at codon level are ignored
Distinction between clinical evidence items of different origins (somatic + germline)
Improved mapping between Ensembl transcripts and UniProt accessions (using also RefSeq accessions where available)
Bug in UpSetPlot for cases where filtering produces fewer than two intersecting sets

Added

New field ‘mane’ as criteria for pick order in configuration file (VEP)
Sample identifier to copy number annotation output (convenient for concatenation of output from multiple samples)
Capturing allelic depth (t_depth, t_ref_count etc.) in vcf2maf output (enhancement #52)
Option tumor_only in pcgr.py, replaces vcf_tumor_only in configuration file, more convenient in terms of configuration
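A VEP-style pick order, including the 'mane' criterion mentioned above, can be thought of as a lexicographic comparison of transcripts, criterion by criterion. The sketch below uses the default order listed for v2.1.0. The per-criterion semantics are simplified assumptions: flags are preferred when set, protein_coding biotype is preferred, lower rank/tsl/appris values are preferred, and longer transcripts are preferred. Ensembl VEP's own --pick_order handling is the authoritative reference.

```python
from typing import Dict, List, Sequence

# Default pick order described for v2.1.0
PICK_ORDER = ["mane_select", "mane_plus_clinical", "canonical", "biotype",
              "ccds", "rank", "tsl", "appris", "length"]


def criterion_key(tx: Dict, criterion: str) -> float:
    """Map one criterion to a number where smaller means preferred (assumed semantics)."""
    if criterion in ("mane_select", "mane_plus_clinical", "canonical", "ccds"):
        return 0 if tx.get(criterion) else 1
    if criterion == "biotype":
        return 0 if tx.get("biotype") == "protein_coding" else 1
    if criterion in ("rank", "tsl", "appris"):
        value = tx.get(criterion)
        return float("inf") if value is None else value
    if criterion == "length":
        return -tx.get("length", 0)  # longer transcripts preferred
    raise ValueError(f"unknown pick criterion: {criterion}")


def pick(transcripts: List[Dict], order: Sequence[str] = PICK_ORDER) -> Dict:
    """Pick one transcript by comparing criteria lexicographically in the given order."""
    return min(transcripts, key=lambda tx: tuple(criterion_key(tx, c) for c in order))


txs = [
    {"id": "A", "canonical": True, "biotype": "protein_coding", "rank": 5, "length": 2000},
    {"id": "B", "mane_select": True, "biotype": "protein_coding", "rank": 8, "length": 1500},
    {"id": "C", "biotype": "protein_coding", "rank": 1, "tsl": 1, "length": 3000},
]
print(pick(txs)["id"])                         # B
print(pick(txs, ["canonical", "rank"])["id"])  # A
print(pick(txs, ["rank", "length"])["id"])     # C
```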

v0.8.1

Date: 2019-05-22

Added

Cancer_NOS.toml as configuration file for unspecified tumor types

v0.8.0

Date: 2019-05-20

Fixed

Bug in value box for Tier 2 variants (new line carriage) Issue #73

Added

Upgraded VEP to v96
- Skipping the --regulatory VEP option to avoid forking issues and to improve speed (See this issue)
- Added option to configure pick-order for choice of primary transcript in configuration file
Pre-made configuration files for each tumor type in conf folder
Possibility to append a CNA plot file (.png format) to the section of the report with Somatic CNAs (previous feature request)
Added possibility to input estimates of tumor purity and ploidy
- shown as value boxes in Main results
Tumor mutational burden is now compared with the distribution of TMB observed for TCGA’s cohorts (organized by primary site)
- Default target size is now 34Mb (approx. estimate from exome-wide calculation of protein-coding parts of GENCODE)
Added flexibility for variant filtering in tumor-only input callsets
- Added additional options to exclude likely germline variants (both require the tumor VAF tag to be correctly specified in the input VCF)
  - exclude_likely_hom_germline - removes any variant with an allelic fraction of 1 (100%), a very unlikely somatic event
  - exclude_likely_het_germline - removes any variant with
    - an allelic fraction between 0.4 and 0.6, and
    - presence in dbSNP + gnomAD, and
    - no presence as somatic event in COSMIC/TCGA
- Added possibility to input a PANEL-OF-NORMALS VCF, to support the many labs that have sequenced a database/pool of healthy controls. This set of variants is utilized in PCGR to improve variant filtering when running in tumor-only mode. The PANEL-OF-NORMALS annotation works as follows:
  - all variants in the tumor that coincide with any variant listed in the PANEL-OF-NORMALS VCF are appended with a PANEL_OF_NORMALS flag in the query VCF with tumor variants.
  - If configuration parameter exclude_pon is set to True in tumor_only runs, all variants with a PANEL_OF_NORMALS flag are filtered/excluded
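The tumor-only germline filters described above can be sketched as a single function that returns the reason a call would be excluded. Several modeling choices are assumptions. The 0.4-0.6 bounds are treated as inclusive. Database membership is represented as precomputed booleans. The PANEL_OF_NORMALS flag is modeled as set membership on a hypothetical "chrom_pos_ref_alt" key.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class TumorCall:
    key: str  # hypothetical "chrom_pos_ref_alt" identifier
    af: float  # tumor allelic fraction taken from the VCF
    in_dbsnp: bool
    in_gnomad: bool
    somatic_in_cosmic_or_tcga: bool


def germline_filter_reason(
    call: TumorCall,
    pon_keys: Set[str],
    exclude_likely_hom_germline: bool = True,
    exclude_likely_het_germline: bool = True,
    exclude_pon: bool = True,
) -> Optional[str]:
    """Return why a tumor-only call would be excluded, or None to keep it."""
    if exclude_likely_hom_germline and call.af >= 1.0:
        return "likely_hom_germline"
    if (
        exclude_likely_het_germline
        and 0.4 <= call.af <= 0.6
        and call.in_dbsnp
        and call.in_gnomad
        and not call.somatic_in_cosmic_or_tcga
    ):
        return "likely_het_germline"
    if exclude_pon and call.key in pon_keys:  # i.e. carries a PANEL_OF_NORMALS flag
        return "panel_of_normals"
    return None


pon = {"chr2_1000_G_A"}
calls = [
    TumorCall("chr1_500_C_T", 1.0, True, True, False),
    TumorCall("chr3_700_A_G", 0.5, True, True, False),
    TumorCall("chr2_1000_G_A", 0.2, False, False, False),
    TumorCall("chr7_900_T_C", 0.5, True, True, True),
]
print([germline_filter_reason(c, pon) for c in calls])
# ['likely_hom_germline', 'likely_het_germline', 'panel_of_normals', None]
```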
For tumor-only runs, added an UpSet plot showing how different filtering sources (gnomAD, 1KG Project, panel-of-normals etc) contribute in the germline filtering procedure
Variants in Tier 3 / Tier 4 / Noncoding are now sorted (and color-coded) according to the target (gene) association score to the cancer phenotype, as provided by the OpenTargets Platform
Added annotation of TCGA’s ten oncogenic signaling pathways
Added EXONIC_STATUS annotation tag (VCF and TSV)
- exonic denotes all protein-altering AND canonical splice site altering AND synonymous variants, nonexonic denotes the complement
Added CODING_STATUS annotation tag (VCF and TSV)
- coding denotes all protein-altering AND canonical splice site altering variants, noncoding denotes the complement
Added SYMBOL_ENTREZ annotation tag (VCF)
- Official gene symbol from NCBI Entrez (SYMBOL provided by VEP can sometimes be a non-official alias, e.g. for GENCODE v19/grch37)
Added SIMPLEREPEATS_HIT annotation tag (VCF and TSV)
- Variant overlaps UCSC simpleRepeat sequence repeat track - used for MSI prediction
Added WINMASKER_HIT annotation tag (VCF and TSV)
- Variant overlaps UCSC windowmaskerSdust sequence repeat track - used for MSI prediction
Added PUTATIVE_DRIVER_MUTATION annotation tag (VCF and TSV)
- Putative cancer driver mutation discovered by multiple approaches from 9,423 tumor exomes in TCGA. Format: symbol:hgvsp:ensembl_transcript_id:discovery_approaches
Added OPENTARGETS_DISEASE_ASSOCS annotation tag (VCF and TSV)
- Associations between protein targets and disease based on multiple lines of evidence (mutations, affected pathways, GWAS, literature etc). Format: CUI:EFO_ID:IS_DIRECT:OVERALL_SCORE
Added OPENTARGETS_TRACTABILITY_COMPOUND annotation tag (VCF and TSV)
- Confidence for the existence of a modulator (small molecule) that interacts with the target (protein) to elicit a desired biological effect
Added OPENTARGETS_TRACTABILITY_ANTIBODY annotation tag (VCF and TSV)
- Confidence for the existence of a modulator (antibody) that interacts with the target (protein) to elicit a desired biological effect
Added CLINVAR_REVIEW_STATUS_STARS annotation tag
- Rating of the ClinVar variant (0-4 stars) with respect to level of review
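The EXONIC_STATUS and CODING_STATUS definitions above translate directly into set tests over VEP consequence terms. Below is a minimal sketch. The exact term lists that count as "protein-altering" are an assumption, and PCGR's lists may differ.

```python
from typing import Iterable

# Assumed term sets; PCGR's exact lists may differ
PROTEIN_ALTERING = {
    "missense_variant", "stop_gained", "stop_lost", "start_lost",
    "frameshift_variant", "inframe_insertion", "inframe_deletion",
    "protein_altering_variant",
}
CANONICAL_SPLICE = {"splice_acceptor_variant", "splice_donor_variant"}
SYNONYMOUS = {"synonymous_variant"}


def coding_status(terms: Iterable[str]) -> str:
    """'coding' for protein-altering or canonical splice site variants."""
    return "coding" if set(terms) & (PROTEIN_ALTERING | CANONICAL_SPLICE) else "noncoding"


def exonic_status(terms: Iterable[str]) -> str:
    """'exonic' additionally includes synonymous variants."""
    exonic = PROTEIN_ALTERING | CANONICAL_SPLICE | SYNONYMOUS
    return "exonic" if set(terms) & exonic else "nonexonic"


csq = "synonymous_variant&splice_region_variant".split("&")
print(coding_status(csq), exonic_status(csq))  # noncoding exonic
```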

Changed

Moved from IntoGen’s driver mutation resource to TCGA’s putative driver mutation list in display of driver mutation status
Moved option for vcf_validation from configuration file to run script (--no_vcf_validate)
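Entries from TCGA's putative driver mutation list follow the PUTATIVE_DRIVER_MUTATION format symbol:hgvsp:ensembl_transcript_id:discovery_approaches. Below is a minimal parsing sketch. The first field and the last two fields are taken positionally, so an HGVSp value that itself contains a colon is kept intact. The discovery approach string in the example is a hypothetical placeholder.

```python
from typing import NamedTuple


class PutativeDriver(NamedTuple):
    symbol: str
    hgvsp: str
    ensembl_transcript_id: str
    discovery_approaches: str


def parse_putative_driver(entry: str) -> PutativeDriver:
    """Parse one 'symbol:hgvsp:ensembl_transcript_id:discovery_approaches' entry.

    Fields are taken from both ends, so a protein-ID-prefixed HGVSp
    (which contains ':') stays in one piece.
    """
    parts = entry.split(":")
    if len(parts) < 4:
        raise ValueError(f"expected at least 4 ':'-separated fields: {entry!r}")
    return PutativeDriver(
        symbol=parts[0],
        hgvsp=":".join(parts[1:-2]),
        ensembl_transcript_id=parts[-2],
        discovery_approaches=parts[-1],
    )


d = parse_putative_driver("BRAF:p.V600E:ENST00000288602:sequence|structure")
print(d.symbol, d.hgvsp, d.ensembl_transcript_id)  # BRAF p.V600E ENST00000288602
```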

Removed

Original tier model ‘pcgr’

v0.7.0

Date: 2018-11-27

Fixed

Bug in assignment of variants to tier1/tier2 Issue #61
Missing config option for maf_gnomad_asj in TOML file (also setting operator to <=) Issue #60
Bug in new CancerMine oncogene/tumor suppressor annotation Issue #53
vcfanno fix for empty Description (upgrade to vcfanno v0.3.1, Issue #49)
Bug in message showing too few variants for MSI prediction, Issue #55
Bug in appending of custom VCF tags
- Still unsolved: how to disambiguate identical FORMAT and INFO tags in vcf2tsv
Bug in SCNA value box display for multiple copy number hits (Issue #47)
Bug in vcf2tsv (handling INFO tags encoded with ‘Type = String’, Issue #39)
Bug in search of UniProt functional features (BED feature regions spanning exons are now handled)
Stripped off HTML elements (TCGA_FREQUENCY, DBSNP) in TSV output
Some effect predictions from dbNSFP were not properly parsed (e.g. multiple prediction entries from multiple transcript isoforms), these should now be retrieved correctly
Removed ‘COSM’ prefix in COSMIC mutation links
Bug in retrieval of splice site predictions from dbscSNV

Added

Possibility to run PCGR in a non-Docker environment (e.g. using the --no-docker option). Thanks to an excellent contribution by Vlad Saveliev, Issue #35
- Added possibility to add docker user-id
Possibility for MAF file output (converted with vcf2maf), must be configured by the user in the TOML file (i.e. vcf2maf = true, Issue #17)
Possibility for adding custom VCF INFO tags to PCGR output files (JSON/TSV), must be configured by the user in the TOML file (i.e. custom_tags)
Added MUTATION_HOTSPOT_CANCERTYPE in data tables (i.e. listing tumor types in which hotspot mutations have been found)
Included the ‘rs’ prefix for dbSNP identifiers (HTML and TSV output)
Individual entries/columns for variant effect predictions:
- Individual algorithms: SIFT_DBNSFP, M_CAP_DBNSFP, MUTPRED_DBNSFP, MUTATIONTASTER_DBNSFP, MUTATIONASSESSOR_DBNSFP, FATHMM_DBNSFP, FATHMM_MKL_DBNSFP, PROVEAN_DBNSFP
- Ensemble predictions (META_LR_DBNSFP), dbscSNV splice site predictions (SPLICE_SITE_RF_DBNSFP, SPLICE_SITE_ADA_DBNSFP)
Upgraded samtools to v1.9 (makes vcf2maf work properly)
Added Ensembl gene/transcript id and corresponding RefSeq mRNA id to TSV/JSON
Added for future implementation:
- SeqKat + karyoploteR for exploration of kataegis/hypermutation
- CELLector - genomics-guided selection of cancer cell lines
Upgraded VEP to v94

Changed

Changed CANCER_MUTATION_HOTSPOT to MUTATION_HOTSPOT
Moved from TSGene 2.0 to CancerMine for annotation of tumor suppressor genes and proto-oncogenes
- A minimum of n=3 citations was required to include literature-mined tumor suppressor genes and proto-oncogenes from CancerMine

v0.6.2.1

Date: 2018-05-14

Fixed

Bug in copy number annotation (broad/focal)

v0.6.2

Date: 2018-05-09

Fixed

Bug in copy number segment display (missing variable initialization, Issue #34)
Typo in gnomAD filter statistic (fraction, Issue #31)
Bug in mutational signature analysis for grch38 (forgot to pass BSgenome object, Issue #27)
Missing proper ASCII-encoding in vcf2tsv conversion, Issue #
Removed ‘Noncoding mutations’ section when no input VCF is present
Bug in annotation of copy number event type (focal/broad)
Bug in copy number annotation (missing protein-coding transcripts)
Updated MSI prediction (variable importance, performance measures)

Added

Genome assembly is appended to every output file
Issue a warning for copy number segments that extend beyond the chromosome lengths of the specified assembly (such segments are skipped)
Added missing subtypes for ‘Skin_Cancer_NOS’ in the cancer phenotype dataset
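The out-of-bounds segment check can be sketched as a filter that logs a warning and skips offending segments. One behavior is an added assumption: segments on chromosomes missing from the length table are also skipped. Only the GRCh38 chr1 length is included, for illustration.

```python
import logging
from typing import Dict, List, Tuple

Segment = Tuple[str, int, int]  # (chrom, start, end)


def drop_out_of_bounds(segments: List[Segment], chrom_lengths: Dict[str, int]) -> List[Segment]:
    """Keep segments that end within the chromosome; warn about and skip the rest."""
    kept = []
    for chrom, start, end in segments:
        length = chrom_lengths.get(chrom)
        if length is None or end > length:
            logging.warning("Skipping segment %s:%d-%d (chromosome length: %s)",
                            chrom, start, end, length)
            continue
        kept.append((chrom, start, end))
    return kept


GRCH38_LENGTHS = {"chr1": 248_956_422}  # only chr1 shown for brevity
segs = [("chr1", 1_000_000, 2_000_000), ("chr1", 248_000_000, 250_000_000)]
print(drop_out_of_bounds(segs, GRCH38_LENGTHS))  # [('chr1', 1000000, 2000000)]
```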

v0.6.1

Date: 2018-05-02

Fixed

Bug in tier assignment ‘pcgr_acmg’ (case for no variants in tier1,2,3)
Bug in tier assignment ‘pcgr_acmg’ (no tumor type specified, evidence items with weak support detected)
Bug: duplicated variants in ‘Tier 3’ resulting from genes encoded with dual roles as tumor suppressor genes/oncogenes
Bug: duplicated variants in ‘Tier 1/Noncoding variants’ resulting from rare cases of noncoding variants occurring in Tier 1 (synonymous variants with biomarker role)

v0.6.0

Date: 2018-04-25

Added

New argument in pcgr.py
- assembly (grch37/grch38)
New option in pcgr.py
- --basic - run comprehensive VCF annotation only, skip report generation and additional analyses
New sections in HTML report
- Settings and annotation sources - now also listing key PCGR configuration settings
- Main findings - Six value boxes indicating the main findings of clinical relevance
New configuration options
- tier_model (string) - choice between pcgr_acmg and pcgr
- mutational_burden - set TMB tertile limits
  - tmb_low_limit (float)
  - tmb_intermediate_limit (float)
- tumor_type - choose between 34 tumor types/classes:
  - Adrenal_Gland_Cancer_NOS (logical)
  - Ampullary_Carcinoma_NOS (logical)
  - Biliary_Tract_Cancer_NOS (logical)
  - Bladder_Urinary_Tract_Cancer_NOS (logical)
  - Blood_Cancer_NOS (logical)
  - Bone_Cancer_NOS (logical)
  - Breast_Cancer_NOS (logical)
  - CNS_Brain_Cancer_NOS (logical)
  - Colorectal_Cancer_NOS (logical)
  - Cervical_Cancer_NOS (logical)
  - Esophageal_Stomach_Cancer_NOS (logical)
  - Head_And_Neck_Cancer_NOS (logical)
  - Hereditary_Cancer_NOS (logical)
  - Kidney_Cancer_NOS (logical)
  - Leukemia_NOS (logical)
  - Liver_Cancer_NOS (logical)
  - Lung_Cancer_NOS (logical)
  - Lymphoma_Hodgkin_NOS (logical)
  - Lymphoma_Non_Hodgkin_NOS (logical)
  - Ovarian_Fallopian_Tube_Cancer_NOS (logical)
  - Pancreatic_Cancer_NOS (logical)
  - Penile_Cancer_NOS (logical)
  - Peripheral_Nervous_System_Cancer_NOS (logical)
  - Peritoneal_Cancer_NOS (logical)
  - Pleural_Cancer_NOS (logical)
  - Prostate_Cancer_NOS (logical)
  - Skin_Cancer_NOS (logical)
  - Soft_Tissue_Cancer_NOS (logical)
  - Stomach_Cancer_NOS (logical)
  - Testicular_Cancer_NOS (logical)
  - Thymic_Cancer_NOS (logical)
  - Thyroid_Cancer_NOS (logical)
  - Uterine_Cancer_NOS (logical)
  - Vulvar_Vaginal_Cancer_NOS (logical)
- mutational_signatures
  - mutsignatures_cutoff (float) - discard any signature contributions with a weight less than the cutoff
- cna
  - transcript_cna_overlap (float) - minimum percent overlap between copy number segment and transcripts (average) for tumor suppressor gene/proto-oncogene to be reported
- allelic_support
  - If input VCF has correctly formatted depth/allelic fraction as INFO tags, users can add thresholds on depth/support that are applied prior to report generation
    - tumor_dp_min (integer) - minimum sequencing depth for variant in tumor sample
    - tumor_af_min (float) - minimum allelic fraction for variant in tumor sample
    - normal_dp_min (integer) - minimum sequencing depth for variant in normal sample
    - normal_af_max (float) - maximum allelic fraction for variant in normal sample
- visual
  - report_theme (string) - visual theme of report (Bootstrap)
- other
  - vcf_validation (logical) - keep/skip VCF validation by vcf-validator
New output file - JSON output of HTML report content
New INFO tags of PCGR-annotated VCF
- CANCER_PREDISPOSITION
- PFAM_DOMAIN
- TCGA_FREQUENCY
- TCGA_PANCANCER_COUNT
- ICGC_PCAWG_OCCURRENCE
- ICGC_PCAWG_AFFECTED_DONORS
- CLINVAR_MEDGEN_CUI
New column entries in annotated SNV/InDel TSV file:
- CANCER_PREDISPOSITION
- ICGC_PCAWG_OCCURRENCE
- TCGA_FREQUENCY
New columns in CNA output
- TRANSCRIPTS - aberration-overlapping transcripts (Ensembl transcript IDs)
- MEAN_TRANSCRIPT_CNA_OVERLAP - Mean overlap (%) between gene transcripts and aberration segment
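MEAN_TRANSCRIPT_CNA_OVERLAP and the transcript_cna_overlap option both rest on the percentage of each transcript covered by a segment. Below is a minimal sketch that averages those percentages over a gene's transcripts. It assumes half-open [start, end) coordinates, and the reporting threshold of 50.0 is hypothetical.

```python
from statistics import mean
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end)


def overlap_pct(segment: Interval, transcript: Interval) -> float:
    """Percent of the transcript covered by the copy number segment."""
    seg_start, seg_end = segment
    tx_start, tx_end = transcript
    overlap = max(0, min(seg_end, tx_end) - max(seg_start, tx_start))
    return 100.0 * overlap / (tx_end - tx_start)


def mean_transcript_cna_overlap(segment: Interval, transcripts: List[Interval]) -> float:
    """Average per-transcript overlap (%) for one gene."""
    return mean(overlap_pct(segment, tx) for tx in transcripts)


segment = (100, 200)
transcripts = [(150, 250), (120, 180)]  # 50% and 100% covered
value = mean_transcript_cna_overlap(segment, transcripts)
print(value)  # 75.0
transcript_cna_overlap = 50.0  # hypothetical threshold
print(value >= transcript_cna_overlap)  # True: gene would be reported
```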

Removed

Elements of data bundle (now annotated directly through VEP):
- dbsnp
- gnomad/exac
- 1000G project
INFO tags of PCGR-annotated VCF
- DBSNPBUILDID
- DBSNP_VALIDATION
- DBSNP_SUBMISSIONS
- DBSNP_MAPPINGSTATUS
- GWAS_CATALOG_PMID
- GWAS_CATALOG_TRAIT_URI
- DOCM_DISEASE
Output files
- TSV files with mutational signature results and biomarkers (i.e. sample_id.pcgr.snvs_indels.biomarkers.tsv and sample_id.pcgr.mutational_signatures.tsv)
  - Data can still be retrieved - now from the JSON dump
- MAF file
  - The previous MAF output was generated in a custom fashion, a more accurate MAF output based on https://github.com/mskcc/vcf2maf will be incorporated in the next release

Changed

HTML report sections
- Tier statistics and Variant statistics are now grouped into the section Tier and variant statistics
- Tier 5 is now Noncoding mutations (i.e. not considered a tier per se)
- Sliders for allelic fraction in the Global variant browser are now fixed from 0 to 1 (0.05 intervals)