v2.1.2

  • Date: 2024-10-21

  • Highlight also VUS variants (ClinVar-classified) in the variant oncogenicity section of the HTML report

v2.1.1

  • Date: 2024-10-11

  • Fix bug in parsing of relative CDS position (issue252)

  • Move some code in the qmd templates to pcgrr functions

v2.1.0

  • Date: 2024-09-27
  • Major data updates
    • ClinVar (2024-09)
    • dbNSFP (v4.8)
    • NCI Thesaurus v24.07e
    • CIViC (2024-09-18)
  • Reduction (~15%) in overall data bundle size - removed unused data files (e.g. expression counts)
  • Fixed bug in MAF output for tumor-only runs (issue250); also ensured that non-exonic variants are excluded when --exclude_nonexonic is set
  • Fixed bug in annotation of splice site mutation hotspots (e.g. MET exon 14 skipping)
  • Highlighted variants with known pathogenic/likely pathogenic clinical significance in ClinVar (regardless of phenotype and variant origin) in the variant oncogenicity section of the HTML report
  • Created interactive visualization support for allele-specific copy number data (HTML report)
  • Slight change to the default transcript consequence pick order in VEP based on observations of prioritized transcripts (mane_select > mane_plus_clinical > canonical > biotype > ccds > rank > tsl > appris > length)
  • Pulled in known oncogenic variants from ClinVar (assessed through ClinGen/CGC/VICC SOP, oncogenic/likely oncogenic) into the variant oncogenicity assessment algorithm
  • Added option --no_html to disable HTML report generation
  • Added option --input_cpsr - re-offering the possibility to integrate CPSR-classified germline variants in the PCGR HTML report
  • Added HGVSc_RefSeq as output column in TSV/HTML - using MANE Select RefSeq transcript identifiers (works primarily for grch38)
  • Pulled in coding sequence start annotation for protein-coding transcripts from GENCODE, enabling a more useful annotation of promoter variants (e.g. TERT)
  • Created new column ALTERATION in variant tables of HTML report, a combination of HGVSp, HGVSc (if HGVSp not available)
  • New output file for tumor-only runs, the complete set of calls, filtered and unfiltered, in a TSV file
  • Re-processed all RNA-seq reference cohorts (TCGA, DepMap, TreeHouse), ensuring that all cohorts are using the same unit (log2(TPM+0.001))
  • Separated outlier gene expression results into separate tabs in the HTML report, added them to Excel workbook output
  • Added section on kataegis events in the HTML report
  • Fixed bug in plotting of reference TMB distributions for different TMB algorithms (--tmb_display option)
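
The transcript pick order listed above can be read as a lexicographic ranking: candidates are compared on the first criterion, and later criteria only break ties. The Python sketch below illustrates that idea only. The dictionary keys, the transcript identifiers and the numeric encoding (0 = best) are hypothetical and do not reflect VEP's internal data model.

```python
# Illustrative only: rank candidate transcript annotations by an ordered
# list of criteria, where a lower value means "preferred" for each criterion.
PICK_ORDER = [
    "mane_select", "mane_plus_clinical", "canonical", "biotype",
    "ccds", "rank", "tsl", "appris", "length",
]


def pick_transcript(candidates, pick_order=PICK_ORDER):
    """Return the candidate that wins the lexicographic comparison.

    Each candidate is a dict mapping criterion name -> numeric rank
    (hypothetical encoding: 0 = best). Missing criteria count as worst.
    """
    def sort_key(candidate):
        return tuple(candidate.get(c, float("inf")) for c in pick_order)

    return min(candidates, key=sort_key)


candidates = [
    {"id": "tx_A", "mane_select": 1, "canonical": 0, "length": 0},
    {"id": "tx_B", "mane_select": 0, "canonical": 1, "length": 1},
]
best = pick_transcript(candidates)
print(best["id"])  # tx_B, because mane_select is compared before canonical
```

Changing the order of the criteria changes the winner, which is why a reordering such as the one in this release can alter the reported primary transcript for some variants.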

v2.0.3

  • Date: 2024-08-01
  • Ensure correct propagation of purity/ploidy in output report
  • Ensure that MAF output is properly filtered for tumor-only runs
  • Ensure proper copying of quarto templates (abandon file.copy), both for PCGR and CPSR

v2.0.2

  • Date: 2024-07-16
  • Ensure correct reference to variant actionability guidelines - AMP/ASCO/CAP (not ACMG/AMP), both in code and in docs (thanks to HomoPolyethylen for pointing this out)
  • fix bug in missing assignment of tier 3 variants (AMP/ASCO/CAP)
  • ensure non-exonic biomarker variants (e.g. TERT) are written to Excel sheet
  • specify (value boxes, plots) that MSI classification is based on coding variants

v2.0.1

  • Date: 2024-07-07
  • Fixed bug for chrM variants in input - not properly annotated by VEP, and not correctly processed in pcgrr. Any mitochondrial variants found in input VCF are now removed during VCF pre-processing.

v2.0.0

  • Date: 2024-06-26
  • Major data updates
    • ClinVar (2024-06)
    • NCI Thesaurus v24.05d
    • Open Targets Platform v2024.06
    • CIViC (2024-06-21)
    • CGI Cancer Biomarkers database (2022/10/17)
    • GENCODE v46/v19 (GRCh38/GRCh37)
    • Cancer Gene Census
    • CancerMine v50 (2023-03)
    • Pfam v35.0 (November 2021)
    • Disease Ontology/EFO
    • UniProt KB v2024_03
  • Major software updates
    • Ensembl VEP v112
  • Diff between v2.0.0 and v1.4.1

Added/changed

  • New report generation framework - quarto
    • multiple options related to RMarkdown output are now deprecated
  • Re-organized data bundle structure
    • Users need to download an assembly-specific VEP cache separately from the Ensembl VEP website, and provide its path to the new required argument --vep_dir in the pcgr command
  • Re-engineered data bundle generation pipeline
  • Improved data bundle documentation
    • An HTML report with an overview of the contents of the data bundle is shipped with the reference data itself, also available here (grch38).
  • Singularity/Apptainer support
  • Moved more of the code base to initial Python workflow steps (biomarker matching, CNA segment annotation, RNA expression analysis, oncogenicity classification)
  • Variants are now classified with respect to both oncogenicity and actionability, and the previous global tier classification (tier 1-5) is thus deprecated
  • New copy number input format - allele-specific (chrom, start, end, n_major, n_minor)
    • New argument n_copy_gain - Minimum number of total copy number for segments considered as gains/amplifications (default: 6)
  • RNA-bulk expression input permitted in the pcgr command
    • --input_rna_expression - accepts a TSV file with gene expression values
    • --expression_sim - boolean flag to enable expression similarity analysis
    • --expression_sim_db - Comma-separated string of databases used in RNA expression similarity analysis, default: tcga,depmap,treehouse
  • TMB calculations can be adjusted using several parameters:
    • --tmb_display - Type of TMB measure to show in report (coding_and_silent, coding_non_silent, missense_only)
    • --tmb_dp_min - Minimum depth for a position to be considered for TMB calculation (default: 0) - requires allelic support information from VCF
    • --tmb_af_min - Minimum allele frequency for a position to be considered for TMB calculation (default: 0) - requires allelic support information from VCF
  • A multi-sheet Excel workbook output with analysis output is provided, suitable e.g. for aggregation of results across samples
  • argument name changes to pcgr:
    • --pcgr_dir renamed to --refdata_dir
    • --clinvar_ignore_noncancer renamed to --clinvar_report_noncancer, meaning that variants found in ClinVar, yet attributed to non-cancer related phenotypes, are now excluded from reporting by default
    • --vep_gencode_all renamed to --vep_gencode_basic, meaning that the gene variant annotation is now using all GENCODE transcripts by default, not only the basic set
    • --preserved_info_tags renamed to --retained_info_tags
    • --basic renamed to --no_reporting
    • --target_size_mb renamed to --effective_target_size_mb
  • LOFTEE plugin in VEP removed as loss-of-function variant classifier (due to low level of maintenance, and outdated dependency requirements). For now, a simplified LoF-annotation is used as a replacement, looking primarily at CSQ types (stop_gained, frameshift_variant, splice_acceptor_variant, splice_donor_variant). Furthermore, frameshift/stop-gain variants that are found within the last 5% of the coding sequence length are deemed non-LOF, as are splice donor variants not disrupting the canonical site (GC>GT). An even more advanced LoF-annotation is planned for a future release.
  • Biomarkers are matched much more comprehensively than in previous versions, matching at the genomic level, codon, exon, amino acid and gene level (both principal and non-principal transcript matches)
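
The simplified loss-of-function (LoF) annotation described in the list above can be sketched roughly as follows. This is a minimal illustration of the stated rules, not the PCGR implementation: the function name, its parameters and the boolean flag for the splice donor exception are all assumptions made for the example.

```python
# Consequence types considered for LoF annotation, as listed in the changelog.
LOF_CONSEQUENCES = {
    "stop_gained",
    "frameshift_variant",
    "splice_acceptor_variant",
    "splice_donor_variant",
}
TRUNCATING = {"stop_gained", "frameshift_variant"}


def is_putative_lof(consequence, cds_position=None, cds_length=None,
                    disrupts_canonical_donor=True, last_fraction=0.05):
    """Hypothetical helper applying the simplified LoF rules.

    cds_position / cds_length give the relative location in the coding
    sequence; disrupts_canonical_donor is an assumed, precomputed flag.
    """
    if consequence not in LOF_CONSEQUENCES:
        return False
    if consequence in TRUNCATING and cds_position and cds_length:
        # Truncating variants in the last 5% of the CDS are deemed non-LoF.
        if cds_position / cds_length > 1.0 - last_fraction:
            return False
    if consequence == "splice_donor_variant" and not disrupts_canonical_donor:
        return False
    return True


print(is_putative_lof("stop_gained", cds_position=100, cds_length=1000))
print(is_putative_lof("stop_gained", cds_position=980, cds_length=1000))
```

The first call reports a putative LoF variant, while the second falls in the final 5% of the coding sequence and is therefore not flagged.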

Removed

  • Options for configuring RMarkdown output, i.e. --report_theme, --report_nonfloating_toc
  • --cpsr_report and --include_trials, which provide the report with associated pathogenic germline variants (from CPSR) and potential clinical trial opportunities, are currently on hold for a forthcoming release
  • --no_vcf_validate - VCF validation is simplified, not relying on vcf-validator anymore
  • Options to filter tumor-only calls using 1000 Genomes Project database, i.e. --maf_onekg_eur, --maf_onekg_amr, --maf_onekg_eas, --maf_onekg_afr, --maf_onekg_sas, --maf_onekg_global
  • --cell_line
  • --logr_gain, and --logr_homdel

v1.4.1

Changes


v1.4.0

Changes


v1.3.0

Changes

  • pcgr_summarise.py: prioritize protein-coding BIOTYPE csq (pr201)
  • cpsr.py: expose --pcgrr_conda option to flexibly activate pcgrr env by a non-default pcgrr name
  • docs: update input.Rmd, running.Rmd
  • cpsr_validate_input.py: refactor for efficient custom gene egrep
  • code reformat via autopep8 for annoutils.py, pcgr_vcfanno.py
  • GitHub Actions:
    • bump docker actions setup-buildx-action (v1–v2), build-push-action (v2–v4)
    • use miniforge-variant instead of mamba-version: "*"
    • replace ::set-output since deprecated

v1.2.0

Changes

  • Keep only autosomal, X, Y, M/MT chromosomes
  • Import bcftools as dependency
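
As an illustration of the chromosome restriction above, the following Python sketch keeps only autosomes, X, Y and M/MT. It is a stand-alone example, not the way PCGR performs the filtering; the helper name and the handling of an optional "chr" prefix are assumptions.

```python
# Chromosome names retained: 1-22, X, Y and the mitochondrial genome (M/MT).
KEEP = {str(i) for i in range(1, 23)} | {"X", "Y", "M", "MT"}


def keep_chromosome(chrom):
    """Hypothetical helper: True if the contig is one of the retained ones.

    An optional 'chr' prefix is stripped before the comparison (assumption).
    """
    name = chrom[3:] if chrom.lower().startswith("chr") else chrom
    return name in KEEP


contigs = ["1", "chr22", "chrX", "MT", "GL000220.1", "chrUn_gl000220"]
print([c for c in contigs if keep_chromosome(c)])
```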

v1.1.0

Changes

  • Remove Docker command wrappers (note: this does not remove the Docker functionality from PCGR; instead it removes the legacy wrappers that were created in the original PCGR version). This, along with many other general changes, is summarised in pr193. Of note:
    • --no_docker and --docker_uid CLI arguments are now obsolete.
    • --version CLI argument added for pcgr/cpsr.py
    • declutter repetitive log messages
    • refactor pcgr/cpsr.py script
  • Update documentation and declutter logging; refactor dict creation (pr192).
  • Minor refactor (pr194):
    • switch to using Python’s native os.remove and os.rename for glob cleanup
    • keep decompressed VCF only if --vcf2maf option is specified. The vcf2maf tool does not support compressed VCFs - see issue235.
  • Fix for CLI argument --cna_overlap_pct (pr196).

v1.0.3

  • Date: 2022-05-24
Fixed
  • Bug in clinical trials sorting, pr191

v1.0.2

  • Date: 2022-03-30
Fixed

v1.0.1

  • Date: 2022-03-09
Fixed
  • Writing to JSON crashes when size of input VCF is huge (variants in the order of millions). If the raw input set (VCF) contains > 500,000 variants, this set will, prior to reporting, be reduced by
      1. exclusion of intergenic and intronic variants, and
      2. exclusion of upstream_gene/downstream_gene variants (if the variant set is still above 500,000 after step 1)
  • Bug in signature analysis (issue187) for cases where the input variant set fits to > 18 different aetiologies.
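
The two-step reduction of very large variant sets described above can be sketched as follows. This is an illustration under stated assumptions: variants are represented as dictionaries with a hypothetical "consequence" key, and the consequence names are the Sequence Ontology terms that the changelog entry appears to refer to.

```python
MAX_VARIANTS = 500_000

# Consequence terms removed in the first and second reduction step (assumed).
FIRST_PASS = {"intergenic_variant", "intron_variant"}
SECOND_PASS = {"upstream_gene_variant", "downstream_gene_variant"}


def reduce_variant_set(variants, max_variants=MAX_VARIANTS):
    """Hypothetical helper mirroring the two-step reduction."""
    if len(variants) <= max_variants:
        return variants
    reduced = [v for v in variants if v["consequence"] not in FIRST_PASS]
    if len(reduced) > max_variants:
        # Second step only runs if the set is still too large.
        reduced = [v for v in reduced if v["consequence"] not in SECOND_PASS]
    return reduced


# Small demonstration with an artificially low limit.
demo = [
    {"consequence": "missense_variant"},
    {"consequence": "intron_variant"},
    {"consequence": "upstream_gene_variant"},
    {"consequence": "stop_gained"},
    {"consequence": "synonymous_variant"},
]
print(len(reduce_variant_set(demo, max_variants=2)))
```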

v1.0.0

  • Date: 2022-02-25

  • Data updates: ClinVar, GWAS catalog, GENCODE, CIViC, CancerMine, KEGG, ChEMBL, Open Targets Platform, Disease Ontology, Experimental Factor Ontology

Added
  • Command-line options
    • VEP options
      • --vep_gencode_all - use all GENCODE transcripts during VEP annotation (not only the basic GENCODE set)
      • --prevalence_reference_signatures - set minimum prevalence (percent) for selection of reference signatures included in refitting procedure for a given tumor type
Changed
  • Complete restructure of Python and R components. Installation now relies on two separate conda packages, pcgr (Python component) and pcgrr (R component). Direct Docker support remains, with the Dockerfile simplified to rely exclusively on the installation of the above Conda packages.
Removed
  • VCF validation step. Feedback from users suggested that Ensembl’s vcf-validator was often too stringent so its use has been deprecated. The --no_vcf_validate option remains for backwards compatibility.

v0.9.2

  • Date: 2021-06-30

  • Data updates: ClinVar, GWAS catalog, CIViC, CancerMine, dbNSFP, KEGG, ChEMBL, Disease Ontology/EFO, Open Targets Platform, UniProt KB, GENCODE

  • Software upgrades: R v4.1, Bioconductor v3.13, VEP (104) ++

Changed
  • TOML-based configuration for PCGR is abandoned, all options to PCGR are now configured through command-line parameters
    • NOTE: We recommend turning on --show_noncoding and --vcf2maf (previously turned on by default in TOML). For tumor-only runs, we recommend including --exclude_dbsnp_nonsomatic and --exclude_nonexonic
Added
  • Command-line options
    • Previously set in TOML file
      • Allelic support
        • --tumor_dp_tag
        • --tumor_af_tag
        • --control_dp_tag
        • --control_af_tag
        • --call_conf_tag
      • Tumor-only options
        • --maf_onekg_eur
        • --maf_onekg_amr
        • --maf_onekg_afr
        • --maf_onekg_eas
        • --maf_onekg_sas
        • --maf_onekg_global
        • --maf_gnomad_nfe
        • --maf_gnomad_asj
        • --maf_gnomad_fin
        • --maf_gnomad_oth
        • --maf_gnomad_amr
        • --maf_gnomad_afr
        • --maf_gnomad_eas
        • --maf_gnomad_sas
        • --maf_gnomad_global
        • --exclude_pon
        • --exclude_likely_het_germline
        • --exclude_likely_hom_germline
        • --exclude_dbsnp_nonsomatic
        • --exclude_nonexonic
      • --report_theme
      • --preserved_info_tags (previously custom_tags (TOML))
      • --show_noncoding (previously list_noncoding (TOML))
      • --vcfanno_n_proc (previously n_vcfanno_proc (TOML))
      • --vep_n_forks (previously n_vep_forks (TOML))
      • --vep_pick_order
      • --vep_no_intergenic (previously vep_skip_intergenic (TOML))
      • --vcf2maf
    • New options
      • --report_nonfloating_toc (NEW) - add the TOC at the top of the HTML report, not floating at the left of the document
      • --cpsr_report (NEW) - add a dedicated section in PCGR with main germline findings from CPSR analysis - (use the gzipped JSON output from CPSR as input)
      • --vep_regulatory (NEW) - append regulatory annotations to variants (TF binding sites etc.)
      • --include_artefact_signatures (NEW) - include sequencing artefacts in the reference collection of mutational signatures (COSMIC v3.2)
Fixed
  • Bug in writing (large) report contents to JSON (issue #118)
  • Bug (typo) in merge of clinical evidence items from different sources (CIVIC + CGI) (issue #126)
  • Bug in value box for number of (high-confident) kataegis events - rmarkdown (issue #122)
  • Bug in value box for tumor purity/ploidy - rmarkdown (issue #129)
Removed
  • Command-line options
    • --conf - TOML-based configuration file

v0.9.1

  • Date: 2020-11-30

  • Data updates:

    • ClinVar
    • GWAS catalog
    • CIViC
    • CancerMine
    • dbNSFP
    • KEGG
    • ChEMBL/DGIdb
    • Disease Ontology, Experimental Factor Ontology
Added
  • added possibility to configure algorithm for TMB calculation, optional argument tmb_algorithm - all coding variants (all_coding) or non-synonymous variants only (nonsyn)
  • R code subject to static analysis with lintr
  • Improved Conda recipe (i.e. meta.yaml) with version pinning of all package dependencies
Changed
  • Removed DisGeNET annotations from output (associations from Open Targets Platform serve same purpose)
  • Version pinning of software dependencies in Dockerfile:
    • All R packages necessary for PCGR are installed using the renv framework, ensuring improved versioning and reproducibility
    • Other tools/utilities and Python libraries that have been version pinned:
      • bedtools, samtools, numpy, cython, scipy, cyvcf2, toml, pandas

v0.9.0rc

  • Date: 2020-09-24

  • Data updates: ClinVar, GWAS catalog, GENCODE, CIViC, CancerMine, UniProt KB, dbNSFP, Pfam, KEGG, Open Targets Platform

  • Software updates: VEP 101

Fixed
  • An extra comma was mistakenly present in the template for tier 2 variants, issue #96
  • Missing protein domain annotations for grch38, issue #116
Changed
  • All arguments to pcgr.py are now non-positional
  • Arguments to pcgr.py are divided into two groups: required and optional
  • Options allelic_support:tumor_dp_min, allelic_support:tumor_af_min, allelic_support:control_dp_min, allelic_support:control_af_max in PCGR configuration file are now optional arguments --tumor_dp_min, --tumor_af_min, --control_dp_min, --control_af_max in cpsr.py
  • Option mutational_burden:mutational_burden in PCGR configuration file is now optional argument --estimate_tmb in pcgr.py
  • Option msi:msi in PCGR configuration file is now optional argument --estimate_msi_status in pcgr.py
  • Option mutational_signatures:mutational_signatures in PCGR configuration file is now optional argument --estimate_signatures in pcgr.py
  • Options mutational_signatures:mutsignatures_signature_limit, mutational_signatures:mutsignatures_normalization, mutational_signatures:mutsignatures_mutation_limit, mutational_signatures:mutsignatures_cutoff are removed (used for deconstructSigs analysis, which is no longer in use)
  • Optional argument --cna_overlap_pct in pcgr.py replaces cna:cna_overlap_pct in PCGR configuration file
  • Optional argument --logr_gain in pcgr.py replaces cna:logr_gain in PCGR configuration file
  • Optional argument --logr_homdel in pcgr.py replaces cna:logr_homdel in PCGR configuration file
  • Removed mutational_burden:tmb_low_limit and mutational_burden:tmb_intermediate_limit - TMB is no longer interpreted in the context of thresholds
  • Classifications of genes as tumor suppressors/oncogenes are now based on a combination of CancerMine citation count and presence in Network of Cancer Genes
  • Settings section of report is now divided into three:
    • Metadata - sample and sequencing assay
    • Report configuration
Added
  • Optional argument --include_trials in pcgr.py - includes a section with annotated clinical trials for the tumor type in question
  • Optional argument --assay in pcgr.py - designates type of sequencing assay
  • Optional argument --cell_line in pcgr.py - designates runs of tumor cell lines (only for display, not used to configure any analysis)
  • Optional argument --min_mutations_signatures in pcgr.py - minimum number of required mutations for mutational signature analysis with MutationalPatterns
  • Optional argument --all_reference_signatures in pcgr.py - considers all reference signatures during fitting of mutational profile to known signatures
  • Optional argument --estimate_signatures now also includes detection of potential kataegis events (WGS/WES assays only), and rainfall plot in the flexdashboard output
  • The user can now distinguish (through color codes) whether a biomarker has been mapped exactly (nucleotide change) or at a regional level (codon/exon)
  • All variant-associated biomarkers (regardless of assignment to TIER 1/2) are now found in a new section (SNVs/InDels)
  • For copy number amplifications, other putative drug targets in cancer are listed in a new section
  • Detailed documentation of report contents is added to the Documentation section
  • References are updated and all provided with DOI

v0.8.4

  • Date: 2019-11-18

  • Data updates: ClinVar, CIViC, CancerMine, UniProt KB

  • Software updates: VEP 98.3

v0.8.3

  • Date: 2019-10-14

  • Data updates: ClinVar, GWAS catalog, GENCODE, CIViC, CancerMine

  • Software updates: VEP 98.2, vcf2tsv

Fixed
  • Further improved mapping between Ensembl transcripts and UniProt accessions (using also RefSeq accessions where available)
Added
  • Possibility to filter evidence items by RATING in interactive data tables
Changed
  • Option target_size_mb in pcgr.py replaces target_size_mb in configuration file, more convenient in terms of configuring runs
  • Option tumor_type in pcgr.py replaces tumor_type in configuration file

v0.8.2

  • Date: 2019-09-29

  • Data updates: ClinVar, GWAS catalog, GENCODE, DiseaseOntology, CIViC, CancerMine, UniProt KB

  • Software updates: VEP 97.3, vcfanno 0.3.2, LOFTEE (VEP plugin) 1.0.3

Fixed
  • Bug in concatenation of clinical evidence items from different sources (CIVIC + CBMDB) (issues #83,#87)
  • Silent variants that coincide with biomarkers reported at codon level are ignored
  • Distinction between clinical evidence items of different origins (somatic + germline)
  • Improved mapping between Ensembl transcripts and UniProt accessions (using also RefSeq accessions where available)
  • Bug in UpSetPlot for cases where filtering produces fewer than two intersecting sets
Added
  • New field ‘mane’ as criterion for pick order in configuration file (VEP)
  • Sample identifier to copy number annotation output (convenient for concatenation of output from multiple samples)
  • Capturing allelic depth (t_depth, t_ref_count etc.) in vcf2maf output (enhancement #52)
  • Option tumor_only in pcgr.py, replaces vcf_tumor_only in configuration file, more convenient in terms of configuration

v0.8.1

  • Date: 2019-05-22
Added
  • Cancer_NOS.toml as configuration file for unspecified tumor types

v0.8.0

  • Date: 2019-05-20
Fixed
  • Bug in value box for Tier 2 variants (new line carriage) Issue #73
Added
  • Upgraded VEP to v96
    • Skipping the --regulatory VEP option to avoid forking issues and to improve speed (See this issue)
    • Added option to configure pick-order for choice of primary transcript in configuration file
  • Pre-made configuration files for each tumor type in conf folder
  • Possibility to append a CNA plot file (.png format) to the section of the report with Somatic CNAs (previous feature request)
  • Added possibility to input estimates of tumor purity and ploidy
    • shown as value boxes in Main results
  • Tumor mutational burden is now compared with the distribution of TMB observed for TCGA’s cohorts (organized by primary site)
    • Default target size is now 34Mb (approx. estimate from exome-wide calculation of protein-coding parts of GENCODE)
  • Added flexibility for variant filtering in tumor-only input callsets
    • Added additional options to exclude likely germline variants (both require the tumor VAF tag to be correctly specified in the input VCF)
      • exclude_likely_hom_germline - removes any variant with an allelic fraction of 1 (100%) - very unlikely somatic event
    • exclude_likely_het_germline - removes any variant with
      • an allelic fraction between 0.4 and 0.6, and
      • presence in dbSNP + gnomAD, and
      • no presence as somatic event in COSMIC/TCGA
    • Added possibility to input PANEL-OF-NORMALS VCF - this to support the many labs that have sequenced a database/pool of healthy controls. This set of variants is utilized in PCGR to improve the variant filtering when running in tumor-only mode. The PANEL-OF-NORMALS annotation works as follows:
      • all variants in the tumor that coincide with any variant listed in the PANEL-OF-NORMALS VCF are appended with a PANEL_OF_NORMALS flag in the query VCF with tumor variants.
    • If configuration parameter exclude_pon is set to True in tumor_only runs, all variants with a PANEL_OF_NORMALS flag are filtered/excluded
  • For tumor-only runs, added an UpSet plot showing how different filtering sources (gnomAD, 1KG Project, panel-of-normals etc) contribute in the germline filtering procedure
  • Variants in Tier 3 / Tier 4 / Noncoding are now sorted (and color-coded) according to the target (gene) association score to the cancer phenotype, as provided by the OpenTargets Platform
  • Added annotation of TCGA’s ten oncogenic signaling pathways
  • Added EXONIC_STATUS annotation tag (VCF and TSV)
    • exonic denotes all protein-altering AND canonical splicesite altering AND synonymous variants, nonexonic denotes the complement
  • Added CODING_STATUS annotation tag (VCF and TSV)
    • coding denotes all protein-altering AND canonical splicesite altering, noncoding denotes the complement
  • Added SYMBOL_ENTREZ annotation tag (VCF)
    • Official gene symbol from NCBI Entrez (SYMBOL provided by VEP can sometimes be non-official/alias (i.e. for GENCODE v19/grch37))
  • Added SIMPLEREPEATS_HIT annotation tag (VCF and TSV)
    • Variant overlaps UCSC simpleRepeat sequence repeat track - used for MSI prediction
  • Added WINMASKER_HIT annotation tag (VCF and TSV)
    • Variant overlaps UCSC windowmaskerSdust sequence repeat track - used for MSI prediction
  • Added PUTATIVE_DRIVER_MUTATION annotation tag (VCF and TSV)
    • Putative cancer driver mutation discovered by multiple approaches from 9,423 tumor exomes in TCGA. Format: symbol:hgvsp:ensembl_transcript_id:discovery_approaches
  • Added OPENTARGETS_DISEASE_ASSOCS annotation tag (VCF and TSV)
    • Associations between protein targets and disease based on multiple lines of evidence (mutations,affected pathways,GWAS, literature etc). Format: CUI:EFO_ID:IS_DIRECT:OVERALL_SCORE
  • Added OPENTARGETS_TRACTABILITY_COMPOUND annotation tag (VCF and TSV)
    • Confidence for the existence of a modulator (small molecule) that interacts with the target (protein) to elicit a desired biological effect
  • Added OPENTARGTES_TRACTABILITY_ANTIBODY annotation tag (VCF and TSV)
    • Confidence for the existence of a modulator (antibody) that interacts with the target (protein) to elicit a desired biological effect
  • Added CLINVAR_REVIEW_STATUS_STARS annotation tag
    • Rating of the ClinVar variant (0-4 stars) with respect to level of review
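
The tumor-only germline filters described above (exclude_likely_hom_germline, exclude_likely_het_germline and exclude_pon) can be sketched as simple predicates. The Python below is an illustration of the stated criteria only: the dictionary keys are hypothetical, and treating the 0.4 and 0.6 bounds as inclusive is an assumption, since the changelog does not say.

```python
def is_likely_hom_germline(variant):
    """Allelic fraction of 1 (100%): very unlikely to be a somatic event."""
    return variant["tumor_af"] == 1.0


def is_likely_het_germline(variant):
    """Allelic fraction 0.4-0.6, present in dbSNP and gnomAD, and not
    observed as a somatic event in COSMIC/TCGA (keys are hypothetical)."""
    return (
        0.4 <= variant["tumor_af"] <= 0.6
        and variant["in_dbsnp"]
        and variant["in_gnomad"]
        and not variant["somatic_in_cosmic_or_tcga"]
    )


def keep_variant(variant, exclude_hom=True, exclude_het=True, exclude_pon=True):
    """Return False if any enabled filter flags the variant as germline."""
    if exclude_hom and is_likely_hom_germline(variant):
        return False
    if exclude_het and is_likely_het_germline(variant):
        return False
    if exclude_pon and variant.get("panel_of_normals", False):
        return False
    return True


example = {
    "tumor_af": 0.5,
    "in_dbsnp": True,
    "in_gnomad": True,
    "somatic_in_cosmic_or_tcga": False,
}
print(keep_variant(example))                     # flagged by the het filter
print(keep_variant(example, exclude_het=False))  # kept when the filter is off
```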
Changed
Removed
  • Original tier model ‘pcgr’

v0.7.0

  • Date: 2018-11-27
Fixed
  • Bug in assignment of variants to tier1/tier2 Issue #61
  • Missing config option for maf_gnomad_asj in TOML file (also setting operator to <=) Issue #60
  • Bug in new CancerMine oncogene/tumor suppressor annotation Issue #53
  • vcfanno fix for empty Description (upgrade to vcfanno v0.3.1 Issue #49)
  • Bug in message showing too few variants for MSI prediction, Issue #55
  • Bug in appending of custom VCF tags
    • Still unsolved: how to disambiguate identical FORMAT and INFO tags in vcf2tsv
  • Bug in SCNA value box display for multiple copy number hits (Issue #47)
  • Bug in vcf2tsv (handling INFO tags encoded with ‘Type = String’, Issue #39)
  • Bug in search of UniProt functional features (BED feature regions spanning exons are now handled)
  • Stripped off HTML elements (TCGA_FREQUENCY, DBSNP) in TSV output
  • Some effect predictions from dbNSFP were not properly parsed (e.g. multiple prediction entries from multiple transcript isoforms), these should now be retrieved correctly
  • Removed ‘COSM’ prefix in COSMIC mutation links
  • Bug in retrieval of splice site predictions from dbscSNV
Added
  • Possibility to run PCGR in a non-Docker environment (e.g. using the --no-docker option). Thanks to an excellent contribution by Vlad Saveliev, Issue #35
    • Added possibility to add docker user-id
  • Possibility for MAF file output (converted with vcf2maf), must be configured by the user in the TOML file (i.e. vcf2maf = true, Issue #17)
  • Possibility for adding custom VCF INFO tags to PCGR output files (JSON/TSV), must be configured by the user in the TOML file (i.e. custom_tags)
  • Added MUTATION_HOTSPOT_CANCERTYPE in data tables (i.e. listing tumor types in which hotspot mutations have been found)
  • Included the ‘rs’ prefix for dbSNP identifiers (HTML and TSV output)
  • Individual entries/columns for variant effect predictions:
    • Individual algorithms: SIFT_DBNSFP, M_CAP_DBNSFP, MUTPRED_DBNSFP, MUTATIONTASTER_DBNSFP, MUTATIONASSESSOR_DBNSFP, FATHMM_DBNSFP, FATHMM_MKL_DBNSFP, PROVEAN_DBNSFP
    • Ensemble predictions (META_LR_DBNSFP), dbscSNV splice site predictions (SPLICE_SITE_RF_DBNSFP, SPLICE_SITE_ADA_DBNSFP)
  • Upgraded samtools to v1.9 (makes vcf2maf work properly)
  • Added Ensembl gene/transcript id and corresponding RefSeq mRNA id to TSV/JSON
  • Added for future implementation:
    • SeqKat + karyoploteR for exploration of kataegis/hypermutation
    • CELLector - genomics-guided selection of cancer cell lines
  • Upgraded VEP to v94
Changed
  • Changed CANCER_MUTATION_HOTSPOT to MUTATION_HOTSPOT
  • Moved from TSGene 2.0 to CancerMine for annotation of tumor suppressor genes and proto-oncogenes
    • A minimum of n=3 citations were required to include literature-mined tumor suppressor genes and proto-oncogenes from CancerMine

v0.6.2.1

  • Date: 2018-05-14
Fixed
  • Bug in copy number annotation (broad/focal)

v0.6.2

  • Date: 2018-05-09
Fixed
  • Bug in copy number segment display (missing variable initialization, Issue #34)
  • Typo in gnomAD filter statistic (fraction, Issue #31)
  • Bug in mutational signature analysis for grch38 (forgot to pass BSgenome object, Issue #27)
  • Missing proper ASCII-encoding in vcf2tsv conversion, Issue #
  • Removed ‘Noncoding mutations’ section when no input VCF is present
  • Bug in annotation of copy number event type (focal/broad)
  • Bug in copy number annotation (missing protein-coding transcripts)
  • Updated MSI prediction (variable importance, performance measures)
Added
  • Genome assembly is appended to every output file
  • Issue warning for copy number segment that goes beyond chromosomal lengths of specified assembly (segments will be skipped)
  • Added missing subtypes for ‘Skin_Cancer_NOS’ in the cancer phenotype dataset

v0.6.1

  • Date: 2018-05-02
Fixed
  • Bug in tier assignment ‘pcgr_acmg’ (case for no variants in tier1,2,3)
  • Bug in tier assignment ‘pcgr_acmg’ (no tumor type specified, evidence items with weak support detected)
  • Bug: duplicated variants in ‘Tier 3’ resulting from genes encoded with dual roles as tumor suppressor genes/oncogenes
  • Bug: duplicated variants in ‘Tier 1/Noncoding variants’ resulting from rare cases of noncoding variants occurring in Tier 1 (synonymous variants with biomarker role)

v0.6.0

  • Date: 2018-04-25
Added
  • New argument in pcgr.py
    • assembly (grch37/grch38)
  • New option in pcgr.py
    • --basic - run comprehensive VCF annotation only, skip report generation and additional analyses
  • New sections in HTML report
    • Settings and annotation sources - now also listing key PCGR configuration settings
    • Main findings - Six value boxes indicating the main findings of clinical relevance
  • New configuration options
    • tier_model(string) - choice between pcgr_acmg and pcgr
    • mutational_burden - set TMB tertile limits
      • tmb_low_limit (float)
      • tmb_intermediate_limit (float)
    • tumor_type - choose between 34 tumor types/classes:
      • Adrenal_Gland_Cancer_NOS (logical)
      • Ampullary_Carcinoma_NOS (logical)
      • Biliary_Tract_Cancer_NOS (logical)
      • Bladder_Urinary_Tract_Cancer_NOS (logical)
      • Blood_Cancer_NOS (logical)
      • Bone_Cancer_NOS (logical)
      • Breast_Cancer_NOS (logical)
      • CNS_Brain_Cancer_NOS (logical)
      • Colorectal_Cancer_NOS (logical)
      • Cervical_Cancer_NOS (logical)
      • Esophageal_Stomach_Cancer_NOS (logical)
      • Head_And_Neck_Cancer_NOS (logical)
      • Hereditary_Cancer_NOS (logical)
      • Kidney_Cancer_NOS (logical)
      • Leukemia_NOS (logical)
      • Liver_Cancer_NOS (logical)
      • Lung_Cancer_NOS (logical)
      • Lymphoma_Hodgkin_NOS (logical)
      • Lymphoma_Non_Hodgkin_NOS (logical)
      • Ovarian_Fallopian_Tube_Cancer_NOS (logical)
      • Pancreatic_Cancer_NOS (logical)
      • Penile_Cancer_NOS (logical)
      • Peripheral_Nervous_System_Cancer_NOS (logical)
      • Peritoneal_Cancer_NOS (logical)
      • Pleural_Cancer_NOS (logical)
      • Prostate_Cancer_NOS (logical)
      • Skin_Cancer_NOS (logical)
      • Soft_Tissue_Cancer_NOS (logical)
      • Stomach_Cancer_NOS (logical)
      • Testicular_Cancer_NOS (logical)
      • Thymic_Cancer_NOS (logical)
      • Thyroid_Cancer_NOS (logical)
      • Uterine_Cancer_NOS (logical)
      • Vulvar_Vaginal_Cancer_NOS (logical)
    • mutational_signatures
      • mutsignatures_cutoff (float) - discard any signature contributions with a weight less than the cutoff
    • cna
      • transcript_cna_overlap (float) - minimum percent overlap between copy number segment and transcripts (average) for tumor suppressor gene/proto-oncogene to be reported
    • allelic_support
      • If input VCF has correctly formatted depth/allelic fraction as INFO tags, users can add thresholds on depth/support that are applied prior to report generation
        • tumor_dp_min (integer) - minimum sequencing depth for variant in tumor sample
        • tumor_af_min (float) - minimum allelic fraction for variant in tumor sample
        • normal_dp_min (integer) - minimum sequencing depth for variant in normal sample
        • normal_af_max (float) - maximum allelic fraction for variant in normal sample
    • visual
      • report_theme (string) - visual theme of report (Bootstrap)
    • other
      • vcf_validation (logical) - keep/skip VCF validation by vcf-validator
  • New output file - JSON output of HTML report content
  • New INFO tags of PCGR-annotated VCF
    • CANCER_PREDISPOSITION
    • PFAM_DOMAIN
    • TCGA_FREQUENCY
    • TCGA_PANCANCER_COUNT
    • ICGC_PCAWG_OCCURRENCE
    • ICGC_PCAWG_AFFECTED_DONORS
    • CLINVAR_MEDGEN_CUI
  • New column entries in annotated SNV/InDel TSV file:
    • CANCER_PREDISPOSITION
    • ICGC_PCAWG_OCCURRENCE
    • TCGA_FREQUENCY
  • New column in CNA output
    • TRANSCRIPTS - aberration-overlapping transcripts (Ensembl transcript IDs)
    • MEAN_TRANSCRIPT_CNA_OVERLAP - Mean overlap (%) between gene transcripts and aberration segment
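
The allelic_support thresholds introduced in this release (tumor_dp_min, tumor_af_min, normal_dp_min, normal_af_max) amount to a per-variant filter applied before report generation. The sketch below shows one way to express it in Python; the variant keys and the permissive default values are assumptions chosen for the example, not documented PCGR defaults.

```python
def passes_allelic_support(variant, tumor_dp_min=0, tumor_af_min=0.0,
                           normal_dp_min=0, normal_af_max=1.0):
    """Hypothetical filter: True if the variant meets all four thresholds.

    variant is a dict with depth (dp) and allelic fraction (af) values for
    the tumor and normal sample, e.g. parsed from VCF INFO tags.
    """
    return (
        variant["tumor_dp"] >= tumor_dp_min
        and variant["tumor_af"] >= tumor_af_min
        and variant["normal_dp"] >= normal_dp_min
        and variant["normal_af"] <= normal_af_max
    )


call = {"tumor_dp": 45, "tumor_af": 0.12, "normal_dp": 30, "normal_af": 0.0}
print(passes_allelic_support(call, tumor_dp_min=20, tumor_af_min=0.05,
                             normal_dp_min=15, normal_af_max=0.02))
print(passes_allelic_support(call, tumor_af_min=0.2))
```

With the permissive defaults every variant passes, so the filter only takes effect once a threshold is set explicitly.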
Removed
  • Elements of databundle (now annotated directly through VEP):
    • dbsnp
    • gnomad/exac
    • 1000G project
  • INFO tags of PCGR-annotated VCF
    • DBSNPBUILDID
    • DBSNP_VALIDATION
    • DBSNP_SUBMISSIONS
    • DBSNP_MAPPINGSTATUS
    • GWAS_CATALOG_PMID
    • GWAS_CATALOG_TRAIT_URI
    • DOCM_DISEASE
  • Output files
    • TSV files with mutational signature results and biomarkers (i.e. sample_id.pcgr.snvs_indels.biomarkers.tsv and sample_id.pcgr.mutational_signatures.tsv)
      • Data can still be retrieved - now from the JSON dump
    • MAF file
      • The previous MAF output was generated in a custom fashion, a more accurate MAF output based on https://github.com/mskcc/vcf2maf will be incorporated in the next release
Changed
  • HTML report sections
    • Tier statistics and Variant statistics are now grouped into the section Tier and variant statistics
    • Tier 5 is now Noncoding mutations (i.e. not considered a tier per se)
    • Sliders for allelic fraction in the Global variant browser are now fixed from 0 to 1 (0.05 intervals)