Data
PCGR requires the following data to run successfully:
- Reference bundle - containing data from multiple knowledge resources, including information on molecular biomarkers, targeted cancer therapies, variant frequencies, etc. Key datasets include CIViC, CGI, Open Targets Platform, TCGA, and ClinVar.
- Ensembl VEP data cache - needed for variant annotation with VEP (Variant Effect Predictor)
- User-supplied sample-specific inputs - e.g. somatic variant calls in VCF format
PCGR supports both the GRCh37 and GRCh38 human genome assemblies. All of the data above must match the chosen assembly.
1. Reference Bundle
Reference bundles are generated semi-automatically by the PCGR author and are versioned by release date. Keep in mind that each bundle supports only certain Ensembl VEP versions. The latest (v20240927) assembly-specific bundles can be downloaded directly from the links below (size: ~4 GB):
| Assembly | Download link |
|---|---|
| GRCh38 | https://insilico.hpc.uio.no/pcgr/pcgr_ref_data.20240927.grch38.tgz |
| GRCh37 | https://insilico.hpc.uio.no/pcgr/pcgr_ref_data.20240927.grch37.tgz |
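For scripted setups, the download can be wrapped in a small helper. This is a minimal sketch, assuming `curl` and `tar` are installed and that the archive unpacks into a `data/<genome>/` tree (consistent with the `data/grch3x/` paths in the tips); the function names `bundle_url` and `fetch_bundle` are hypothetical and not part of PCGR.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for fetching a PCGR reference bundle (sketch, not part of PCGR).

BUNDLE_VERSION="20240927"   # release date of the bundles listed in the table

# Print the download URL for an assembly ("grch37" or "grch38").
bundle_url() {
  local genome="$1"
  case "$genome" in
    grch37|grch38) ;;
    *) echo "unsupported assembly: $genome" >&2; return 1 ;;
  esac
  echo "https://insilico.hpc.uio.no/pcgr/pcgr_ref_data.${BUNDLE_VERSION}.${genome}.tgz"
}

# Download the bundle (~4 GB) and unpack it into DEST_DIR.
# Assumption: the archive contains a top-level data/<genome>/ directory,
# so DEST_DIR is the directory you would later pass to --refdata_dir.
fetch_bundle() {
  local genome="$1" dest_dir="$2"
  local url archive
  url="$(bundle_url "$genome")" || return 1
  archive="${dest_dir}/$(basename "$url")"
  mkdir -p "$dest_dir"
  curl -fL --retry 3 -o "$archive" "$url" || return 1
  tar -xzf "$archive" -C "$dest_dir" || return 1
  rm -f "$archive"   # remove the archive once unpacked to save disk space
}
```

For example, `fetch_bundle grch38 /Users/you/dir1/bundle` would leave the unpacked bundle in the directory that the Docker and Apptainer example runs mount as `/mnt/bundle`.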
Tip 1: The `data/grch3x/.PCGR_BUNDLE_VERSION` file within the downloaded bundle indicates the bundle version for reporting purposes.
Tip 2: The `data/grch3x/data_overview.grch3x.html` file provides a report with an overview and statistics of the key resources included in the reference bundle.
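Both files can be located from a script. A sketch, assuming the version file holds a single version string and that `REFDATA_DIR` is the directory containing `data/` (the value later passed to `--refdata_dir`); `bundle_version` and `bundle_overview` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: read the bundle version and locate the overview report.
# Arguments: REFDATA_DIR (directory containing data/) and genome (grch37 or grch38).

bundle_version() {
  local version_file="$1/data/$2/.PCGR_BUNDLE_VERSION"
  if [ ! -f "$version_file" ]; then
    echo "missing: $version_file" >&2
    return 1
  fi
  # Assumes the file holds a single version string; strip whitespace/newlines.
  tr -d '[:space:]' < "$version_file"
  echo
}

bundle_overview() {
  local report="$1/data/$2/data_overview.$2.html"
  if [ ! -f "$report" ]; then
    echo "missing: $report" >&2
    return 1
  fi
  echo "$report"
}
```

For instance, `bundle_version /Users/you/dir1/bundle grch38` can be logged next to each run, and the path printed by `bundle_overview` opened in a browser.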
2. VEP Cache
VEP requires a data cache, which is available from the Ensembl FTP site (search there for files starting with `homo_sapiens_vep_`). The latest Ensembl VEP version we support is v112.
Tip: PCGR needs to be pointed to the parent directory containing the downloaded `homo_sapiens/xyz_GRCh3x/` cache, which is usually called `.vep` if you’ve followed the VEP cache download instructions.
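A sketch of fetching the cache and checking that the directory handed to PCGR has the expected layout. The FTP paths below (`release-112/variation/indexed_vep_cache/`, with a `grch37/` prefix for GRCh37) are an assumption about the Ensembl FTP layout and should be checked against the site, as is the assumption that the tarball unpacks to `homo_sapiens/112_GRCh3x/`; `vep_cache_url` and `check_vep_dir` are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Sketch: locate an Ensembl VEP cache and check the layout PCGR expects.
VEP_VERSION="112"   # latest VEP version supported by PCGR, per the text above

# Assumed Ensembl FTP layout; verify against the FTP site before relying on it.
vep_cache_url() {
  local assembly="$1"   # GRCh37 or GRCh38
  local base="https://ftp.ensembl.org/pub"
  case "$assembly" in
    GRCh38) base="${base}/release-${VEP_VERSION}" ;;
    GRCh37) base="${base}/grch37/release-${VEP_VERSION}" ;;
    *) echo "unsupported assembly: $assembly" >&2; return 1 ;;
  esac
  echo "${base}/variation/indexed_vep_cache/homo_sapiens_vep_${VEP_VERSION}_${assembly}.tar.gz"
}

# Succeed if VEP_DIR (the value for --vep_dir) contains homo_sapiens/<version>_<assembly>/.
check_vep_dir() {
  local vep_dir="$1" assembly="$2"
  local cache="${vep_dir}/homo_sapiens/${VEP_VERSION}_${assembly}"
  if [ -d "$cache" ]; then
    echo "ok: $cache"
  else
    echo "missing: $cache (point --vep_dir at the parent of homo_sapiens/)" >&2
    return 1
  fi
}
```

For example: download with `curl -fLO "$(vep_cache_url GRCh38)"`, unpack with `tar -xzf homo_sapiens_vep_112_GRCh38.tar.gz -C "$HOME/.vep"`, then run `check_vep_dir "$HOME/.vep" GRCh38` before passing `$HOME/.vep` as `--vep_dir`.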
Software
The PCGR workflow can be installed with any of the following:
- A. Conda,
- B. Docker, or
- C. Singularity/Apptainer.
A. Conda
Conda is supported on both Linux and macOS. A fresh installation takes roughly 10 to 40 minutes, depending mostly on the internet connection of the user and the server. Most of that time is spent downloading the `BSgenome.Hsapiens.UCSC.hg19` and `BSgenome.Hsapiens.UCSC.hg38` R packages, which happens at the very end of the Conda environment creation.
```shell
# set up variables
PCGR_VERSION="2.1.2"
PCGR_REPO="https://raw.githubusercontent.com/sigven/pcgr/v${PCGR_VERSION}/conda/env/lock"
PLATFORM="linux"
# create conda envs in local directory
mkdir -p pcgr_conda
conda create --prefix ./pcgr_conda/pcgr --file "${PCGR_REPO}/pcgr-${PLATFORM}-64.lock"
conda create --prefix ./pcgr_conda/pcgrr --file "${PCGR_REPO}/pcgrr-${PLATFORM}-64.lock"
# you need to specify the directory of the conda env when using --prefix
conda activate ./pcgr_conda/pcgr
# test that it works
pcgr --version
pcgr --help
```
For macOS M1 machines, you need to include `CONDA_SUBDIR=osx-64` before the `conda create` command (see https://github.com/conda-forge/miniforge/issues/165#issuecomment-860233092):
```shell
# set up variables
PCGR_VERSION="2.1.2"
PCGR_REPO="https://raw.githubusercontent.com/sigven/pcgr/v${PCGR_VERSION}/conda/env/lock"
PLATFORM="osx"
# create conda envs in local directory
mkdir -p pcgr_conda
CONDA_SUBDIR=osx-64 conda create --prefix ./pcgr_conda/pcgr --file "${PCGR_REPO}/pcgr-${PLATFORM}-64.lock"
CONDA_SUBDIR=osx-64 conda create --prefix ./pcgr_conda/pcgrr --file "${PCGR_REPO}/pcgrr-${PLATFORM}-64.lock"
# you need to specify the directory of the conda env when using --prefix
conda activate ./pcgr_conda/pcgr
# test that it works
pcgr --version
pcgr --help
```
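The Linux and macOS commands above differ only in `PLATFORM` and the `CONDA_SUBDIR=osx-64` prefix. The hypothetical wrapper below chooses both from `uname -s`. It is a sketch, not part of PCGR, and it applies `CONDA_SUBDIR=osx-64` on every macOS machine, which on Intel Macs simply matches the native platform.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: pick the lock-file platform from the kernel name and
# add CONDA_SUBDIR=osx-64 on macOS, as described above for M1 machines.
PCGR_VERSION="2.1.2"
PCGR_REPO="https://raw.githubusercontent.com/sigven/pcgr/v${PCGR_VERSION}/conda/env/lock"

# Map a kernel name (default: this machine's `uname -s`) to the lock-file platform.
pcgr_platform() {
  local kernel="${1:-$(uname -s)}"
  case "$kernel" in
    Linux)  echo "linux" ;;
    Darwin) echo "osx" ;;
    *) echo "unsupported OS: $kernel" >&2; return 1 ;;
  esac
}

# URL of the lock file for env "pcgr" or "pcgrr" on a given platform.
pcgr_lock_url() {
  echo "${PCGR_REPO}/$1-$2-64.lock"
}

# Create both environments under PREFIX_DIR (e.g. ./pcgr_conda).
create_pcgr_envs() {
  local prefix_dir="$1" platform env
  platform="$(pcgr_platform)" || return 1
  mkdir -p "$prefix_dir"
  for env in pcgr pcgrr; do
    if [ "$platform" = "osx" ]; then
      CONDA_SUBDIR=osx-64 conda create --prefix "${prefix_dir}/${env}" \
        --file "$(pcgr_lock_url "$env" "$platform")" || return 1
    else
      conda create --prefix "${prefix_dir}/${env}" \
        --file "$(pcgr_lock_url "$env" "$platform")" || return 1
    fi
  done
}
```

Running `create_pcgr_envs ./pcgr_conda` followed by `conda activate ./pcgr_conda/pcgr` would reproduce the manual steps on either platform.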
B. Docker
The PCGR Docker image is available on Docker Hub. Pull the latest v2.1.2 image with:
```shell
docker pull sigven/pcgr:2.1.2
```
Example Run
```shell
PCGR_VERSION="2.1.2"
docker container run -it --rm \
  -v /Users/you/dir1/.vep:/mnt/.vep \
  -v /Users/you/dir1/bundle:/mnt/bundle \
  -v /Users/you/dir1/pcgr_inputs:/mnt/pcgr_inputs \
  -v /Users/you/dir1/pcgr_outputs:/mnt/pcgr_outputs \
  sigven/pcgr:${PCGR_VERSION} \
  pcgr \
  --input_vcf "/mnt/pcgr_inputs/T001-BRCA.grch38.vcf.gz" \
  --vep_dir "/mnt/.vep" \
  --refdata_dir "/mnt/bundle" \
  --output_dir "/mnt/pcgr_outputs" \
  --genome_assembly "grch38" \
  --sample_id "SAMPLE_B" \
  --tumor_dp_tag "TDP" \
  --tumor_af_tag "TAF" \
  --assay "WGS" \
  --vcf2maf
```
NOTE: If you need to run the Docker-based version of PCGR as a non-root user, you may need to explicitly add an option for Quarto to work properly, namely `--env "XDG_CACHE_HOME=/tmp/quarto_cache_home"` (same as for Singularity/Apptainer below; see also issue #246).
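A sketch of the example run adapted for a non-root user. It adds Docker's standard `--user "$(id -u):$(id -g)"` option so that output files are owned by the calling user, together with the `XDG_CACHE_HOME` setting from the note. The function name `run_pcgr_docker`, the `DRY_RUN` switch, and the `HOST_DIR` layout (`.vep/`, `bundle/`, `pcgr_inputs/`, `pcgr_outputs/`, mirroring the example) are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run the PCGR container as the calling (non-root) user.
# HOST_DIR is assumed to contain .vep/, bundle/, pcgr_inputs/ and pcgr_outputs/.
# Set DRY_RUN=1 to print the command instead of executing it.
PCGR_VERSION="2.1.2"

run_pcgr_docker() {
  local host_dir="$1" vcf_name="$2" sample_id="$3"
  # -it is left out so the wrapper also works from non-interactive scripts.
  local -a cmd=(
    docker container run --rm
    --user "$(id -u):$(id -g)"
    --env "XDG_CACHE_HOME=/tmp/quarto_cache_home"
    -v "${host_dir}/.vep:/mnt/.vep"
    -v "${host_dir}/bundle:/mnt/bundle"
    -v "${host_dir}/pcgr_inputs:/mnt/pcgr_inputs"
    -v "${host_dir}/pcgr_outputs:/mnt/pcgr_outputs"
    "sigven/pcgr:${PCGR_VERSION}"
    pcgr
    --input_vcf "/mnt/pcgr_inputs/${vcf_name}"
    --vep_dir /mnt/.vep
    --refdata_dir /mnt/bundle
    --output_dir /mnt/pcgr_outputs
    --genome_assembly grch38
    --sample_id "$sample_id"
    --tumor_dp_tag TDP
    --tumor_af_tag TAF
    --assay WGS
    --vcf2maf
  )
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%q ' "${cmd[@]}"   # shell-quoted, copy-pasteable command line
    echo
  else
    "${cmd[@]}"
  fi
}
```

Running `DRY_RUN=1 run_pcgr_docker /Users/you/dir1 T001-BRCA.grch38.vcf.gz SAMPLE_B` prints the full command for inspection; dropping `DRY_RUN=1` executes it.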
C. Singularity/Apptainer
The PCGR Singularity/Apptainer image is available on GitHub Container Registry. Pull the latest v2.1.2 image with:
```shell
apptainer pull oras://ghcr.io/sigven/pcgr:2.1.2.singularity
```
This will download a Singularity Image File (SIF) called `pcgr_2.1.2.singularity.sif` that can be run with Singularity or Apptainer.
Example Run
```shell
PCGR_VERSION="2.1.2"
apptainer exec \
  --writable-tmpfs \
  --no-home \
  --env "XDG_CACHE_HOME=/tmp/quarto_cache_home" \
  -B /Users/you/dir1/.vep:/mnt/.vep \
  -B /Users/you/dir1/bundle:/mnt/bundle \
  -B /Users/you/dir1/pcgr_inputs:/mnt/pcgr_inputs \
  -B /Users/you/dir1/pcgr_outputs:/mnt/pcgr_outputs \
  pcgr_${PCGR_VERSION}.singularity.sif \
  pcgr \
  --input_vcf "/mnt/pcgr_inputs/T001-BRCA.grch38.vcf.gz" \
  --vep_dir "/mnt/.vep" \
  --refdata_dir "/mnt/bundle" \
  --output_dir "/mnt/pcgr_outputs" \
  --genome_assembly "grch38" \
  --sample_id "SAMPLE_B" \
  --assay "WGS" \
  --tumor_dp_tag "TDP" \
  --tumor_af_tag "TAF" \
  --vcf2maf
```
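Both example runs bind-mount host directories into the container, so a mistyped path tends to surface only once the container starts (Docker's `-v`, for instance, creates a missing host directory owned by root). A hypothetical pre-flight check, assuming the same host layout as the examples (`.vep/`, `bundle/`, `pcgr_inputs/`, `pcgr_outputs/` under one directory); `pcgr_preflight` is an illustrative name, not a PCGR command.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check to run before `docker container run` or
# `apptainer exec`: confirm the bind-mount sources and the input VCF exist.
pcgr_preflight() {
  local host_dir="$1" vcf_name="$2" status=0 d
  for d in .vep bundle pcgr_inputs; do
    if [ ! -d "${host_dir}/${d}" ]; then
      echo "missing directory: ${host_dir}/${d}" >&2
      status=1
    fi
  done
  if [ ! -f "${host_dir}/pcgr_inputs/${vcf_name}" ]; then
    echo "missing input VCF: ${host_dir}/pcgr_inputs/${vcf_name}" >&2
    status=1
  fi
  # Create the output directory up front so it is owned by the calling user.
  mkdir -p "${host_dir}/pcgr_outputs" || status=1
  return "$status"
}
```

It can guard either run, e.g. `pcgr_preflight /Users/you/dir1 T001-BRCA.grch38.vcf.gz && apptainer exec ...`.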
Note: For any Apptainer/Singularity issues not directly related to PCGR, we recommend reaching out to the Apptainer community (e.g. https://github.com/apptainer/apptainer), since we have limited experience with Apptainer/Singularity.