Quick Installation
Conda
- If you are already familiar with conda, run the following commands to install the PCGR software requirements:
PCGR_VERSION="1.4.1"
PCGR_REPO="https://raw.githubusercontent.com/sigven/pcgr/v${PCGR_VERSION}/conda/env/lock"
PLATFORM="linux" # or "osx"
# mamba is a much faster alternative to conda
conda install mamba -c conda-forge
mamba create --file ${PCGR_REPO}/pcgr-${PLATFORM}-64.lock --prefix ./pcgr
mamba create --file ${PCGR_REPO}/pcgrr-${PLATFORM}-64.lock --prefix ./pcgrr
# you need to specify the directory of the conda env when using --prefix
conda activate ./pcgr
# test that it works
pcgr --version
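You have to set the PLATFORM value above by hand. The following is a minimal sketch that derives it from uname instead. It assumes that only Linux and macOS are relevant, since those are the only platforms with lock files. The function name detect_pcgr_platform is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: derive the PLATFORM value ("linux" or "osx") used for the lock files.
# Assumption: only Linux and macOS are supported, matching the *-64.lock files.
detect_pcgr_platform() {
  case "$(uname -s)" in
    Linux)  echo "linux" ;;
    Darwin) echo "osx" ;;
    *)      echo "unsupported OS: $(uname -s)" >&2; return 1 ;;
  esac
}

PLATFORM="$(detect_pcgr_platform)" || PLATFORM=""
echo "PLATFORM=${PLATFORM}"

# The "-64" lock files target x86_64; on Apple Silicon (arm64) the detailed
# section below recommends prefixing mamba with CONDA_SUBDIR=osx-64.
if [ "$(uname -s)" = "Darwin" ] && [ "$(uname -m)" = "arm64" ]; then
  echo "Apple Silicon detected: prefix mamba commands with CONDA_SUBDIR=osx-64"
fi
```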
Data
- To download the data bundle, see STEP 1 below.
Docker
- If you are already familiar with Docker, you can skip the conda method above and jump straight to the PCGR Docker setup section.
Detailed Installation
PCGR requires a data bundle containing the reference data, the sample inputs (e.g. somatic variants in a VCF), and an output directory for the results.
Here’s an example scenario that will be used in the following sections:
- data bundle downloaded in /Users/you/dir1/data;
- sample inputs at /Users/you/dir2/pcgr_inputs;
- output goes to /Users/you/dir3/pcgr_outputs (make sure this directory exists);
- your PCGR codebase is installed in /Users/you/dir4/PCGR.
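The following is a minimal sketch that prepares the example layout above. It assumes that /Users/you is your $HOME. The variable names are hypothetical and are used only in this sketch. STEP 1 and STEP 2 create the data and PCGR directories themselves, so the sketch creates only their parent directories, plus the input and output directories.

```shell
#!/usr/bin/env bash
# Sketch of the example layout; assumes /Users/you == $HOME.
# Variable names are hypothetical and only used here.
PCGR_DATA_PARENT="$HOME/dir1"           # STEP 1 unpacks the bundle here (expected: $HOME/dir1/data)
PCGR_INPUTS="$HOME/dir2/pcgr_inputs"    # sample inputs, e.g. a somatic VCF
PCGR_OUTPUTS="$HOME/dir3/pcgr_outputs"  # results; must exist before running PCGR
PCGR_CODE_PARENT="$HOME/dir4"           # STEP 2 clones the codebase into $HOME/dir4/PCGR

# -p creates missing parents and is a no-op for existing directories,
# so this is safe to re-run.
mkdir -p "$PCGR_DATA_PARENT" "$PCGR_INPUTS" "$PCGR_OUTPUTS" "$PCGR_CODE_PARENT"
```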
STEP 1: Download data bundle
Download and unpack the human assembly-specific data bundle:
grch37 data bundle - 20220203 (approx. 20 GB)
grch38 data bundle - 20220203 (approx. 21 GB)
Example:
GENOME="grch38" # or "grch37"
BUNDLE_VERSION="20220203"
BUNDLE="pcgr.databundle.${GENOME}.${BUNDLE_VERSION}.tgz"
wget http://insilico.hpc.uio.no/pcgr/${BUNDLE}
gzip -dc ${BUNDLE} | tar xvf -
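The bundle file name follows the pattern pcgr.databundle.<assembly>.<version>.tgz, as shown in the example above. The following is a minimal sketch that builds the URL from that pattern and rejects anything other than grch37 or grch38 before a long download starts. The helper name pcgr_bundle_url is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: build the data bundle URL from the naming pattern used above.
# The function name is hypothetical; the default version is the one listed above.
pcgr_bundle_url() {
  local genome="$1"
  local version="${2:-20220203}"
  case "$genome" in
    grch37|grch38) ;;
    *) echo "unsupported assembly: ${genome} (use grch37 or grch38)" >&2; return 1 ;;
  esac
  echo "http://insilico.hpc.uio.no/pcgr/pcgr.databundle.${genome}.${version}.tgz"
}

# Usage (not run here, since the bundle is ~20 GB):
#   cd "$HOME/dir1"
#   url="$(pcgr_bundle_url grch38)" || exit 1
#   wget -c "$url"                   # -c resumes an interrupted download
#   tar -xzvf "$(basename "$url")"   # same effect as gzip -dc ... | tar xvf -
```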
STEP 2: Download PCGR GitHub repository
Download and unpack the latest software release from https://github.com/sigven/pcgr/releases.
Alternatively, if you have git installed, you can run:
PCGR_VERSION="1.4.1"
OUTPUT_DIRECTORY="PCGR"
git clone \
-b "v${PCGR_VERSION}" \
--depth 1 \
https://github.com/sigven/pcgr.git "${OUTPUT_DIRECTORY}"
Note that the --depth 1 option ensures you only clone the specified version, not the entire git history of the repo.
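The following demo shows what --depth 1 does. It uses a throwaway local repository as a stand-in for the PCGR repo, so all names in it are hypothetical. It creates two commits, tags a "release", adds one more commit, and then shallow-clones only the tag. The file:// URL is used because git ignores --depth when cloning from a plain local path.

```shell
#!/usr/bin/env bash
# Demo of a shallow clone against a throwaway local repo (hypothetical stand-in
# for sigven/pcgr). Only requires git.
set -euo pipefail

# Commit identity passed inline so no global git config is needed.
g() { git -c user.name=demo -c user.email=demo@example.com -c commit.gpgsign=false "$@"; }

WORK="$(mktemp -d)"
g init -q "$WORK/upstream"
g -C "$WORK/upstream" commit -q --allow-empty -m "first"
g -C "$WORK/upstream" commit -q --allow-empty -m "second"
g -C "$WORK/upstream" tag v1.4.1   # the "release" we want
g -C "$WORK/upstream" commit -q --allow-empty -m "after release"

# file:// is required here: git ignores --depth for plain local paths.
git clone -q -b v1.4.1 --depth 1 "file://$WORK/upstream" "$WORK/PCGR"

FULL_COUNT="$(git -C "$WORK/upstream" rev-list --count HEAD)"
SHALLOW_COUNT="$(git -C "$WORK/PCGR" rev-list --count HEAD)"
echo "upstream: $FULL_COUNT commits; shallow clone of v1.4.1: $SHALLOW_COUNT commit"
```

The shallow clone contains only the tagged commit and none of the history before or after it.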
STEP 3: Set up Conda or Docker
Step 3 depends on whether you want to use Conda or Docker:
- For Conda, continue reading the PCGR Conda setup.
- For Docker, skip to the PCGR Docker setup.
Option 1: Conda
a) Miniconda and Mamba
- Download the Miniconda installer from https://docs.conda.io/en/latest/miniconda.html:
  - Make sure to download the Linux or MacOSX script according to which platform you're currently on.
- Run bash miniconda.sh and follow the prompts (it should be okay to accept the defaults, unless you want to choose an installation location other than the default ~/miniconda3).
- Exit your current terminal session and open a new one. You should now notice something like a (base) string as a prefix in your terminal prompt. This means that you're in the base conda environment, and you're ready to start installing the conda environments for PCGR.
- Install Mamba, a very fast conda package installer, in this base environment.
PLATFORM="MacOSX" # or "Linux"
MINICONDA_URL="https://repo.anaconda.com/miniconda/Miniconda3-latest-${PLATFORM}-x86_64.sh"
wget ${MINICONDA_URL} -O miniconda.sh && chmod +x miniconda.sh
bash miniconda.sh
# exit terminal and open new one - you should now see:
(base) $
(base) $ conda install -c conda-forge mamba
(base) $ mamba --version
mamba 0.19.1
conda 4.11.0
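Before moving on to step b), you can confirm that the new terminal session actually sees both tools. The following is a minimal sketch of such a pre-flight check. The function name require_cmds is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: pre-flight check that the given commands are on PATH.
# Returns 1 (and reports each missing command) if any is absent.
require_cmds() {
  local cmd missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing command: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

if require_cmds conda mamba; then
  echo "conda and mamba found; continue with step b)"
else
  echo "open a new terminal session or revisit step a)" >&2
fi
```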
b) Create PCGR conda environments
The conda/env/lock directory in the PCGR codebase contains two .lock files, which can be used to create the required conda environments for the Python component (pcgr) and the R components (pcgrr and cpsr). In the following example, we install the conda dependencies for these two environments in the local conda/env directory:
cd /Users/you/dir4/PCGR
PLATFORM="osx-64" # or "linux-64"
PCGR_CONDA_ENV_DIR="./conda/env"
mamba create --prefix ${PCGR_CONDA_ENV_DIR}/pcgr --file ${PCGR_CONDA_ENV_DIR}/lock/pcgr-${PLATFORM}.lock
mamba create --prefix ${PCGR_CONDA_ENV_DIR}/pcgrr --file ${PCGR_CONDA_ENV_DIR}/lock/pcgrr-${PLATFORM}.lock
## Alternatively, for installing in your central conda directory, use the following:
# mamba create --name pcgr --file ${PCGR_CONDA_ENV_DIR}/lock/pcgr-${PLATFORM}.lock
# mamba create --name pcgrr --file ${PCGR_CONDA_ENV_DIR}/lock/pcgrr-${PLATFORM}.lock
## For MacOS M1, you need to have 'CONDA_SUBDIR=osx-64' before the mamba command, i.e.:
# CONDA_SUBDIR=osx-64 mamba create --prefix [...] --file [...]
## See https://github.com/conda-forge/miniforge/issues/165#issuecomment-860233092
The above process takes 10-15 minutes when installing from scratch. Once it finishes, you can confirm that your conda environments have been installed correctly (notice how their paths differ from the base env installation because of the --prefix option used above):
(base) $ conda env list
# conda environments:
#
base * /Users/you/miniconda3
pcgr /Users/you/dir4/PCGR/conda/env/pcgr
pcgrr /Users/you/dir4/PCGR/conda/env/pcgrr
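As a scriptable alternative to reading the conda env list output, you can check the two prefixes directly. Every conda environment contains a conda-meta/ directory, so the following minimal sketch tests for it. The function name is hypothetical, and the default directory is taken from the PCGR_CONDA_ENV_DIR value used above.

```shell
#!/usr/bin/env bash
# Sketch: verify that the pcgr and pcgrr environments exist under a prefix dir.
# A conda environment always contains a conda-meta/ directory.
check_pcgr_envs() {
  local env_dir="${1:-./conda/env}" name status=0
  for name in pcgr pcgrr; do
    if [ -d "${env_dir}/${name}/conda-meta" ]; then
      echo "ok: ${env_dir}/${name}"
    else
      echo "missing or incomplete: ${env_dir}/${name}" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage, from /Users/you/dir4/PCGR:
#   check_pcgr_envs ./conda/env
```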
c) Activate pcgr conda environment
You need to activate the PCGR/conda/env/pcgr conda environment and test that it works correctly, e.g. with pcgr --version:
(base) $ cd /Users/you/dir4/PCGR
(base) $ conda activate ./conda/env/pcgr
# note how the full path to the locally installed conda environment is now displayed
(/Users/you/dir4/PCGR/conda/env/pcgr) $ which pcgr
/Users/you/dir4/PCGR/conda/env/pcgr/bin/pcgr
(/Users/you/dir4/PCGR/conda/env/pcgr) $ pcgr --version
pcgr X.X.X
(/Users/you/dir4/PCGR/conda/env/pcgr) $ which pcgrr.R
/Users/you/dir4/PCGR/conda/env/pcgr/bin/pcgrr.R
You should now be all set up to run PCGR! Continue on to an example run.
Option 2: Docker
a) Install Docker
For installing Docker, follow the instructions at https://docs.docker.com/engine/install/ for your Linux or MacOSX machine. NOTE: We have not been able to perform enough testing on the Windows platform, and we have received feedback that particular versions of Docker/Windows do not work with PCGR (an example being mounting of data volumes).
- Test that Docker is running, e.g. by typing docker ps or docker images in the terminal window.
- Adjust the computing resources dedicated to Docker: a minimum of 5 GB of memory and 4 CPUs (see e.g. how to do that on MacOSX).
b) Download PCGR Docker Image
- Pull the PCGR Docker image from DockerHub (approx. 5.7 GB) with:
docker pull sigven/pcgr:vX.X.X
c) Run PCGR Docker Container
If you are familiar with Docker volumes (https://docs.docker.com/storage/volumes/), you can run PCGR using Docker instead of conda by means of the -v <host>:<container> Docker option. You'll need to map your PCGR inputs to Docker container paths.
For example, say you have the input VCF sampleX.vcf.gz stored in the directory /Users/you/project1. You would need to supply Docker with a --volume (or -v) option mapping the directory of that VCF to a directory inside the Docker container, e.g. /home/input_vcf_dir. That would become: -v /Users/you/project1:/home/input_vcf_dir (note the : separating your directory from the container's directory).
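The following is a minimal sketch that builds such a host:container pair and catches two common mistakes before Docker runs. The first is a host path that is not absolute: a bare name such as project1 in front of the colon is taken by Docker as a named volume rather than a host directory. The second is a host directory that does not exist. The function name volume_spec is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: build a "host:container" pair for docker's -v option, failing early.
volume_spec() {
  local host="$1" container="$2"
  case "$host" in
    /*) ;;
    *) echo "host path must be absolute: $host" >&2; return 1 ;;
  esac
  if [ ! -d "$host" ]; then
    echo "host directory does not exist: $host" >&2
    return 1
  fi
  printf '%s:%s\n' "$host" "$container"
}

# Usage with the example above:
#   spec="$(volume_spec /Users/you/project1 /home/input_vcf_dir)" || exit 1
#   docker container run ... -v "$spec" ...
```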
Then your command would look something like this:
docker container run -it --rm \
-v /Users/you/dir1/data:/root/pcgr_data \
-v /Users/you/dir2/pcgr_inputs:/root/pcgr_inputs \
-v /Users/you/dir3/pcgr_outputs:/root/pcgr_outputs \
sigven/pcgr:1.4.1 \
pcgr --input_vcf "/root/pcgr_inputs/tumor_sample.BRCA.vcf.gz" \
--pcgr_dir "/root/pcgr_data" \
--output_dir "/root/pcgr_outputs" \
--genome_assembly "grch38" \
--sample_id "SampleB" \
--assay "WGS" \
--vcf2maf
- Note the path mappings. You’re using the container paths in the command, not the host (your machine’s) paths.
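To find the container path for a given input file, replace the host directory prefix with the container directory from the matching -v mapping. The following is a minimal sketch of that translation. The function name to_container_path is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: translate a host path into the container path for one -v mapping.
# Trailing slashes on the directory arguments are ignored.
to_container_path() {
  local path="$1" host_dir="${2%/}" container_dir="${3%/}"
  case "$path" in
    "$host_dir"/*) printf '%s%s\n' "$container_dir" "${path#"$host_dir"}" ;;
    *) echo "not under $host_dir: $path" >&2; return 1 ;;
  esac
}

# Usage with the mapping -v /Users/you/dir2/pcgr_inputs:/root/pcgr_inputs
to_container_path /Users/you/dir2/pcgr_inputs/tumor_sample.BRCA.vcf.gz \
  /Users/you/dir2/pcgr_inputs /root/pcgr_inputs
# prints /root/pcgr_inputs/tumor_sample.BRCA.vcf.gz
```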