Installation

Requirements

  • Apptainer

  • Nextflow (curl -s https://get.nextflow.io | bash)

Recommended

  • Conda

Development deployment (self-contained)

Copy code locally

git clone --branch master \
    https://github.com/genomic-medicine-sweden/jasen.git && \
cd jasen

Installation requirements

NOTE: Installing JASEN assumes that the following tools and libraries are already installed on your OS:

unzip
gcc
zlib
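
A quick pre-flight check can confirm that the required command-line tools are on the PATH before running make. The sketch below is only an illustration: missing_tools is a hypothetical helper, not part of JASEN, and zlib is a library rather than a command, so this check does not cover it.

```shell
# Print the name of every tool given as an argument that is not found on PATH.
# missing_tools is a hypothetical helper, not part of JASEN.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}
```

For example, missing_tools unzip gcc apptainer nextflow prints nothing when all four are available, and otherwise lists the ones that still need to be installed.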

Create Apptainer images.

The main Makefile attempts to build and download the containers (that is, when you run make install in the main repo folder). To build them directly:

cd containers && make

Download references and databases using Apptainer.

First, make sure your current working directory is the main jasen folder (if you changed into the containers folder earlier, return to the main folder with cd ..). Then run the install make rule:

NOTE: Kraken and MLST databases need to be downloaded manually! Installation can be done independently for different species. Please see instructions below!

make install

Finally, run checks:

make check

Resolve any errors reported by this step before running the pipeline; otherwise pipeline execution may fail in unexpected ways.

Species-specific installation

The following species can be installed independently to save time and disk usage:

  • saureus

  • ecoli

  • klebsiella

  • mtuberculosis

This is done by executing the following:

NOTE: spyogenes & streptococcus don’t have any specific installation requirements, so make update_databases should suffice.

ORG="saureus"
make update_databases && make ${ORG}_all
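
To install more than one species in a single pass, the make targets can be generated from a list of species names. This is a minimal sketch that assumes the ${ORG}_all naming pattern shown above holds for every supported species; species_targets and SUPPORTED_SPECIES are hypothetical names, not part of JASEN.

```shell
# Supported species names, taken from the list above.
SUPPORTED_SPECIES="saureus ecoli klebsiella mtuberculosis"

# Print one "<species>_all" make target per argument, space-separated.
# Returns 1 (and prints nothing to stdout) if any name is not supported.
# species_targets is a hypothetical helper, not part of JASEN.
species_targets() {
    targets=""
    for org in "$@"; do
        case " $SUPPORTED_SPECIES " in
            *" $org "*) targets="${targets:+$targets }${org}_all" ;;
            *) printf 'unsupported species: %s\n' "$org" >&2; return 1 ;;
        esac
    done
    printf '%s\n' "$targets"
}
```

A possible invocation is targets=$(species_targets saureus ecoli) && make update_databases && make $targets. Capturing the targets first means make is never called with an empty target list if a species name is mistyped.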

Configuration and test data

Config

Source: nextflow.config

  • Edit the root parameter

  • Edit the workDir and outdir parameters

  • Edit the use_kraken parameter (default: false) and set kraken_db to the path of the database

  • Edit the use_hostile parameter in nextflow.config in order to filter out human reads (default: false)

  • Set the use_skesa parameter to false (default: true) if you would like to use SPAdes instead of Skesa for assembly of short reads

  • Edit the target_sample_size parameter in order to downsample reads

  • Add runOptions to apptainer/singularity profile in order to mount directories to your run, e.g. output folder, workdir (Example: apptainer.runOptions = "--bind ${params.outdir} --bind ${params.workDir}")

When analysing Nanopore data:

  • Edit ext.seqmethod for Flye in conf/modules.config if you are using older ONT data (default: --nano-hq, suitable for ONT data generated with R10 chemistry)

  • params.clair3_model in nextflow.config is set to r1041_e82_400bps_sup_v430_bacteria_finetuned, but can be changed to any other available model

  • Medaka recognises the basecalling model automatically and uses a bacterial model to polish the assembly, but this can be changed in conf/modules.config (edit ext.args for the process named medaka)

Test data

Source: assets/test_data/samplelist*.csv

  • For short reads produced with Illumina or IonTorrent technology, edit the read1 and read2 columns in assets/test_data/samplelist.csv

  • For long reads produced with ONT technology, edit the read1 column in assets/test_data/samplelist_nanopore.csv
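
After editing a samplelist, it can be worth checking that every read path actually exists before starting a run. The sketch below assumes the CSV has a header row naming the columns read1 and (for paired-end data) read2, and that no field contains a quoted comma; missing_reads is a hypothetical helper, not part of JASEN.

```shell
# Print every read1/read2 path from a samplelist CSV that does not exist on disk.
# Assumes a header row with columns named read1 and (optionally) read2,
# and no quoted fields containing commas.
# missing_reads is a hypothetical helper, not part of JASEN.
missing_reads() {
    awk -F',' '
        { sub(/\r$/, "") }   # tolerate Windows line endings
        NR == 1 {
            for (i = 1; i <= NF; i++)
                if ($i == "read1" || $i == "read2") col[i] = 1
            next
        }
        { for (i in col) if ($i != "") print $i }
    ' "$1" | while IFS= read -r path; do
        [ -e "$path" ] || printf '%s\n' "$path"
    done
}
```

Running missing_reads assets/test_data/samplelist.csv prints nothing when every listed file is present.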

Setting up temp directories

Source: ~/.bashrc

  • Add the export line to ~/.bashrc

  • Change SINGULARITY_TMPDIR to APPTAINER_TMPDIR if you are using apptainer

export SINGULARITY_TMPDIR="/tmp" #or equivalent filepath to tmp dir

Fetching/updating databases

NOTE: Both kraken and mlst require their databases to be downloaded MANUALLY

Kraken

Choose between the Kraken DB (64 GB, highly recommended) and the MiniKraken DB (8 GB). Alternatively, you can build a custom database.

Download Kraken database

wget -O /path/to/kraken_db/krakenstd.tar.gz https://genome-idx.s3.amazonaws.com/kraken/k2_standard_20230314.tar.gz
tar -xf /path/to/kraken_db/krakenstd.tar.gz

Download MiniKraken database

wget -O /path/to/kraken_db/krakenmini.tar.gz https://genome-idx.s3.amazonaws.com/kraken/k2_standard_08gb_20230314.tar.gz
tar -xf /path/to/kraken_db/krakenmini.tar.gz

Batch mode (kraken_batch)

Set use_kraken_batch = true in nextflow.config to collect all samples into a single job and copy the Kraken database into /dev/shm (a RAM-backed filesystem) before classification. This loads the database once per pipeline run and eliminates disk I/O during classification, significantly reducing runtime for large sample batches.

Requirements: Each compute node running Kraken must have at least 80–100 GB of available /dev/shm. Check available space with:

df -h /dev/shm

If your nodes do not meet this requirement, leave use_kraken_batch = false (default) to run kraken2 per sample.
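
The space check can also be scripted, for example in a job prologue. This is a rough sketch that assumes the portable df -P output format (available space in the fourth column, in 1 KiB blocks) and a mount point name without spaces; avail_gb and has_space are hypothetical helpers, not part of JASEN.

```shell
# Print the space available under a mount point in whole GiB (rounded down).
# Defaults to /dev/shm. avail_gb is a hypothetical helper, not part of JASEN.
avail_gb() {
    df -Pk "${1:-/dev/shm}" | awk 'NR == 2 { print int($4 / 1048576) }'
}

# Succeed if the mount point ($1) has at least $2 GiB available.
# has_space is a hypothetical helper, not part of JASEN.
has_space() {
    [ "$(avail_gb "$1")" -ge "$2" ]
}
```

For instance, has_space /dev/shm 100 succeeds only on nodes that meet the upper end of the stated requirement; the threshold of 100 is a conservative choice made here, not a value taken from the pipeline.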

cgMLST database (BIGSdb Pasteur setup)

NOTE: The Klebsiella cgMLST schema is hosted on BIGSdb Pasteur and requires API credentials to download. Here are the steps:

  1. Request an API key by following the instructions at https://bigsdb.pasteur.fr/requesting-api-key/.

  2. Copy the client credentials template:

cp assets/.bigsdb_tokens/client_credentials.template assets/.bigsdb_tokens/client_credentials

  3. Edit client_id and client_secret in the assets/.bigsdb_tokens/client_credentials file:

[Pasteur]
client_id = insert_pasteur_client_id
client_secret = insert_pasteur_client_secret

  4. Download the raw cgMLST alleles from BIGSdb Pasteur:

NOTE: This target must be run manually and is not part of make install. It requires OAuth credentials to be configured as described above.

make klebsiella_download_cgmlst_schema

  5. After downloading, re-reference the alleles by running:

make klebsiella_prep_cgmlst_schema
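
Before running the download target, it may help to confirm that the credentials file exists and no longer contains the template placeholders. The sketch below only looks for the insert_pasteur_client_ placeholder text shown in the template above; it does not validate the credentials themselves. credentials_filled is a hypothetical helper, not part of JASEN.

```shell
# Succeed only if the credentials file exists and no longer contains the
# template placeholders (insert_pasteur_client_id / insert_pasteur_client_secret).
# credentials_filled is a hypothetical helper, not part of JASEN.
credentials_filled() {
    file="${1:-assets/.bigsdb_tokens/client_credentials}"
    [ -f "$file" ] && ! grep -q 'insert_pasteur_client_' "$file"
}
```

One way to use it is credentials_filled && make klebsiella_download_cgmlst_schema, run from the main jasen folder.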

MLST databases (PubMLST & BLAST)

NOTE: PubMLST DB requires users to have an account at Bacterial Isolate Genome Sequence Database (BIGSdb) in order to download the latest reported alleles. Here are the steps:

  1. Register for all databases: click Database registrations, check all, and register.

  2. Create an API key under the API keys dropdown.

  3. Add your credentials to your ~/.bashrc:

export PUBMLST_CLIENT_ID="<pubmlst_client_id>"
export PUBMLST_CLIENT_SECRET="<pubmlst_client_secret>"
export PASTEUR_CLIENT_ID="<pasteur_client_id>" # From BIGSdb Pasteur setup
export PASTEUR_CLIENT_SECRET="<pasteur_client_secret>" # From BIGSdb Pasteur setup

Download/update MLST database per species

Run the token setup step first, then the database build step. Both steps require the PUBMLST_CLIENT_ID and PUBMLST_CLIENT_SECRET (PubMLST schemas) or PASTEUR_CLIENT_ID and PASTEUR_CLIENT_SECRET (Pasteur schemas) environment variables.
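
Because both steps read the credentials from the environment, a missing export is an easy mistake to make. The following sketch lists any of the named variables that are unset or empty in the exported environment. It relies on printenv, which is widely available but not part of POSIX; unset_vars is a hypothetical helper, not part of JASEN.

```shell
# Print the name of every listed environment variable that is unset or empty
# in the exported environment. unset_vars is a hypothetical helper.
unset_vars() {
    for name in "$@"; do
        [ -n "$(printenv "$name")" ] || printf '%s\n' "$name"
    done
}
```

For PubMLST schemas, unset_vars PUBMLST_CLIENT_ID PUBMLST_CLIENT_SECRET should print nothing; for Pasteur schemas, check PASTEUR_CLIENT_ID and PASTEUR_CLIENT_SECRET in the same way.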

S. aureus

make setup_saureus_mlstdb_token
make update_saureus_mlstdb

S. pyogenes

make setup_spyogenes_mlstdb_token
make update_spyogenes_mlstdb

E. coli Achtman

make setup_ecoli_achtman_mlstdb_token
make update_ecoli_achtman_mlstdb

E. coli Pasteur (needs BIGSdb Pasteur setup)

make setup_ecoli_pasteur_mlstdb_token
make update_ecoli_pasteur_mlstdb

Klebsiella (needs BIGSdb Pasteur setup)

make setup_klebsiella_mlstdb_token
make update_klebsiella_mlstdb
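
The per-species commands above all follow the same two-step pattern, so they can be looped. This is a minimal sketch, assuming both the PubMLST and the Pasteur credentials are exported (the ecoli_pasteur and klebsiella schemas need the latter). MLST_SCHEMAS, mlst_targets, update_all_mlst and the MAKE_CMD override are hypothetical names introduced here, not part of JASEN.

```shell
# Schema names as they appear in the make targets above.
MLST_SCHEMAS="saureus spyogenes ecoli_achtman ecoli_pasteur klebsiella"

# Print the token-setup and update targets for one schema, in run order.
# mlst_targets is a hypothetical helper, not part of JASEN.
mlst_targets() {
    printf 'setup_%s_mlstdb_token update_%s_mlstdb\n' "$1" "$1"
}

# Run both steps for every schema, stopping at the first failure.
# Set MAKE_CMD=echo for a dry run that only prints the targets.
update_all_mlst() {
    for schema in $MLST_SCHEMAS; do
        ${MAKE_CMD:-make} $(mlst_targets "$schema") || return 1
    done
}
```

Running MAKE_CMD=echo update_all_mlst first shows which targets would be invoked; plain update_all_mlst then runs them with make from the main jasen folder.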