Usage

Simple self-test

with Illumina test data:

nextflow run main.nf                                      \
        -profile staphylococcus_aureus,illumina,apptainer \
        -config nextflow.config                           \
        --csv assets/test_data/samplelist.csv

with ONT test data:

nextflow run main.nf                                      \
        -profile staphylococcus_aureus,nanopore,apptainer \
        -config nextflow.config                           \
        --csv assets/test_data/samplelist_nanopore.csv

Remember to edit the paths to the test file(s) in the samplelist.
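
Before launching a run, it can help to confirm that every read path in the samplelist points to an existing file. The following is a minimal shell sketch, assuming the comma-separated column layout described under Input file format (id, platform, sequencing_run, read1 and, for paired reads, read2). The function name check_samplelist is hypothetical and not part of the pipeline.

```shell
# check_samplelist (hypothetical helper, not part of the pipeline):
# print every read file named in a samplelist CSV that does not exist.
# Assumes columns: id,platform,sequencing_run,read1[,read2]
check_samplelist() {
    # Skip the header row, then read one sample per line.
    # The trailing test on $id keeps a final line that lacks a newline.
    tail -n +2 "$1" | while IFS=',' read -r id platform run read1 read2 || [ -n "$id" ]; do
        for f in "$read1" "$read2"; do
            # read2 is empty for single-file (e.g. ONT) samplelists.
            [ -z "$f" ] && continue
            [ -e "$f" ] || echo "missing: $id $f"
        done
    done
}
```

Calling check_samplelist assets/test_data/samplelist.csv prints one line per missing file and nothing when every path resolves.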

Usage arguments

| Argument type | Options | Required |
|---|---|---|
| -profile (species) | staphylococcus_aureus/escherichia_coli/mycobacterium_tuberculosis/klebsiella/streptococcus_pyogenes/streptococcus | True |
| -profile (platform) | illumina/nanopore/iontorrent | True |
| -profile (RLS) | development/diagnostic/validation | False |
| -config | nextflow.config | True |
| -resume | NA | False |
| --output | user-specified | False |

RLS = Release life cycle (default: diagnostic)
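
To show how the arguments in the table combine, here is a sketch that assembles a full command and prints it for review rather than executing it. It assumes that the RLS profile is added to the same comma-separated -profile list as the species and platform, and that apptainer is kept as in the self-test. The output directory is a hypothetical value.

```shell
# Values to adapt; the output directory below is a hypothetical example.
species="escherichia_coli"
platform="illumina"
rls="validation"            # optional; the default is diagnostic
samplelist="assets/test_data/samplelist.csv"
outdir="results/run01"

# Assumption: the RLS profile goes into the same comma-separated -profile list.
cmd="nextflow run main.nf -profile ${species},${platform},apptainer,${rls} -config nextflow.config -resume --csv ${samplelist} --output ${outdir}"

# Print the command for inspection before running it by hand.
echo "$cmd"
```

Printing the command first makes it easy to check the profile list and paths. Note that -resume only has an effect when an earlier run left cached results behind.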

Input file format

For short reads:

Example of a samplelist input file in CSV format.

| id | platform | sequencing_run | read1 | read2 |
|---|---|---|---|---|
| sample01 | illumina | seqrun0123 | path_to_reads/sample01_forward.fastq.gz | path_to_reads/sample01_reverse.fastq.gz |
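
As a sketch of what the short-read example looks like on disk, the snippet below writes it as a comma-separated file and checks the field count. The file name samplelist.csv is an arbitrary choice, and the read paths are the placeholders from the example, to be replaced with real files.

```shell
# Write a one-sample short-read samplelist; the read paths are
# placeholders and must be replaced with real files.
samplelist="samplelist.csv"
cat > "$samplelist" <<'EOF'
id,platform,sequencing_run,read1,read2
sample01,illumina,seqrun0123,path_to_reads/sample01_forward.fastq.gz,path_to_reads/sample01_reverse.fastq.gz
EOF

# Sanity check: every line should have exactly five comma-separated fields.
awk -F',' 'NF != 5 { print "line " NR ": expected 5 fields, found " NF; bad = 1 }
           END { exit bad + 0 }' "$samplelist"
```

The awk check exits with a non-zero status if any row has a missing or extra column, which is a common cause of samplelist parsing problems.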

For long reads (ONT):

Example of a samplelist input file in CSV format.

| id | platform | sequencing_run | read1 |
|---|---|---|---|
| sample01 | nanopore | seqrun0123 | path_to_reads/sample01.fastq.gz |

As input for long reads, we recommend FASTQ files obtained by basecalling with the SUP (super-accuracy) model.

Downsampling reads

There is an option to use seqtk to downsample the number of reads for a sample as a preprocessing step before all other analyses. This can be useful if a sample was sequenced too deeply, as extreme sequencing depth can cause issues with de novo assemblies.

Activate downsampling by setting the parameter target_sample_size in the config to either the desired number of reads or the fraction of reads to include.

Removing human reads

There is an option to use hostile to filter human reads out before further analyses. This can be useful if a sample has been contaminated with human DNA, which could cause issues with de novo assemblies.

Activate human read depletion by setting the parameter use_hostile to true in the config.

Adapter and quality trimming (Illumina)

There is an option to use Trimmomatic to trim adapters and low-quality bases from Illumina reads as a preprocessing step. This feature is turned off by default.

Activate Trimmomatic by setting use_trimmomatic to true in the config (Illumina platform only). Customise the trimming steps via trimmomatic_args (default: LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36); for adapter trimming, append an ILLUMINACLIP:<adapter.fa>:2:30:10 step to the args.

Output

  • postalignqc output: statistics are computed using only the core genome

  • Coverage uniformity is calculated by dividing the interquartile range of the coverage by the median coverage; a lower value indicates more uniform coverage of the genome

  • coverage output: statistics are computed using the whole genome (and plasmids, if they are part of the reference genome)

  • Genome assemblies created from ONT data are polished in two rounds, using the bacterial methylation model by default.

  • Variants reported by Freebayes are used to mask the genome before cgMLST analysis (default: true for Illumina data, false for ONT data). They are computed by aligning reads to the assembly, not to the reference genome. When the masking step is run, these variants are also reported in the output file analysis_result/*_result.json.

  • Gambitcore identifies the closest species and assesses the completeness of the assembly; a detailed description of the output can be found here
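
The coverage uniformity metric listed above (interquartile range divided by median coverage) can be illustrated with a small shell sketch. The function name coverage_uniformity is hypothetical, and the quartiles are taken by linear interpolation between sorted values, which is an assumed convention; the pipeline's own implementation may compute them differently.

```shell
# coverage_uniformity (hypothetical helper): read one depth value per line
# on stdin and print IQR / median, rounded to three decimals.
coverage_uniformity() {
    sort -n | awk '
        { d[NR] = $1 + 0 }          # store the sorted depths as numbers

        # Quantile by linear interpolation between neighbouring sorted
        # values (an assumed convention, not confirmed by the pipeline).
        function q(p,   h, lo, frac) {
            h = (NR - 1) * p + 1
            lo = int(h)
            frac = h - lo
            if (lo >= NR) return d[NR]
            return d[lo] + frac * (d[lo + 1] - d[lo])
        }

        END {
            if (NR == 0) exit 1     # no input values
            med = q(0.5)
            if (med == 0) { print "NA"; exit }
            printf "%.3f\n", (q(0.75) - q(0.25)) / med
        }'
}
```

For example, printf '%s\n' 10 20 30 40 50 | coverage_uniformity prints 0.667, since the quartiles are 20 and 40 and the median is 30. Perfectly even coverage gives 0.000. Per-position depths could be taken from the third column of samtools depth -a output, though which positions to include (core genome or whole genome) depends on the statistic being reproduced.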