Usage

Simple self-test

with Illumina test data:

nextflow run main.nf                                      \
        -profile staphylococcus_aureus,illumina,apptainer \
        -config nextflow.config                           \
        --csv assets/test_data/samplelist.csv

with ONT test data:

nextflow run main.nf                                      \
        -profile staphylococcus_aureus,nanopore,apptainer \
        -config nextflow.config                           \
        --csv assets/test_data/samplelist_nanopore.csv

Remember to edit the paths to the test file(s) in the samplelist.

Usage arguments

Argument type	Options	Required
-profile (species)	staphylococcus_aureus/escherichia_coli/mycobacterium_tuberculosis/klebsiella/streptococcus_pyogenes/streptococcus	True
-profile (platform)	illumina/nanopore/iontorrent	True
-profile (RLS)	development/diagnostic/validation	False
-config	nextflow.config	True
-resume	NA	False
--output	user-specified	False

RLS = Release life cycle (default: diagnostic)

Input file format

For short reads:

Example of a *samplelist* input file in CSV format.
id	platform	sequencing_run	read1	read2
sample01	illumina	seqrun0123	path_to_reads/sample01_forward.fastq.gz	path_to_reads/sample01_reverse.fastq.gz

For long reads (ONT):

Example of a *samplelist* input file in CSV format.
id	platform	sequencing_run	read1
sample01	nanopore	seqrun0123	path_to_reads/sample01.fastq.gz

For long reads, we recommend FASTQ files basecalled with the SUP (super-accurate) model.
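The two layouts above can be written out from the shell as follows. This is a minimal sketch that assumes the file on disk is comma-delimited (as "CSV" suggests) and uses the column names shown in the examples; the file names samplelist_illumina.csv and samplelist_nanopore.csv and the read paths are placeholders, not files shipped with the pipeline.

```shell
# Short reads: one row per sample, paired-end reads in read1/read2.
cat > samplelist_illumina.csv <<'EOF'
id,platform,sequencing_run,read1,read2
sample01,illumina,seqrun0123,path_to_reads/sample01_forward.fastq.gz,path_to_reads/sample01_reverse.fastq.gz
EOF

# Long reads (ONT): a single read file per sample, so there is no read2 column.
cat > samplelist_nanopore.csv <<'EOF'
id,platform,sequencing_run,read1
sample01,nanopore,seqrun0123,path_to_reads/sample01.fastq.gz
EOF

# Sanity check: every row should have as many fields as the header.
for f in samplelist_illumina.csv samplelist_nanopore.csv; do
  awk -F',' 'NR == 1 { n = NF } NF != n { bad = 1 } END { exit bad }' "$f" \
    && echo "$f: OK"
done
```

Either file can then be passed to the pipeline with --csv, in the same way as the test samplelists in the self-test commands.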

Downsampling reads

There is an option to use seqtk to downsample the number of reads for a sample as a preprocessing step before all other analyses. This can be useful if a sample was sequenced too deeply, as extreme sequencing depth can cause issues with de novo assemblies.

Activate downsampling by setting the parameter target_sample_size in the config to either the desired number of reads or the fraction of reads to include.
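To make the two accepted forms of target_sample_size concrete, the sketch below prints the equivalent standalone seqtk commands for a paired-end sample. It is a dry run (nothing is executed, so seqtk does not need to be installed), and it illustrates seqtk's own sample subcommand rather than how the pipeline calls it internally, which this document does not show. The file names, the seed value and the helper name print_downsample_cmds are placeholders.

```shell
# Hypothetical helper: print the seqtk commands that would downsample a read pair.
# $1     = target: a read count (e.g. 500000) or a fraction of reads (e.g. 0.25)
# $2, $3 = forward and reverse FASTQ files (placeholder names below)
print_downsample_cmds() {
  target="$1"
  seed=11   # using the same seed for both mates keeps the read pairs in sync
  for reads in "$2" "$3"; do
    echo "seqtk sample -s ${seed} ${reads} ${target} | gzip > downsampled_${reads}"
  done
}

# Keep 500,000 read pairs:
print_downsample_cmds 500000 sample01_forward.fastq.gz sample01_reverse.fastq.gz

# Keep roughly a quarter of the read pairs:
print_downsample_cmds 0.25 sample01_forward.fastq.gz sample01_reverse.fastq.gz
```

When downsampling through the pipeline, only target_sample_size needs to be set; the commands here are just a way to see what a count versus a fraction means.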

Removing human reads

There is an option to use hostile to filter out human reads before further analyses. This can be useful if a sample has been contaminated with human DNA, which could cause issues with de novo assemblies.

Activate human read depletion by setting the parameter use_hostile to true in the config.

Adapter and quality trimming (Illumina)

There is an option to use Trimmomatic to trim adapters and low-quality bases from Illumina reads as a preprocessing step. This feature is turned off by default.

Activate Trimmomatic by setting use_trimmomatic to true in the config (Illumina platform only). Customise the trimming steps via trimmomatic_args (default: LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36); for adapter trimming, append an ILLUMINACLIP:<adapter.fa>:2:30:10 step to the args.
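As an illustration of the adapter-trimming instruction above, the snippet below builds the value that would go into trimmomatic_args by appending an ILLUMINACLIP step to the documented defaults. The adapter file name adapters.fa is a placeholder, and the snippet only assembles and prints the string; entering it in the config is left to the reader.

```shell
# Documented default trimming steps.
default_args="LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36"

# Placeholder path: point this at the adapter FASTA that matches your library prep.
adapter_fasta="adapters.fa"

# ILLUMINACLIP:<fasta>:<seed mismatches>:<palindrome clip threshold>:<simple clip threshold>
illuminaclip="ILLUMINACLIP:${adapter_fasta}:2:30:10"

# Append the adapter step to the defaults, as described above.
trimmomatic_args="${default_args} ${illuminaclip}"
echo "trimmomatic_args = '${trimmomatic_args}'"
```

Trimmomatic runs the steps in the order they appear in the string, so this places adapter clipping after the default steps; move the ILLUMINACLIP token to the front if adapters should be removed first.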

Output

postalignqc output: statistics are computed using only the core genome.
Coverage uniformity is calculated by dividing the interquartile range by the median coverage; a lower value indicates more uniform coverage of the genome.
coverage output: statistics are computed using the whole genome (and plasmids, if they are part of the reference genome).
Polishing of genome assemblies created from ONT data is done in two rounds, with the bacterial methylation model as the default.
Variants reported by Freebayes are used to mask the genome before cgMLST analysis (default: true for Illumina data, false for ONT data). They are computed by aligning reads to the assembly, not to the reference genome. When the masking step is run, these variants are also reported in the output file analysis_result/*_result.json.
Gambitcore identifies the closest species and assesses the completeness of the assembly; a detailed description of the output can be found here.
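
The coverage uniformity figure described above (interquartile range divided by median coverage) can be reproduced by hand on a toy example. The sketch below is not the pipeline's code: the function name coverage_uniformity is made up, the depth values are invented for illustration, and quartiles are taken by linear interpolation between sorted values, which may differ from the convention the pipeline uses.

```shell
# Hypothetical helper: read one per-position depth per line on stdin and
# print IQR / median, rounded to three decimals.
coverage_uniformity() {
  sort -n | awk '
    # Quantile by linear interpolation between neighbouring sorted values.
    function quantile(p,    pos, lo, frac) {
      pos = (n - 1) * p + 1          # 1-based fractional rank
      lo = int(pos)
      frac = pos - lo
      if (lo >= n) return d[n]
      return d[lo] + frac * (d[lo + 1] - d[lo])
    }
    { d[NR] = $1 }
    END {
      n = NR
      if (n == 0) exit 1             # no input
      q1 = quantile(0.25); med = quantile(0.5); q3 = quantile(0.75)
      if (med == 0) { print "NA"; exit }
      printf "%.3f\n", (q3 - q1) / med
    }'
}

# Fairly even coverage: Q1 = 24, median = 26, Q3 = 28
printf '%s\n' 20 22 24 25 26 27 28 30 40 | coverage_uniformity   # prints 0.154

# Uneven coverage: Q1 = 10, median = 26, Q3 = 50
printf '%s\n' 2 5 10 20 26 35 50 70 90 | coverage_uniformity     # prints 1.538
```

Both toy samples have the same median depth, but the second has a much wider spread and therefore a higher (less uniform) value.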