Usage
Simple self-test
With Illumina test data:
```bash
nextflow run main.nf \
  -profile staphylococcus_aureus,illumina,apptainer \
  -config nextflow.config \
  --csv assets/test_data/samplelist.csv
```
With ONT test data:
```bash
nextflow run main.nf \
  -profile staphylococcus_aureus,nanopore,apptainer \
  -config nextflow.config \
  --csv assets/test_data/samplelist_nanopore.csv
```
Remember to edit the paths to the test file(s) in the samplelist.
Usage arguments
| Argument type | Options | Required |
|---|---|---|
| -profile (species) | staphylococcus_aureus/escherichia_coli/mycobacterium_tuberculosis/klebsiella/streptococcus_pyogenes/streptococcus | True |
| -profile (platform) | illumina/nanopore/iontorrent | True |
| -profile (RLS) | development/diagnostic/validation | False |
| -config | nextflow.config | True |
| -resume | NA | False |
| --output | user-specified | False |
RLS = Release life cycle (default: diagnostic)
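The arguments above can be combined in one invocation. The sketch below uses placeholder paths and assumes that `--output` takes a directory path. It runs an E. coli ONT analysis under the validation release profile and resumes a previous run. In Nextflow, comma-separated profiles are applied together, single-dash options (`-profile`, `-config`, `-resume`) belong to Nextflow itself, and double-dash options (`--csv`, `--output`) are pipeline parameters.

```bash
# Sketch: placeholder paths; --output is assumed to accept a directory.
nextflow run main.nf \
  -profile escherichia_coli,nanopore,validation,apptainer \
  -config nextflow.config \
  --csv samplelist_nanopore.csv \
  --output /path/to/results \
  -resume
```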
Input file format
For short reads:
| id | platform | sequencing_run | read1 | read2 |
|---|---|---|---|---|
| sample01 | illumina | seqrun0123 | path_to_reads/sample01_forward.fastq.gz | path_to_reads/sample01_reverse.fastq.gz |
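For many samples, the short-read sample list can be generated rather than written by hand. This is a minimal sketch that uses the columns shown above. It assumes a header row with those column names and a hypothetical naming convention where each sample has `<id>_R1.fastq.gz` and `<id>_R2.fastq.gz` in one directory. Adapt the glob and suffixes to your own file names.

```bash
#!/usr/bin/env bash
# Sketch: write a short-read samplelist CSV (id,platform,sequencing_run,read1,read2).
# Hypothetical naming convention: <id>_R1.fastq.gz / <id>_R2.fastq.gz in one directory.
set -euo pipefail

make_samplelist() {
  local reads_dir run r1 r2 id
  reads_dir=$(cd "$1" && pwd)   # absolute paths avoid working-directory ambiguity
  run=$2
  echo "id,platform,sequencing_run,read1,read2"
  for r1 in "$reads_dir"/*_R1.fastq.gz; do
    [ -e "$r1" ] || continue    # glob matched nothing
    id=$(basename "$r1" _R1.fastq.gz)
    r2=${r1%_R1.fastq.gz}_R2.fastq.gz
    if [ ! -e "$r2" ]; then
      echo "missing mate for $id: $r2" >&2
      return 1
    fi
    echo "$id,illumina,$run,$r1,$r2"
  done
}

# Example: make_samplelist /data/seqrun0123/fastq seqrun0123 > samplelist.csv
if [ "$#" -eq 2 ]; then make_samplelist "$1" "$2"; fi
```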
For long reads (ONT):
| id | platform | sequencing_run | read1 |
|---|---|---|---|
| sample01 | nanopore | seqrun0123 | path_to_reads/sample01.fastq.gz |
For long reads, we recommend FASTQ files that were basecalled with the super-accuracy (SUP) model.
Downsampling reads
There is an option to use seqtk to downsample the number of reads for a sample as a preprocessing step before all other analyses. This can be useful if a sample was sequenced too deeply, as extreme sequencing depth can cause issues with de novo assembly.
Activate downsampling by setting the parameter target_sample_size in the config to either the desired number of reads or the fraction of reads to keep.
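A read count for target_sample_size can be derived from a desired sequencing depth. This is a minimal sketch that uses the approximation reads = depth × genome_size / mean_read_length, rounded up. The mean read length is measured from the FASTQ itself. The genome size is an input you provide, for example roughly 2.8 Mb for S. aureus.

```bash
#!/usr/bin/env bash
# Sketch: estimate a read count for target_sample_size from a desired depth.
# Assumes reads = depth * genome_size / mean_read_length (rounded up), with the
# mean read length measured from the FASTQ (plain or gzip-compressed).
set -euo pipefail

reads_for_depth() {
  local fastq=$1 depth=$2 genome_size=$3
  # gzip -cdf decompresses .gz input and passes plain text through unchanged
  gzip -cdf "$fastq" | awk -v depth="$depth" -v gsize="$genome_size" '
    NR % 4 == 2 { n++; bases += length($0) }   # line 2 of each record is the sequence
    END {
      if (n == 0) { print "no reads found" > "/dev/stderr"; exit 1 }
      r = depth * gsize / (bases / n)
      c = int(r); if (c < r) c++                # round up
      printf "%.0f\n", c
    }'
}

# Example: reads_for_depth sample01_forward.fastq.gz 100 2800000
if [ "$#" -eq 3 ]; then reads_for_depth "$1" "$2" "$3"; fi
```

You can put the printed value in the config as target_sample_size. For a single run, you can also pass it as `--target_sample_size <value>`, because Nextflow maps `--name` on the command line to `params.name`. This text does not say whether the count applies per read file or per read pair, so check the pipeline config before relying on the estimate.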
Removing Human reads
There is an option to use hostile to filter out human reads before further analyses. This can be useful if a sample has been contaminated with human DNA, which could cause issues with de novo assembly.
Activate human read depletion by setting the parameter use_hostile to true in the config.
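To turn on human read depletion for a single run without editing the config, override the parameter on the command line. This is a sketch with placeholder paths. It assumes the config reads the setting as `params.use_hostile`, which Nextflow populates from `--use_hostile`.

```bash
# Sketch: enable hostile for one run; assumes the config reads params.use_hostile.
nextflow run main.nf \
  -profile staphylococcus_aureus,illumina,apptainer \
  -config nextflow.config \
  --csv samplelist.csv \
  --use_hostile true
```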
Adapter and quality trimming (Illumina)
There is an option to use Trimmomatic to trim adapters and low-quality bases from Illumina reads as a preprocessing step. This feature is turned off by default.
Activate Trimmomatic by setting use_trimmomatic to true in the config (Illumina platform only). Customise the trimming steps via trimmomatic_args (default: LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36). For adapter trimming, add an ILLUMINACLIP:<adapter.fa>:2:30:10 step to the args. Trimmomatic applies steps in the order given, and its manual recommends clipping adapters as early as possible, so this step usually goes first.
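As a sketch, both settings can be overridden for a single run. The adapter file `TruSeq3-PE.fa` is a placeholder. It ships with Trimmomatic, but whether it is reachable at that path inside the pipeline's container is an assumption. The remaining steps are the defaults listed above.

```bash
# Sketch: Trimmomatic with adapter clipping placed first (Trimmomatic runs steps in order).
# TruSeq3-PE.fa is a placeholder path; point it at an adapter file visible to the process.
nextflow run main.nf \
  -profile escherichia_coli,illumina,apptainer \
  -config nextflow.config \
  --csv samplelist.csv \
  --use_trimmomatic true \
  --trimmomatic_args 'ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36'
```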
Output
postalignqc output: statistics are computed using only the core genome. Coverage uniformity is calculated by dividing the interquartile range of the coverage by the median coverage; a lower value indicates more uniform coverage of the genome.
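The uniformity metric is IQR(depth) / median(depth). Here is a minimal sketch of it. The input uses the tab-separated chrom, position, depth layout produced by `samtools depth`, and is assumed to be already restricted to core-genome positions. Quartiles are linearly interpolated between sorted values. That quantile definition is an assumption and may differ slightly from the pipeline's.

```bash
#!/usr/bin/env bash
# Sketch: coverage uniformity = IQR(depth) / median(depth).
# Input: tab-separated chrom, position, depth lines (samtools depth layout),
# assumed to be restricted to core-genome positions already.
# Quartiles use linear interpolation between sorted values (an assumption).
set -euo pipefail

coverage_uniformity() {
  cut -f3 "$1" | sort -n | awk '
    { d[NR] = $1 }
    # q(p): quantile p of the sorted depths, interpolating between neighbours
    function q(p,    h, lo) {
      h = (NR - 1) * p + 1
      lo = int(h)
      if (lo >= NR) return d[NR]
      return d[lo] + (h - lo) * (d[lo + 1] - d[lo])
    }
    END {
      if (NR == 0) { print "no depth values" > "/dev/stderr"; exit 1 }
      med = q(0.5)
      if (med == 0) { print "median depth is 0" > "/dev/stderr"; exit 1 }
      printf "%.4f\n", (q(0.75) - q(0.25)) / med
    }'
}

# Example: samtools depth -a sample.bam > depth.tsv; coverage_uniformity depth.tsv
if [ "$#" -eq 1 ]; then coverage_uniformity "$1"; fi
```

For depths 10, 20, 30, 40 and 50 this gives (40 - 20) / 30 ≈ 0.67. A perfectly flat coverage gives 0.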
coverage output: statistics are computed using the whole genome (including plasmids, if they are part of the reference genome).
Genome assemblies created from ONT data are polished in two rounds, using a bacterial methylation model by default.
Variants reported by Freebayes are used to mask the genome before cgMLST analysis (default: true for Illumina data, false for ONT data). These variants are called by aligning reads to the assembly, not to the reference genome. When the masking step is run, the variants are also reported in the output file analysis_result/*_result.json.
Gambitcore identifies the closest species and assesses the completeness of the assembly; a detailed description of its output can be found here.