In addition to managing your sequencing data, IRIDA includes a number of workflows for analysing it.

IRIDA supports the execution of high-throughput analysis pipelines through a standard web browser.  Samples containing previously uploaded genomic data are selected for analysis with one of the provided pipelines.  IRIDA guides users through the selection of sequence files and reference genomes and helps customize parameters before the data is sent for analysis to a pre-configured high-performance computing environment.  Execution of each pipeline is handled behind the scenes by a local Galaxy (https://galaxyproject.org/) instance.  The progress of each pipeline can be monitored through IRIDA and, on completion, results can be viewed in the browser or downloaded for further processing.  IRIDA records all files used as input as well as all parameters used to generate a particular result.

For more information on developing IRIDA workflows see the Development page.

SNVPhyl

The SNVPhyl (Single Nucleotide Variant PHYLogenomics) pipeline identifies Single Nucleotide Variants (SNVs) within a collection of microbial genomes and constructs a phylogenetic tree from them.

The output of the pipeline consists of a whole-genome phylogenetic tree constructed from the detected SNVs, a list of all detected SNVs, and other information.

 

[Figure: SNVPhyl output]

Operation:

SNVPhyl identifies variants and generates a phylogenetic tree by mapping the input sequence reads to a reference genome and then filtering out any invalid variant calls. The stages are as follows:

  1. Preparing input files including:
    1. A set of sequence reads.
    2. A reference genome.
    3. An optional file of regions to mask on the reference genome.
  2. Identification of repeat regions on the reference genome using MUMmer.
  3. Reference mapping and variant calling using SMALT, FreeBayes and SAMtools/BCFtools.
  4. Merging and filtering variant calls to produce a set of high quality SNVs.
  5. Generating an alignment of SNVs.
  6. Building a maximum likelihood tree with PhyML and generating other output files.
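
To make stages 4 and 5 more concrete, the following Python sketch shows one way filtered variant calls could be turned into an alignment of SNVs. It is an illustration only and not SNVPhyl's actual implementation. The function names, the data layout (a dictionary of per-sample calls, with "-" marking a position without a valid call) and the example data are all assumptions made for this sketch.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: 1-based, inclusive (start, end) regions on the
# reference, e.g. repeats found in stage 2 or a user-supplied mask.
Region = Tuple[int, int]


def in_any_region(pos: int, regions: List[Region]) -> bool:
    """Return True if pos falls inside any excluded region."""
    return any(start <= pos <= end for start, end in regions)


def build_snv_alignment(
    reference: str,
    calls: Dict[str, Dict[int, str]],
    excluded: List[Region],
) -> Tuple[List[int], Dict[str, str]]:
    """Keep positions that are variable and validly called in every sample.

    calls maps sample name -> {position: base}. A base of "-" means the
    sample has no valid call there. A position missing from a sample is
    assumed (for this sketch) to match the reference.
    """
    candidates = sorted({p for sample in calls.values() for p in sample})
    kept: List[int] = []
    for pos in candidates:
        if in_any_region(pos, excluded):
            continue  # masked or repeat region
        ref_base = reference[pos - 1]
        column = [sample.get(pos, ref_base) for sample in calls.values()]
        if "-" in column:
            continue  # at least one sample lacks a valid call
        if all(base == ref_base for base in column):
            continue  # not variable, so not an SNV
        kept.append(pos)

    # One row per genome, one column per retained SNV position.
    alignment = {"reference": "".join(reference[p - 1] for p in kept)}
    for name, sample in calls.items():
        alignment[name] = "".join(sample.get(p, reference[p - 1]) for p in kept)
    return kept, alignment


# Made-up example data, not real pipeline output.
example_reference = "ACGTACGTAC"
example_calls = {
    "sample_a": {2: "T", 5: "G", 8: "-"},
    "sample_b": {2: "T", 8: "C"},
    "sample_c": {5: "G", 9: "T"},
}
positions, snv_alignment = build_snv_alignment(
    example_reference, example_calls, excluded=[(9, 10)]
)
print(positions)       # [2, 5]
print(snv_alignment)   # reference: CA, sample_a: TG, sample_b: TA, sample_c: CG
```

In this example, position 8 is dropped because one sample has no valid call and position 9 is dropped because it lies in an excluded region, which leaves a two-column alignment that a tree-building tool could take as input.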

More information on SNVPhyl can be found in the SNVPhyl documentation.

Assembly and Annotation

The assembly and annotation pipeline built into IRIDA proceeds through the following steps:

  1. Paired-end reads are merged using FLASh.
  2. The merged paired-end reads as well as the unmerged reads are passed to SPAdes to perform a de novo assembly.
  3. The contigs returned by SPAdes are filtered to remove small and low coverage contigs.
  4. The filtered contigs are passed to Prokka for genome annotation.
  5. A set of summary statistics is generated for the assembled genome.
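
Steps 3 and 5 can be illustrated with a short Python sketch. It is not the code IRIDA runs. It assumes contig names of the form NODE_1_length_5000_cov_30.5 (the naming convention used by recent SPAdes releases), and the thresholds, function names and choice of statistics are hypothetical values picked for the example.

```python
import re
from typing import Dict, List, NamedTuple


class Contig(NamedTuple):
    name: str
    length: int
    coverage: float


# Assumed header layout: NODE_<id>_length_<bp>_cov_<coverage>
HEADER_PATTERN = re.compile(r"^NODE_\d+_length_(\d+)_cov_([\d.]+)")


def parse_header(header: str) -> Contig:
    """Extract length and coverage from a contig header line."""
    name = header.lstrip(">").strip()
    match = HEADER_PATTERN.match(name)
    if match is None:
        raise ValueError(f"Unrecognised contig header: {header!r}")
    return Contig(name, int(match.group(1)), float(match.group(2)))


def filter_contigs(
    contigs: List[Contig], min_length: int, min_coverage: float
) -> List[Contig]:
    """Drop contigs that are too short or too poorly covered."""
    return [
        c for c in contigs
        if c.length >= min_length and c.coverage >= min_coverage
    ]


def n50(lengths: List[int]) -> int:
    """Length of the contig at which half of the total assembly is reached."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty assembly


def summarize(contigs: List[Contig]) -> Dict[str, int]:
    """A few common assembly statistics (illustrative selection)."""
    lengths = [c.length for c in contigs]
    return {
        "contigs": len(lengths),
        "total_length": sum(lengths),
        "largest_contig": max(lengths, default=0),
        "n50": n50(lengths),
    }


# Made-up headers and hypothetical thresholds.
example_headers = [
    ">NODE_1_length_5000_cov_30.5",
    ">NODE_2_length_3000_cov_25.0",
    ">NODE_3_length_2000_cov_1.2",
    ">NODE_4_length_150_cov_40.0",
]
parsed = [parse_header(h) for h in example_headers]
filtered = filter_contigs(parsed, min_length=1000, min_coverage=5.0)
print(summarize(filtered))
```

With these example thresholds, the third contig is removed for low coverage and the fourth for being too short, so the summary covers the two remaining contigs.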