IRIDA provides a fully featured system for the storage, management, and sharing of sequencing data. Sequence data can be imported directly from Illumina MiSeq sequencers into IRIDA’s data storage and management system. The sequence data is organized into projects, and access to the data can be shared with project collaborators. Data can also be shared with other IRIDA instances across the internet. Data can be analyzed directly in IRIDA, or exported to a file or to the Galaxy workflow system. Read on for more detail on IRIDA’s sequence data management.
IRIDA’s data model is inspired by the INSDC.
The root of IRIDA’s data model is the project. A project has some metadata (an organism, a description, and links), a collection of users or groups who have permission to view or edit the data in the project, and a collection of samples.
A sample has some metadata (an organism, description, and collection information like date and location) and a collection of sequencing data that was generated for the isolate.
IRIDA also tracks some information about the sequencing instrument that was used to generate the data.
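The project/sample/sequencing-data hierarchy described above can be pictured with a few plain data classes. This is only an illustrative sketch: the class names, field names, and the simplified "view"/"edit" permission strings are assumptions made for the example and are not IRIDA's actual schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SequenceFile:
    # Location of the sequencing data (hypothetical field name).
    path: str
    # Instrument that generated the data, if it was recorded.
    instrument: Optional[str] = None


@dataclass
class Sample:
    name: str
    organism: Optional[str] = None
    description: str = ""
    collection_date: Optional[str] = None
    collection_location: Optional[str] = None
    files: List[SequenceFile] = field(default_factory=list)


@dataclass
class Project:
    name: str
    organism: Optional[str] = None
    description: str = ""
    links: List[str] = field(default_factory=list)
    # User or group name mapped to "view" or "edit" (assumed, simplified scheme).
    members: Dict[str, str] = field(default_factory=dict)
    samples: List[Sample] = field(default_factory=list)

    def can_edit(self, user: str) -> bool:
        """Return True only if the user holds the 'edit' permission."""
        return self.members.get(user) == "edit"


# Build a small example project with one sample and one sequence file.
project = Project(name="Example project", organism="Salmonella enterica")
project.members["alice"] = "edit"
project.members["bob"] = "view"

sample = Sample(name="isolate-01", organism="Salmonella enterica")
sample.files.append(SequenceFile(path="isolate-01_R1.fastq", instrument="MiSeq"))
project.samples.append(sample)
```

Keeping the instrument on the sequence file rather than on the sample is a choice made for this sketch; it reflects that one sample may have data from more than one sequencing run.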
IRIDA’s file management application is made up of several different parts:
- An easy-to-use web interface for humans,
- A REST API for bots and applications,
- A centralized file storage area, and
- An internal compute cluster for executing analyses.
Internally, applications using data managed by IRIDA are encouraged to make use of the central file storage area. For example, internal and external interactions with the Galaxy workflow execution engine use filesystem links to the central file storage area instead of copying files so that overall disk usage is reduced when users work with their NGS data.
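The idea of linking to the central file storage area instead of copying can be sketched with ordinary symbolic links. The function name and directory layout below are hypothetical and this is not how IRIDA or Galaxy implement it; the sketch only shows why a link avoids duplicating the file on disk. It assumes a POSIX-like system where creating symbolic links is permitted.

```python
import os
import tempfile


def link_into_workspace(stored_file: str, workspace: str) -> str:
    """Expose a stored file inside a workspace through a symbolic link, not a copy."""
    os.makedirs(workspace, exist_ok=True)
    link_path = os.path.join(workspace, os.path.basename(stored_file))
    # Only create the link if nothing (not even a dangling link) is already there.
    if not os.path.lexists(link_path):
        os.symlink(stored_file, link_path)
    return link_path


with tempfile.TemporaryDirectory() as root:
    # A stand-in for the central storage area holding one small FASTQ file.
    storage = os.path.join(root, "storage")
    os.makedirs(storage)
    stored = os.path.join(storage, "reads.fastq")
    with open(stored, "w") as handle:
        handle.write("@read1\nACGT\n+\nIIII\n")

    link = link_into_workspace(stored, os.path.join(root, "workspace"))
    is_link = os.path.islink(link)
    same_target = os.path.realpath(link) == os.path.realpath(stored)
    # Reading through the link returns the stored file's content.
    with open(link) as handle:
        first_line = handle.readline().strip()
```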
The web interface enables humans to manage and organize their sequencing data. Users can easily organize their data by grouping samples into projects, then move, edit, and copy samples between projects. IRIDA also includes useful search functions to filter data sets by name, organism, or by uploading a list of names generated by an external data source.
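Filtering by an uploaded list of names amounts to a set-membership test. The sketch below assumes one name per line and case-insensitive matching; both are assumptions for illustration, and the function name is hypothetical rather than part of IRIDA.

```python
from typing import Iterable, List


def filter_by_names(sample_names: Iterable[str], uploaded_text: str) -> List[str]:
    """Keep the samples whose names appear in an uploaded list (one name per line)."""
    wanted = {
        line.strip().lower()
        for line in uploaded_text.splitlines()
        if line.strip()  # skip blank lines
    }
    return [name for name in sample_names if name.lower() in wanted]


samples = ["isolate-01", "isolate-02", "isolate-03"]
uploaded_list = "isolate-03\n\nISOLATE-01\nunknown-99\n"
matches = filter_by_names(samples, uploaded_list)
```

Names in the uploaded list that match no sample (here `unknown-99`) are simply ignored; a real interface would probably want to report them back to the user.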
Getting Your Data Into IRIDA
The web interface also provides methods for getting data in and out of IRIDA. Small data sets can be uploaded through the web interface; for larger data sets, and for sequencing centres, we recommend our desktop uploader tool. FASTQ files that are uploaded to IRIDA are decompressed and processed by FastQC to provide some quality control metrics.
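To give a feel for the decompress-then-summarize step, the sketch below reads a gzip-compressed FASTQ file and computes two simple statistics. It is not FastQC and does not reproduce its metrics; the function name and the choice of statistics are assumptions for illustration, and it assumes well-formed four-line FASTQ records.

```python
import gzip
import io


def fastq_summary(compressed: bytes) -> dict:
    """Decompress gzip-compressed FASTQ data and report read count and mean read length."""
    read_count = 0
    total_bases = 0
    with gzip.open(io.BytesIO(compressed), "rt") as handle:
        for index, line in enumerate(handle):
            # In a four-line FASTQ record, the second line holds the sequence.
            if index % 4 == 1:
                read_count += 1
                total_bases += len(line.strip())
    mean_length = total_bases / read_count if read_count else 0.0
    return {"reads": read_count, "mean_length": mean_length}


raw = b"@r1\nACGT\n+\nIIII\n@r2\nACGTAC\n+\nIIIIII\n"
summary = fastq_summary(gzip.compress(raw))
```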
Sharing Data Within IRIDA
IRIDA also has file sharing features so that users can share their data with other users and groups managed by IRIDA. Users can permit other users in IRIDA to access or modify their data, and may also create and manage their own user groups for assigning bulk permissions.
Sharing Data Outside of IRIDA
Users can get data out of IRIDA in several ways:
- Downloading a large, compressed directory of sequencing data in bulk,
- Sending their data to an external instance of Galaxy,
- Exporting their files to the command-line as a link structure, or
- Sending their files in bulk to NCBI’s Sequence Read Archive.
Users may download their data sets in bulk from IRIDA using the export feature, which creates a large zip file containing all of the samples that they have selected. Because of the large file sizes involved, users are encouraged to use the alternative methods for getting their data out of IRIDA.
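A bulk export of this kind can be sketched with Python's standard `zipfile` module. The one-folder-per-sample layout inside the archive and the function name are assumptions for illustration; the sketch does not describe the layout IRIDA actually produces.

```python
import io
import zipfile
from typing import Dict


def export_samples(samples: Dict[str, Dict[str, bytes]]) -> bytes:
    """Bundle the selected samples into one zip archive, one folder per sample."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for sample_name, files in samples.items():
            for file_name, content in files.items():
                archive.writestr(f"{sample_name}/{file_name}", content)
    return buffer.getvalue()


selected = {
    "isolate-01": {"reads_R1.fastq": b"@r1\nACGT\n+\nIIII\n"},
    "isolate-02": {"reads_R1.fastq": b"@r2\nTTGA\n+\nIIII\n"},
}
archive_bytes = export_samples(selected)

# Re-open the archive to check what was written.
with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
    exported_names = sorted(archive.namelist())
    round_trip = archive.read("isolate-01/reads_R1.fastq")
```

Building the archive in memory keeps the example short; for real sequencing data, which is large, writing the archive to disk or streaming it would be the safer choice.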
The Send to Galaxy feature allows a user to send selected sample data to an external instance of Galaxy for analyses that are not built into IRIDA. The tool for importing data to Galaxy from IRIDA uses the Galaxy Data Library feature to link to files on the central file storage area instead of physically copying data, to reduce overall disk usage.
The Command-line Linker feature allows a power-user to recreate the project/sample structure as folders on their file system, allowing them to execute analyses that are not built into IRIDA or Galaxy.
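The folder structure that such a linker produces can be sketched as nested project and sample directories whose entries are symbolic links back to the stored files. The function below is a hypothetical illustration, not the Command-line Linker itself, and it assumes a POSIX-like system where symbolic links can be created.

```python
import tempfile
from pathlib import Path
from typing import Dict, List


def build_link_tree(project: str, samples: Dict[str, List[Path]], dest: Path) -> List[Path]:
    """Recreate project/sample folders under dest, linking to each stored file."""
    links = []
    for sample_name, files in samples.items():
        sample_dir = dest / project / sample_name
        sample_dir.mkdir(parents=True, exist_ok=True)
        for source in files:
            link = sample_dir / source.name
            # Leave any existing entry alone rather than overwriting it.
            if not link.is_symlink() and not link.exists():
                link.symlink_to(source)
            links.append(link)
    return links


with tempfile.TemporaryDirectory() as root:
    # A stand-in for the central storage area.
    storage = Path(root) / "storage"
    storage.mkdir()
    stored = storage / "reads.fastq"
    stored.write_text("@read1\nACGT\n+\nIIII\n")

    dest = Path(root) / "linked"
    links = build_link_tree("my-project", {"isolate-01": [stored]}, dest)
    relative_paths = [link.relative_to(dest).as_posix() for link in links]
    all_links = all(link.is_symlink() for link in links)
    linked_content = links[0].read_text()
```

Running the function a second time over the same destination leaves existing links in place, so the tree can be refreshed without disturbing what is already there.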
Finally, the Upload to NCBI SRA export feature facilitates transferring data stored in IRIDA to NCBI’s SRA. Users do not need to download the files and then upload them to NCBI’s FTP site; IRIDA handles the transfer to NCBI for them.