The IRIDA platform is implementing an ontology framework that enables the integration of the different data types required for outbreak investigation, surveillance and reporting. An ontology is a standardized vocabulary that makes data both human- and machine-readable, because terms are defined once and reused across information types. IRIDA’s Genomic Epidemiology Application Ontology (GenEpiO) is another innovation enhancing the platform’s analytical power.
The Need for Data Standardization
Timeliness of foodborne outbreak analyses is key for reducing the number of preventable cases of disease. The ability to resolve outbreaks relies heavily on good contextual information regarding “person, place and time”, which is crucial for identifying sources of contamination and exposure. Contextual information is also required for human health risk assessments, source attribution, ecosystems modelling, and in the simplest terms, to make sense of the genome data.
The digitization of genomics allows for increased resolution of infectious sequence types and rapid transmission of data; however, significant computational challenges remain in reporting and analyzing genomics results. Raw genome sequences need to be processed and then presented, in a timely and secure manner, in different ways to end-users in the health care environment who have vastly different roles (attending physicians, infection control, environmental health officers, medical health officers, public health epidemiologists, etc.) and affiliations. The ability to share secure and standardized data within and across organizations is critical to implementing genomic epidemiology for public health microbiology.
Ontologies as a Framework for Data Integration
One way to integrate clinical, epidemiological and laboratory (genomic) data types is through the use of ‘ontologies’. Ontologies are well-defined, standardized vocabularies whose terms are interconnected by logical relationships, and they are constructed in such a way as to facilitate fast, automated querying. Standardizing vocabulary increases interoperability between systems, allows previously isolated databases to be integrated, and resolves semantic ambiguity. Highlights of the benefits of ontologies for surveillance and detection activities include:
- Faster data integration and exchange based on standardized fields. The longitudinal nature of pathogen surveillance requires information to be propagated and compared between agencies, which can occur much more quickly and in a computer-amenable manner if contextual information is standardized.
- Mapping of institution-specific terms used in public health interfaces to standards allows for customized data entry while facilitating interoperability.
- Standardized quality control and result reporting trigger actionable events in the same way across systems, which will contribute to the accreditation and validation of clinically implemented genomics pipelines.
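The term-mapping benefit listed above can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the institution names, the local labels, and the `EXAMPLE:` identifiers are placeholders invented for illustration, not actual GenEpiO terms or IRIDA code.

```python
# Hypothetical standard vocabulary: identifier -> preferred label.
# The EXAMPLE: identifiers are placeholders, not real ontology IDs.
STANDARD_TERMS = {
    "EXAMPLE:0001": "stool specimen",
    "EXAMPLE:0002": "blood specimen",
}

# Hypothetical per-institution mappings from local wording to standard IDs.
LOCAL_TO_STANDARD = {
    "lab_a": {"feces": "EXAMPLE:0001", "stool": "EXAMPLE:0001", "blood": "EXAMPLE:0002"},
    "lab_b": {"faecal sample": "EXAMPLE:0001", "whole blood": "EXAMPLE:0002"},
}


def standardize(institution, local_term):
    """Return (term_id, label) for a local term, or None if it is unmapped."""
    mapping = LOCAL_TO_STANDARD.get(institution, {})
    term_id = mapping.get(local_term.strip().lower())
    if term_id is None:
        return None
    return term_id, STANDARD_TERMS[term_id]


# Two institutions use different words but resolve to the same standard term.
print(standardize("lab_a", "Feces"))
print(standardize("lab_b", "faecal sample"))
```

Each institution keeps its own data-entry wording, while downstream comparisons operate on the shared identifier. Unmapped terms return `None` so they can be flagged for curation rather than silently guessed.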
IRIDA’s Genomic Epidemiology Application Ontology (GenEpiO)
Our research efforts include the development of a Genomic Epidemiology Application Ontology (GenEpiO), based on public-health stakeholder interviews and the harmonization of important laboratory, clinical and epidemiological resources. The goal is to develop an ontology that supports an end-to-end genomic epidemiology pipeline, so that all of the contextual information required to interpret genomics data is propagated from the point of intake, through sequencing, to end use (e.g. in an epidemiologic investigation).
Since diseases do not respect international borders, uptake of a common, standard vocabulary for describing outbreak and surveillance activities is crucial for inter-jurisdictional interpretation of results and data sharing.
GenEpiO contains key fields and terms to describe sample metadata, lab analytics, clinical information as well as exposures and epidemiological data. GenEpiO terms are mapped to community standards and existing ontologies to ensure the accuracy of meaning and facilitate interoperability between software systems. The GenEpiO ontology has already been implemented in the different IRIDA interfaces. The Metadata Manager allows standardized metadata entry for isolates and generates BioSample-compliant forms for genome submission to NCBI. GenEpiO has also been used to create different Line List visualization tools for epidemiological investigations.
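As a rough illustration of how standardized metadata fields could feed a submission form, the Python sketch below renames internal fields to submission attribute names and reports any that are missing. It is not the Metadata Manager's implementation. The internal field names are hypothetical, and the attribute names on the right are assumed to match those used in NCBI BioSample pathogen packages; they should be checked against the current package definition before use.

```python
# Hypothetical internal field names mapped to BioSample-style attribute names.
# The attribute names are an assumption to verify against NCBI's current
# package definitions.
FIELD_TO_BIOSAMPLE = {
    "sample_collection_date": "collection_date",
    "geographic_location": "geo_loc_name",
    "specimen_source": "isolation_source",
    "host_organism": "host",
}


def to_biosample_row(record):
    """Rename standardized fields and list any that are missing or empty."""
    row = {}
    missing = []
    for field, attribute in FIELD_TO_BIOSAMPLE.items():
        value = record.get(field)
        if value in (None, ""):
            missing.append(field)
        else:
            row[attribute] = value
    return row, missing


# Made-up example record for one isolate.
record = {
    "sample_collection_date": "2016-05-01",
    "geographic_location": "Canada: British Columbia",
    "specimen_source": "stool specimen",
}
row, missing = to_biosample_row(record)
print(row)
print("missing fields:", missing)
```

Reporting missing fields separately, instead of raising an error, lets a data-entry interface prompt the user for exactly what is still needed.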
The Genomic Epidemiology Consortium
Harmonization of the genomic epidemiology ontology can only be achieved by consensus and wide adoption, and international input and expertise are crucial to achieving these goals. To ensure that GenEpiO is sufficiently robust to serve all use cases, we are currently forming an inclusive International Genomic Epidemiology Ontology Consortium to build partnerships and solicit domain expertise. All interested individuals are welcome to participate.
To join, or find out more about our goals and activities, please contact email@example.com.
The preliminary draft OWL file can be found at: https://github.com/GenEpiO/genepio/wiki
Key domains under development include Antimicrobial Resistance, Epidemiology, and an international Food Ontology, all critical for tackling the global threats of antibiotic resistance and emerging pathogens. As good food descriptors for food matrices and food production environments are key for surveillance and foodborne outbreak investigations, we are forming a Food Ontology (FoodOn) Consortium in parallel with the GenEpiO Consortium.
Community contributions are welcome.