Interoperability driven integration of biomedical data sources

SLIDE 1

Interoperability driven integration of biomedical data sources

Douglas TEODORO (a, 1), Rémy CHOQUET (c), Daniel SCHOBER (d), Giovanni MELS (e), Emilie PASCHE (a), Patrick RUCH (b), and Christian LOVIS (a)

(a) SIMED, University Hospitals of Geneva and (b) HEG, University of Applied Sciences, Geneva, Switzerland; (c) INSERM, Université Pierre et Marie Curie, Paris, France; (d) Freiburg University Medical Center, Germany; (e) AGFA Healthcare, Ghent, Belgium

Oslo, 30 August 2011

SLIDE 2

The DebugIT project

  • Funded by the European Community's Seventh Framework Program under grant agreement n° FP7-217139 (€7M)
  • Project period: from January 1st, 2008 to December 31st, 2011 (?)
  • 14 Partners

Disclaimer: this presentation reflects solely the views of the DebugIT team. The European Commission, Directorate General Information Society and Media, Brussels is not liable for any use that may be made of the information contained therein.
  • Douglas Teodoro - MIE 2011
SLIDE 3

Aim and objectives

  • Design a data integration architecture that supports research on, and monitoring of, antimicrobial resistance using existing operational microbiology databases
  • Integrate heterogeneous operational clinical information systems
– Design methods to interoperate with various storage systems
– Implement a data source mediator
  • Provide common semantics to the data
– Formalize data source models and data types
  • Provide ubiquitous access to the data
– Expose laboratory data from the data sources on the Internet

SLIDE 4

The virtual Clinical Data Repository

  • A data integration platform for existing clinical data
– Primarily focused on antimicrobial data but extensible to other domains
  • Based on Semantic Web technologies
  • Follows the hybrid ontology-driven integration approach
– Multiple semantically flat data definition ontologies are mapped to a common, semantically defined domain ontology
  • Provides three levels of interoperability in the data integration process
– Technical
– Syntactic
– Semantic

SLIDE 5

Methods: Technical interoperability

  • An intermediate storage layer was designed to provide a common storage system
  • Based on an RDF store
– RDF model
– SPARQL protocol
  • ETL (Extract-Transform-Load) jobs provide the interface to the different storage systems
  • Data sources are connected via the HTTPS/SPARQL protocol

[Figure: technical architecture spanning the Intranet, DMZ and Internet zones. Local sources (XML files, text files, RDBMS) are loaded by ETL jobs into per-site RDF stores, which are accessed over HTTPS; the figure also marks local security.]
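The ETL step above can be pictured with a minimal sketch, assuming a relational source and N-Triples as the hand-off format to the RDF store. The table `lab_result`, its columns, the `example.org` namespaces and the property `ddo:hasOrganism` are hypothetical; `ddo:Bacteriologie` and `ddo:hasDate` are borrowed from the query-model slide. Pushing the output into an actual RDF store is left out.

```python
import sqlite3

# Hypothetical namespaces; the DebugIT IRIs are not given in the slides.
DDO = "http://example.org/ddo#"
DATA = "http://example.org/hospital-a/"
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
XSD_DATE = "<http://www.w3.org/2001/XMLSchema#date>"


def literal(value, datatype=None):
    """Serialize a value as an N-Triples literal, escaping backslashes and quotes."""
    text = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{text}"^^{datatype}' if datatype else f'"{text}"'


def extract(conn):
    """Extract: read rows from a hypothetical local table of bacteriology results."""
    return conn.execute(
        "SELECT id, organism, sample_date FROM lab_result ORDER BY id"
    ).fetchall()


def transform(rows):
    """Transform: turn each row into triples in the local DDO vocabulary."""
    for row_id, organism, sample_date in rows:
        subject = f"<{DATA}result/{row_id}>"
        yield subject, RDF_TYPE, f"<{DDO}Bacteriologie>"
        yield subject, f"<{DDO}hasOrganism>", literal(organism)  # hypothetical property
        yield subject, f"<{DDO}hasDate>", literal(sample_date, XSD_DATE)


def load(triples):
    """Load: serialize as N-Triples, ready for a bulk import into the RDF store."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


# Usage with an in-memory stand-in for a hospital RDBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lab_result (id INTEGER, organism TEXT, sample_date TEXT)")
conn.execute("INSERT INTO lab_result VALUES (1, 'Klebsiella pneumoniae', '2009-06-01')")
ntriples = load(transform(extract(conn)))
print(ntriples)
```

Keeping extract, transform and load as separate functions mirrors the slide's separation: only `extract` knows the local storage system, so an XML or text-file source would swap that function and reuse the rest.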

SLIDE 6

Methods: Syntactic interoperability

  • Data cataloging
– Bottom-up process
  • Local data types are aligned using biomedical terminologies
– WHO-ATC, SNOMED-CT, NEWT
  • Multi-stage text-based classification is used for automatic normalization
– Ruch, Bioinformatics 2006; Daumke, GDMS 2010
  • A domain ontology, the DebugIT Core Ontology (DCO), was designed to represent the field
  • Terminologies are mapped to the DCO using the SKOS ontology

[Figure: data normalization. Local concepts (DDO instances) are mapped to local formal concepts from terminologies (WHO-ATC, SNOMED CT, LOINC, NEWT, ICD-10), which are mapped to global concepts (DCO instances, e.g. Bacteria, Laboratory, Disease, Drug).]
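The normalization and SKOS mapping steps can be sketched as follows. This is not the cited multi-stage classifier (Ruch 2006; Daumke 2010): it substitutes a simple two-stage lookup, an exact match after case folding and then a token-overlap (Jaccard) fallback. The lexicon, the threshold of 0.5 and the `example.org` IRI scheme are hypothetical; the ATC and NCBI Taxonomy codes are real identifiers used for illustration, and `skos:exactMatch` is the standard SKOS mapping property.

```python
import re

# Hypothetical lexicon excerpt: synonym -> (terminology, code).
# Codes: ATC J01CA04 = amoxicillin, J01MA02 = ciprofloxacin;
# NEWT/NCBI Taxonomy 573 = Klebsiella pneumoniae.
LEXICON = {
    "amoxicillin": ("ATC", "J01CA04"),
    "amoxicilline": ("ATC", "J01CA04"),
    "ciprofloxacin": ("ATC", "J01MA02"),
    "klebsiella pneumoniae": ("NEWT", "573"),
}
SKOS_EXACT = "<http://www.w3.org/2004/02/skos/core#exactMatch>"


def tokens(text):
    """Lower-cased alphanumeric tokens of a label."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def normalize(label, threshold=0.5):
    """Stage 1: exact lookup after case/space folding.
    Stage 2: best token-overlap (Jaccard) match above a threshold, else None."""
    key = " ".join(label.lower().split())
    if key in LEXICON:
        return LEXICON[key]
    query = tokens(label)
    best, best_score = None, 0.0
    for synonym, code in LEXICON.items():
        candidate = tokens(synonym)
        union = query | candidate
        score = len(query & candidate) / len(union) if union else 0.0
        if score > best_score:
            best, best_score = code, score
    return best if best_score >= threshold else None


def skos_triples(local_iri, label, dco_iri):
    """Link a local concept to its terminology code and the code to a DCO concept."""
    match = normalize(label)
    if match is None:
        return []  # unresolved labels are left for manual curation
    terminology, code = match
    code_iri = f"<http://example.org/{terminology.lower()}/{code}>"  # hypothetical IRI scheme
    return [(f"<{local_iri}>", SKOS_EXACT, code_iri), (code_iri, SKOS_EXACT, f"<{dco_iri}>")]


print(normalize("AMOXICILLINE "))
print(normalize("Klebsiella pneumoniae ssp."))
```

The French synonym "amoxicilline" hints at why a lexicon is needed at all: each pilot site labels the same drug or organism in its own language and coding habits.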

SLIDE 7

Methods: Semantic interoperability

  • Local RDF data store (local CDR) models are formalized using a semantically flat data definition ontology (DDO)
  • Local models are mapped to their respective DDOs
  • DDOs are mapped to the DCO, closing the gap between local and domain semantics

[Figure: data model mapping. Local models (ER, EAV/CR, HL7-RIM, openEHR) are mapped to local formalized models (one DDO per source), which are mapped to the shared domain model (DCO classes and properties).]
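The DDO-to-DCO step can be illustrated with a deliberately simplified sketch in which a dictionary lookup stands in for the ontology mappings and reasoning described above. The four mapped terms come from the query-model slide; the `hug:` prefix and `ddo:internalCode` are hypothetical. Unmapped local terms are dropped rather than guessed, which keeps the lifted data within the shared DCO semantics.

```python
# Mapping tables: the four terms come from the query-model slide; the rest of the
# DebugIT mappings are not shown, so these dictionaries are illustrative only.
CLASS_MAP = {"ddo:Bacteriologie": "dco:Antibiogram"}
PROPERTY_MAP = {"ddo:hasDate": "dco:hasResultDate"}


def lift(triples):
    """Rewrite local DDO triples into DCO terms; unmapped classes/properties are dropped."""
    lifted = []
    for s, p, o in triples:
        if p == "rdf:type":
            if o in CLASS_MAP:
                lifted.append((s, "rdf:type", CLASS_MAP[o]))
        elif p in PROPERTY_MAP:
            lifted.append((s, PROPERTY_MAP[p], o))
    return lifted


local = [
    ("hug:ab1", "rdf:type", "ddo:Bacteriologie"),
    ("hug:ab1", "ddo:hasDate", '"2009-06-01"'),
    ("hug:ab1", "ddo:internalCode", '"X17"'),  # hypothetical unmapped local property
]
print(lift(local))
```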

SLIDE 8

Methods: Query model

  • Domain query (DCO): ?ab a dco:Antibiogram ; dco:hasResultDate ?date
  • CDR query (DDO): ?ab a ddo:Bacteriologie ; ddo:hasDate ?date
  • Results are fetched and returned in RDF graph format using local terminologies

[Figure: query processing across the DCO, the source ontologies (DDO1, DDO2, DDO3) and the CDRs, with mapping, reasoning, aggregation and validation steps.]
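The rewriting from the domain query to the CDR query, followed by aggregation of the per-source results, might look like the sketch below. It assumes one term-level mapping per source; CDR1 uses the terms shown on this slide, while CDR2's terms are invented for illustration. PREFIX declarations and the validation step are omitted, and a real mediator would apply ontology mappings and reasoning rather than plain string substitution.

```python
# Hypothetical per-source mappings from DCO terms to each CDR's DDO terms.
# CDR1 uses the terms shown on the slide; CDR2's terms are invented for illustration.
SOURCE_MAPPINGS = {
    "CDR1": {"dco:Antibiogram": "ddo:Bacteriologie", "dco:hasResultDate": "ddo:hasDate"},
    "CDR2": {"dco:Antibiogram": "ddo2:Antibiogramm", "dco:hasResultDate": "ddo2:datum"},
}
DOMAIN_QUERY = [("?ab", "a", "dco:Antibiogram"), ("?ab", "dco:hasResultDate", "?date")]


def rewrite(patterns, mapping):
    """Translate each triple pattern into a source's terms; variables and 'a' pass through."""
    return [tuple(mapping.get(term, term) for term in pattern) for pattern in patterns]


def to_sparql(patterns):
    """Render triple patterns as a SELECT query (PREFIX declarations omitted)."""
    body = " .\n  ".join(" ".join(pattern) for pattern in patterns)
    return "SELECT * WHERE {\n  " + body + " .\n}"


def aggregate(results_by_source):
    """Union the rows returned by each CDR, tag them with their origin, sort by date."""
    merged = []
    for source, rows in results_by_source.items():
        for row in rows:
            merged.append({**row, "source": source})
    return sorted(merged, key=lambda row: row["date"])


for source, mapping in SOURCE_MAPPINGS.items():
    print(f"# {source}\n{to_sparql(rewrite(DOMAIN_QUERY, mapping))}")
```

Tagging each row with its source keeps provenance after the union, which the validation step would need to trace a suspicious result back to a hospital.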

SLIDE 9

Results: Pilot network

  • Seven healthcare institutions are sharing antimicrobial resistance data using the framework:

GAMA (Sofia-BG), HUG (Geneva-CH), INSERM (Paris-FR), IZIP (Prague-CZ), LiU (Linköping-SE), TEILAM (Lamia-GR) and UKLFR (Freiburg-DE)

SLIDE 10

Results: Ontology added-value

  • Use of the ontology for automatic clustering of antibiograms (e.g. by antibiotic class)
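A minimal sketch of class-based clustering, assuming antibiotics are coded in WHO-ATC and using the code's level-4 prefix (its first five characters) as a stand-in for the DCO subclass hierarchy. The antibiogram is hypothetical; the ATC codes and group names are real. An ontology-based implementation would instead follow subclass links, which also covers drugs that sit in several classes.

```python
from collections import defaultdict

# Level-4 ATC groups (first five characters of the code) used as antibiotic classes.
ATC_CLASS_NAMES = {
    "J01CA": "Penicillins with extended spectrum",
    "J01DD": "Third-generation cephalosporins",
    "J01MA": "Fluoroquinolones",
}


def resistance_by_class(results, prefix_length=5):
    """Group (ATC code, S/I/R interpretation) pairs by ATC prefix; return the share of 'R' per class."""
    groups = defaultdict(list)
    for code, interpretation in results:
        groups[code[:prefix_length]].append(interpretation)
    return {cls: sum(i == "R" for i in calls) / len(calls) for cls, calls in groups.items()}


# Hypothetical antibiogram for one isolate: amoxicillin, piperacillin, ceftriaxone,
# ciprofloxacin, levofloxacin.
results = [("J01CA04", "R"), ("J01CA12", "R"), ("J01DD04", "S"), ("J01MA02", "R"), ("J01MA12", "S")]
for cls, share in resistance_by_class(results).items():
    print(f"{ATC_CLASS_NAMES.get(cls, cls)}: {share:.0%} resistant")
```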

SLIDE 11

Results: Performance evaluation

  • In preliminary tests, a set of long-period queries was run to evaluate the CDR response time
  • E.g.: “What is the evolution of resistance of Klebsiella pneumoniae from Jun 2005 to Jun 2009?”

Source    #Tuples retrieved    Retrieval time, SPARQL (s)    Retrieval time, network (s)    #Tuples/s
GAMA      n/a                  0.14                          0.00                           n/a
HUG       74150                5.72                          3.91                           7704
INSERM    330360               20.38                         14.22                          9550
IZIP      n/a                  n/a                           n/a                            n/a
LiU       9905                 1.70                          1.23                           3371
TEILAM    30                   0.36                          0.00                           83
UKLFR     155315               6.34                          6.19                           12394

  • The network accounts for 41% to 49% of the retrieval time for result sets containing more than 1000 tuples
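The 41% to 49% figure can be rechecked from the table with a short calculation, assuming total retrieval time is the sum of the SPARQL and network columns (the #Tuples/s column is consistent with that reading). GAMA and IZIP are left out because their tuple counts are missing.

```python
# (source, tuples retrieved, SPARQL time in s, network time in s), copied from the table.
ROWS = [
    ("HUG", 74150, 5.72, 3.91),
    ("INSERM", 330360, 20.38, 14.22),
    ("LiU", 9905, 1.70, 1.23),
    ("TEILAM", 30, 0.36, 0.00),
    ("UKLFR", 155315, 6.34, 6.19),
]


def summarize(rows, min_tuples=1000):
    """Return {source: (tuples per second, network share of retrieval time)} above min_tuples."""
    summary = {}
    for source, tuples, sparql_s, network_s in rows:
        if tuples <= min_tuples:
            continue
        total = sparql_s + network_s
        summary[source] = (tuples / total, network_s / total)
    return summary


for source, (throughput, share) in summarize(ROWS).items():
    print(f"{source:7} {throughput:7.0f} tuples/s, network {share:.0%}")
```

The recomputed throughputs differ from the table by a few tuples per second (e.g. about 7700 rather than 7704 for HUG), presumably because the published times are rounded to two decimals.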

SLIDE 12

Conclusions

  • Developing a fully Semantic Web-compliant distributed CDR is feasible
  • Seven healthcare institutions compose the demonstration network
  • The CDR exposes standardized and formalized microbiology clinical databases
  • The query mediation process is limited
– It is logically impossible to map a priori from the global to the local ontologies
– To be usable by end users (clinical researchers, physicians), the system needs to be encapsulated in query templates
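A query template of the kind mentioned above could hide SPARQL behind a few parameters. The sketch below fills a template for the example question on the performance slide. `dco:Antibiogram` and `dco:hasResultDate` appear in the slides; `dco:hasOrganism`, the default DCO namespace and the organism IRI are hypothetical placeholders, not the published DebugIT Core Ontology.

```python
from datetime import date

# Hypothetical template; dco:hasOrganism and the default DCO namespace are placeholders.
TEMPLATE = """PREFIX dco: <{dco_iri}>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?ab ?date WHERE {{
  ?ab a dco:Antibiogram ;
      dco:hasOrganism <{organism_iri}> ;
      dco:hasResultDate ?date .
  FILTER (?date >= "{start}"^^xsd:date && ?date < "{end}"^^xsd:date)
}}"""


def resistance_evolution_query(organism_iri, start, end, dco_iri="http://example.org/dco#"):
    """Fill the template for one organism and a half-open date range [start, end)."""
    if start >= end:
        raise ValueError("start must be earlier than end")
    return TEMPLATE.format(dco_iri=dco_iri, organism_iri=organism_iri,
                           start=start.isoformat(), end=end.isoformat())


# "Jun 2005 to Jun 2009", with an exclusive end date so that June 2009 is included.
print(resistance_evolution_query("http://example.org/dco#KlebsiellaPneumoniae",
                                 date(2005, 6, 1), date(2009, 7, 1)))
```

A template like this sidesteps the a priori mapping problem: the mediator only has to support the fixed query shapes that templates can produce.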

SLIDE 13

Conclusions

  • In the query plan, most of the data aggregation is done centrally
– Push reasoning down to the local sources to improve network response time
  • A production version of the CDR is expected to be available for surveillance systems and clinical research by the end of the year
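The benefit of pushing work to the sources can be illustrated with aggregation, the operation the first point says is done centrally today; the same reasoning applies to pushing down reasoning. The sketch compares how many records cross the network when every row is shipped to the mediator versus when each source sends only per-year counts. The sources and rows are hypothetical.

```python
from collections import Counter


def central_aggregation(sources):
    """Ship every row to the mediator and count antibiograms per year centrally."""
    shipped = sum(len(rows) for rows in sources.values())
    counts = Counter(row["year"] for rows in sources.values() for row in rows)
    return counts, shipped


def pushed_down_aggregation(sources):
    """Count per year at each source (e.g. a local GROUP BY); ship only (year, count) pairs."""
    total, shipped = Counter(), 0
    for rows in sources.values():
        local_counts = Counter(row["year"] for row in rows)  # would run at the source
        shipped += len(local_counts)
        total.update(local_counts)
    return total, shipped


# Hypothetical rows: one dict per antibiogram held by each source.
sources = {
    "A": [{"year": 2005}] * 3 + [{"year": 2006}] * 2,
    "B": [{"year": 2006}] * 4,
}
print(central_aggregation(sources))
print(pushed_down_aggregation(sources))
```

Both plans return the same counts, but the pushed-down plan ships 3 records instead of 9; with result sets of hundreds of thousands of tuples, as on the performance slide, the gap grows accordingly.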

SLIDE 14

The Partners

  • Agfa HealthCare, Belgium
  • empirica Gesellschaft für Kommunikations- und Technologieforschung mbH, Germany
  • Gama Sofia Ltd., Bulgaria
  • Institut National de la Santé et de la Recherche Médicale, France
  • Internetový Pristup Ke Zdravotním Informacím Pacienta (IZIP), Czech Republic

  • Linköpings Universitetet, Sweden
  • Technologiko Expedeftiko Idrima Lamias, Greece
  • University College London, United Kingdom
  • Les Hôpitaux Universitaires de Genève, Switzerland
  • Universitätsklinikum Freiburg, Germany
  • Université de Genève, Switzerland
  • Averbis, Freiburg, Germany
  • MDA, Czech Republic
  • HEG, Geneva, Switzerland
SLIDE 15

Methods: Query model

CONSTRUCT WHERE { ?graph a ddo:Concept . }

[Figure: query steps: retrieving, mapping, aggregation]