Presented by Amel Benna Agenda Background 1. Data Integration - - PowerPoint PPT Presentation



SLIDE 1

The International Workshop on Advanced Information Systems for Enterprises (IWAISE'08), Constantine, April 19-20, 2008

Nadir Salhi & Amel Benna, CERIST, University A/Mira Bejaia, Algeria
Zaia Alimazighi, LSI, USTHB Algiers, Algeria
Bilal Amrouche & Ferhat Makhloufi, INI, Algeria

Presented by Amel Benna

SLIDE 2

Agenda

1. Background
2. Data Integration Issues
3. Our Approach for Data Integration
  • Architecture
  • Schema Description Base
  • Query process
4. Implementation
5. Conclusion & Perspectives

Constantine April,19-20, 2008 IWAISE'08 2

SLIDE 3

Background

Today… the issue is too many databases, too much information

  • Information systems are evolving in heterogeneous and distributed environments.
  • To be efficient, companies need to manage and integrate all their information sources, taking semantics into account.

SLIDE 4

Background

[Figure: diverse data sources (Oracle, SQL Server, DB2, ...), each with its own schema (Source 1 Schema, Source 2 Schema, ..., Source n Schema); different ways to query, different ways to reply]

  • How can data sources cooperate?
  • How can new sources be integrated?
  • How can data semantics be found?

A Data Integration System aims:

  • to provide uniform access to heterogeneous sources.
  • to join partial replies from heterogeneous sources.

SLIDE 5

Data Integration Issues

Definition: “The data integration is the process by which several sources of autonomous data, distributed and under heterogeneous shape are integrated as a unique source represented by a global schema”.

Among the issues to be addressed: heterogeneity

  • Model level: RDBMS, OODBMS, XML, …
  • Structure: e.g. DB1: Book(Title, Author, …), DB2: Book(Title, ISBN, …)
  • Semantics:
      • Names: e.g. the label “NAME” used for Book Title, Author, …
      • Scaling & precision conflicts: e.g. the book price is in euros with VAT in DB1, and in dollars without VAT in DB2.
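The scaling conflict on this slide (a price in euros with VAT in one source, in dollars without VAT in another) can be illustrated with a small sketch. The VAT rate, the exchange rate and the function name are placeholders chosen for illustration; none of them comes from the talk.

```python
from decimal import Decimal

# Hypothetical conversion parameters; real values depend on country and date.
VAT_RATE = Decimal("0.20")      # assumed VAT included in DB1 prices
USD_PER_EUR = Decimal("1.50")   # assumed exchange rate


def db1_price_to_db2_convention(price_eur_with_vat: Decimal) -> Decimal:
    """Convert a DB1 price (euros, VAT included) to DB2's convention
    (dollars, VAT excluded), rounded to cents."""
    price_eur_without_vat = price_eur_with_vat / (1 + VAT_RATE)
    return (price_eur_without_vat * USD_PER_EUR).quantize(Decimal("0.01"))


# A book listed at 12.00 EUR (VAT included) in DB1:
converted = db1_price_to_db2_convention(Decimal("12.00"))
print(converted)
```

Only after such a normalization can prices from the two sources be compared or filtered with a single condition.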

SLIDE 6

Data Integration Issues

Related research in semantic interoperability for databases is categorized as follows:

  • 1. Query-oriented (based on declarative languages or extended SQL)

[Figure: the user queries Source 1, Source 2, ..., Source n through a multibase language]

+ Scalability
- Manual resolution of semantic conflicts

SLIDE 7

Data Integration Issues

Related research in semantic interoperability for databases is categorized as follows:

  • 1. Query-oriented (based on declarative languages or extended SQL)
  • 2. Mapping-based (mapping between global & local schemas)

[Figure: the user queries a global schema that integrates Source 1, Source 2, ..., Source n]

+ Transparency; semantic conflicts resolved
- Dependency on particular global schemas
- Scalability
- Complexity of building the global schema

SLIDE 8

Data Integration Issues

Related research in semantic interoperability for databases is categorized as follows:

  • 1. Query-oriented (based on declarative languages or extended SQL)
  • 2. Mapping-based (mapping between global & local schemas)
  • 3. Intermediary-based (Mediator-Wrapper)

[Figure: mediator-wrapper system; the user queries the mediator (global schema), which reaches Source 1, Source 2, ..., Source n through Wrapper 1, Wrapper 2, ..., Wrapper n]

Mediator:

  • Integrates data from different representations (mapping using GAV or LAV)
  • Decomposes the query
  • Re-composes the replies

Wrappers convert the queries coming from the mediator and the replies coming from the sources to a common representation.
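The division of labour between mediator and wrappers can be sketched in a few lines. This is a minimal illustration, not the system presented in the talk: the sources are in-memory lists, the column names are invented, and the "query" is reduced to a row filter expressed in the common vocabulary.

```python
from typing import Callable

Row = dict[str, object]

# Two hypothetical sources with different local column names.
SOURCE_1 = [{"titre": "Databases", "isbn": "111"}]
SOURCE_2 = [{"book_title": "Ontologies", "ISBN_NO": "222"}]


class Wrapper:
    """Translates a source's local vocabulary into the common one."""

    def __init__(self, rows: list[Row], local_to_common: dict[str, str]):
        self.rows = rows
        self.local_to_common = local_to_common

    def query(self, predicate: Callable[[Row], bool]) -> list[Row]:
        # Convert each local row to the common representation, then filter.
        common_rows = [
            {self.local_to_common[k]: v for k, v in row.items()}
            for row in self.rows
        ]
        return [row for row in common_rows if predicate(row)]


class Mediator:
    """Sends the query to every wrapper and re-composes the replies."""

    def __init__(self, wrappers: list[Wrapper]):
        self.wrappers = wrappers

    def query(self, predicate: Callable[[Row], bool]) -> list[Row]:
        replies: list[Row] = []
        for wrapper in self.wrappers:
            replies.extend(wrapper.query(predicate))
        return replies


mediator = Mediator([
    Wrapper(SOURCE_1, {"titre": "title", "isbn": "isbn"}),
    Wrapper(SOURCE_2, {"book_title": "title", "ISBN_NO": "isbn"}),
])
all_books = mediator.query(lambda row: True)
```

The caller only ever sees the common names (`title`, `isbn`); the local names stay hidden behind each wrapper.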

SLIDE 9

Our Approach for Data Integration

  • 1. Intermediary-based approach (Mediator-Wrapper)
  • 2. Use of a domain ontology to resolve semantic conflicts
  • 3. A “Schema Description Base”, which we defined to store and manage the mappings between the ontology and the sources
  • 4. A user query format based on ontology concepts and similar to SQL
  • 5. Algorithms for localization of the sources, decomposition of the query, and re-composition of the replies

The work focuses on relational databases as data sources.

SLIDE 10

Architecture

[Figure: architecture in three levels (User Level, Mediator Level, Database Level); components shown: Request, Reply, Ontology, Schema Description Base, Query Processing Module, Wrappers]

SLIDE 11

Architecture

1. User Level

The user has an interface for writing requests using ontology concepts.

The ontology is described in the OWL language: concepts, properties and relations.

[Figure: example domain ontology. Labels: Individual, Book, Student, Author, Name, ISBN, Write, Is a, has. Legend: Concept, Property, Relation.]

The user's request is written in the format:

SELECT [List of properties]
FROM [List of concepts | relation between concepts]
WHERE [List of conditions]

E.g.:

SELECT Book.ISBN, Author.Name
FROM Book, Author, Write(Book, Author)
WHERE Book.price < 100
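To make the request format concrete, here is a rough sketch of how such a request could be split into its components. The class and function names are hypothetical, the WHERE clause is treated as optional and left unparsed, and relations are recognized simply by the presence of parentheses; the talk does not describe the actual parser.

```python
import re
from dataclasses import dataclass


@dataclass
class GlobalRequest:
    properties: list[str]   # e.g. "Book.ISBN"
    concepts: list[str]     # e.g. "Book"
    relations: list[str]    # e.g. "Write(Book, Author)"
    conditions: str         # raw WHERE text, left unparsed here


def split_top_level(text: str) -> list[str]:
    """Split on commas that are not inside parentheses."""
    parts, depth, current = [], 0, ""
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append(current.strip())
            current = ""
        else:
            current += ch
    if current.strip():
        parts.append(current.strip())
    return parts


def parse_request(query: str) -> GlobalRequest:
    match = re.fullmatch(
        r"\s*SELECT\s+(.+?)\s+FROM\s+(.+?)(?:\s+WHERE\s+(.+?))?\s*",
        query,
        flags=re.IGNORECASE | re.DOTALL,
    )
    if match is None:
        raise ValueError("not in SELECT ... FROM ... [WHERE ...] form")
    select_part, from_part, where_part = match.groups()
    from_items = split_top_level(from_part)
    return GlobalRequest(
        properties=split_top_level(select_part),
        concepts=[item for item in from_items if "(" not in item],
        relations=[item for item in from_items if "(" in item],
        conditions=where_part or "",
    )


request = parse_request(
    "SELECT Book.ISBN, Author.Name "
    "FROM Book, Author, Write(Book, Author) WHERE Book.price<100"
)
```

For the example request, `request.relations` holds the single entry `Write(Book, Author)`, which keeps the relation separate from the plain concepts `Book` and `Author`.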

SLIDE 12

Architecture

Mediator Level

[Figure: the Mediator Level, showing the Ontology, the Schema Description Base, the Query Processing Module and the Wrappers]

SLIDE 13

Schema Description Base: Mapping Ontology - Source

The Schema Description Base is a database that stores the mappings between the ontology and the sources.

In our case, the mapping is done manually by the DBA of each source. It is based on the methodology for building an ontology from a relational DB.

The mapping is defined as follows:

  • Every attribute of a source schema can be associated with a property or with a concept.
  • Every foreign key can be associated with an ontology relation.
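One possible shape for these mappings is sketched below. The record types, the source names and the table and column names are all invented for illustration; the talk does not give the actual schema of the Schema Description Base.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttributeMapping:
    source: str          # data source name
    table: str
    column: str
    ontology_term: str   # a property such as "Book.ISBN", or a concept such as "Book"


@dataclass(frozen=True)
class ForeignKeyMapping:
    source: str
    table: str
    column: str
    relation: str        # ontology relation, e.g. "Write(Book, Author)"


# Hypothetical mappings, as a DBA might enter them for each source.
ATTRIBUTE_MAPPINGS = [
    AttributeMapping("S1", "livre", "isbn", "Book.ISBN"),
    AttributeMapping("S1", "livre", "titre", "Book.Name"),
    AttributeMapping("S2", "writer", "full_name", "Author.Name"),
]
FOREIGN_KEY_MAPPINGS = [
    ForeignKeyMapping("S1", "livre", "auteur_id", "Write(Book, Author)"),
]


def columns_for(term: str) -> list[tuple[str, str, str]]:
    """Return the (source, table, column) triples mapped to an ontology term."""
    return [
        (m.source, m.table, m.column)
        for m in ATTRIBUTE_MAPPINGS
        if m.ontology_term == term
    ]
```

A lookup such as `columns_for("Book.ISBN")` is the kind of question the query process asks when it looks for sources that can answer a request.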

SLIDE 14

Schema Description Base

SLIDE 15

Query Process

SLIDE 16

Query Process

1. Analysis of the global request:

  • Extracting the different components of the global request.
  • Finding equivalent elements in the sources.

2. Localization of the sources:

  • Select from the Schema Description Base the sources that provide a partial or complete answer to the global request.

A relevant source contains:

  • All the attributes equivalent to the elements of the global request.
  • Partial properties of the global request that can be joined with other attributes of other sources.
  • Some of the properties of the global request.
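The three kinds of relevant source can be told apart with simple set operations. The sketch below is one reading of the slide, not the authors' algorithm: the source contents are invented, and the rule that two sources are joinable when they share an identifying property (here `Book.ISBN`) is an assumption.

```python
# Hypothetical content of the Schema Description Base:
# for each source, the ontology properties it can answer.
SOURCE_PROPERTIES = {
    "S1": {"Book.Name", "Book.Author", "Book.ISBN"},
    "S3": {"Book.ISBN", "Book.City", "Book.Edition"},
    "S5": {"Book.Name", "Book.Author", "Book.City"},
    "S7": {"Student.Name"},
    "S9": {"Book.City"},
}
JOIN_KEYS = {"Book.ISBN"}  # assumed: properties that identify an individual


def localize(requested: set[str]) -> dict[str, str]:
    """Classify each relevant source as 'complete', 'joinable' or 'partial'.
    Sources that answer none of the requested properties are left out."""
    result = {}
    for source, offered in SOURCE_PROPERTIES.items():
        covered = offered & requested
        if not covered:
            continue  # irrelevant source
        if covered == requested:
            result[source] = "complete"
            continue
        missing = requested - covered
        # Joinable: another source shares a join key and supplies a missing property.
        joinable = any(
            other != source
            and offered & other_offered & JOIN_KEYS
            and other_offered & missing
            for other, other_offered in SOURCE_PROPERTIES.items()
        )
        result[source] = "joinable" if joinable else "partial"
    return result


relevant = localize({"Book.Name", "Book.Author", "Book.City"})
```

With these sample sources, S5 answers the whole request, S1 and S3 complement each other through the ISBN, S9 contributes only a fragment, and S7 is discarded.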

SLIDE 17

Query Process

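  • 3. Decomposition and re-writing of the global request into sub-queries

[Figure: decomposition of the global request Q into one sub-query per source]

Q (e.g. Book name, Book Author, City) is decomposed into:

  • Q1 (S1) (e.g. Book name, Book Author, ISBN), sent to Source 1
  • Q5 (S5) …, sent to Source 5
  • Qn (Sn) (e.g. ISBN, City, Edition), sent to Source n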
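A decomposition step of this kind can be sketched as a rewrite from ontology properties to local column names. Everything below is illustrative: the table and column names are invented, WHERE conditions are ignored, and adding the join key to the projection of partial sources is an assumption about how the replies are later combined.

```python
# Hypothetical mapping: source -> (table name, {ontology property: local column}).
SOURCE_SCHEMAS = {
    "S1": ("livre", {"Book.Name": "titre", "Book.Author": "auteur", "Book.ISBN": "isbn"}),
    "S3": ("edition", {"Book.ISBN": "isbn", "Book.City": "ville", "Book.Edition": "num"}),
    "S5": ("book", {"Book.Name": "title", "Book.Author": "author", "Book.City": "city"}),
}


def decompose(requested: list[str], join_keys: list[str]) -> dict[str, str]:
    """Build one local SQL sub-query per source, projecting the requested
    properties it holds plus any join key needed for re-composition."""
    sub_queries = {}
    for source, (table, columns) in SOURCE_SCHEMAS.items():
        wanted = [p for p in requested if p in columns]
        if not wanted:
            continue  # this source contributes nothing
        if len(wanted) < len(requested):
            # Partial source: also fetch join keys so replies can be combined.
            wanted += [k for k in join_keys if k in columns and k not in wanted]
        select_list = ", ".join(f'{columns[p]} AS "{p}"' for p in wanted)
        sub_queries[source] = f"SELECT {select_list} FROM {table}"
    return sub_queries


sub_queries = decompose(["Book.Name", "Book.Author", "Book.City"], ["Book.ISBN"])
for source, sql in sub_queries.items():
    print(source, "->", sql)
```

Aliasing each local column back to its ontology property means every reply already uses the common vocabulary when it reaches the mediator.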

SLIDE 18

Query Process

  • 4. Execution of the sub-queries
      • Each sub-query is run by the local DBMS of its source.
      • The wrapper translates the replies generated by the DBMS into a common format for the mediator.

  • 5. Re-composition of the replies:
      • R1 (S1) (e.g. Book name, Book authors, ISBN)
      • R3 (S3) (e.g. ISBN, City, Edition)
      • R5 (S5) … (e.g. Book name, Book authors, City)

R: Recomposition = R5 (S5) ∪ (R1 (S1) ∩ R3 (S3)) (e.g. Book name, Book authors, City)
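The re-composition formula can be tried out on a few sample rows. Since R1 and R3 have different columns, the sketch reads the ∩ between them as a join on their shared ISBN followed by a projection; that reading, the helper names and the sample data are assumptions, not details given in the talk.

```python
# Replies already translated by the wrappers into a common format
# (hypothetical sample rows).
R1 = [{"name": "Databases", "authors": "A. Writer", "isbn": "111"}]
R3 = [{"isbn": "111", "city": "Algiers", "edition": 2}]
R5 = [{"name": "Ontologies", "authors": "B. Author", "city": "Bejaia"}]

OUTPUT = ("name", "authors", "city")


def join_on(left, right, key):
    """Inner join of two lists of rows on a shared key."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def project(rows, columns):
    """Keep only the requested columns of each row."""
    return [{c: row[c] for c in columns} for row in rows]


def union(*row_lists):
    """Set union: concatenate and drop duplicate rows, keeping first-seen order."""
    seen, result = set(), []
    for rows in row_lists:
        for row in rows:
            fingerprint = tuple(sorted(row.items()))
            if fingerprint not in seen:
                seen.add(fingerprint)
                result.append(row)
    return result


# R = R5 ∪ (R1 joined with R3 on ISBN), projected on the requested properties.
R = union(project(R5, OUTPUT), project(join_on(R1, R3, "isbn"), OUTPUT))
```

The complete source S5 contributes its rows directly, while S1 and S3 contribute a row only when both hold the same ISBN.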

SLIDE 19

Implementation

[Figure: implementation architecture. Components: Application Level, Wrapper (Web Service), Ontology (OWL), Query Processing Module, Schema Description Base, Databases (DB1, DB2, …, DBn). Technologies: Java, Tomcat application server, AXIS2, Jena API, PostgreSQL, MySQL.]

SLIDE 20

Conclusion & Perspectives

Our approach is based on:

  • An intermediary-based approach (Mediator-Wrapper).
  • A shared ontology that respects the autonomy of every relational source and resolves some semantic conflicts.
  • A newly defined concept, the “Schema Description Base”, used to find relevant sources.
  • A user query format based on ontology concepts and similar to SQL.
  • Specific algorithms for the Query Processing Module.

A prototype has been implemented.

SLIDE 21

Conclusion & Perspectives

In our solution, the mapping is done manually for every relational source.

Our future work is about:

  • Automating the management of mappings.
  • Defining other criteria for joining sources.
  • Optimizing the query process.

SLIDE 22

Thank You
