GRLibrelotto & JCRamalho & PRHenriques, CLEI’04, September 2004 1

Topic Maps Extraction in Oveia: Specification and Processing
(Extração de Topic Maps no Oveia: Especificação e Processamento)

Giovani R. Librelotto, José Carlos Ramalho, Pedro R. Henriques

Department of Informatics, University of Minho, Portugal


Motivation

  • Suppose you have an information system with several heterogeneous data resources:
    – Relational databases, XML documents, etc.
  • You want to achieve semantic interoperability between those data resources;
  • You want to do it fast.

Motivation

  • The use of ontologies is a good approach to overcome the problem of semantic heterogeneity;
  • This supports the usefulness of Topic Maps;
  • However, tools to build Topic Maps are crucial because creating a Topic Map is a hard task.


Index

  • Basic Concepts
  • Our approach
  • Inside Oveia
  • Case Study
  • Conclusion

Ontology

  • Metaphysical branch of study which is concerned with existence and the nature of being;


Ontology

  • An ontology is just a set of words and relationships that formally describes a universe of discourse or context.

[Figure: example ontology around Pelé, Football and Brasil. Surviving labels: the best of, game played, country, World Champion, plays, most popular sport, player, sport, five times.]


Ontology Specification

  • Specification standards:
    – RDF(S): Resource Description Framework (Schema)
    – DAML+OIL: DARPA Agent Markup Language + Ontology Inference Layer
    – OWL: Web Ontology Language
    – XTM: XML Topic Maps (our choice)


Topic Maps

“Topic maps are a new ISO standard for describing knowledge structures and associating them with information resources”
The TAO of Topic Maps, Steve Pepper, 05-2000

  • Topics
  • Associations
  • Occurrences
  • However, it takes too much work to create a real Topic Map.
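To make the three constructs concrete, here is a minimal Python sketch of a topic map held in memory. The class names, fields and the example URL are assumptions made for illustration only; they are not taken from Oveia or from the XTM specification. The sample topics reuse names from the case study later in the talk.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Occurrence:
    """A link from a topic to an information resource (URL or inline data)."""
    resource: str


@dataclass
class Topic:
    """A named subject, optionally typed by another topic (its topic type)."""
    id: str
    name: str
    instance_of: Optional[str] = None
    occurrences: List[Occurrence] = field(default_factory=list)


@dataclass
class Association:
    """A typed relationship; members maps a role name to a topic id."""
    type: str
    members: Dict[str, str]


@dataclass
class TopicMap:
    topics: Dict[str, Topic] = field(default_factory=dict)
    associations: List[Association] = field(default_factory=list)

    def add_topic(self, topic: Topic) -> None:
        self.topics[topic.id] = topic

    def instances_of(self, type_id: str) -> List[str]:
        """Return the names of all topics whose type is type_id."""
        return [t.name for t in self.topics.values() if t.instance_of == type_id]


# Hypothetical sample data (the URL is a placeholder).
tm = TopicMap()
tm.add_topic(Topic("city", "City"))
tm.add_topic(Topic("braga", "Braga", instance_of="city"))
tm.add_topic(Topic("university", "University"))
tm.add_topic(Topic("uminho", "UMinho", instance_of="university",
                   occurrences=[Occurrence("http://www.example.org/uminho")]))
tm.associations.append(
    Association("city-institution", {"is-placed": "uminho", "places": "braga"}))
```

Even for this tiny example every topic, type and association is written by hand, which is the effort the last bullet refers to.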


Ontology Support

  • 94 tools and similar environments to support the creation, use, and maintenance of ontologies
    – Ontology Tools Survey, Revisited, by Michael Denny, July 14, 2004, www.xml.com
  • However, none of them supports the automatic creation of Topic Maps.


Metamorphosis

[Architecture diagram. Components: Metadata Extractor and Ontology Builder, Ontology Validator, Ontology Navigator.]


Oveia

  • A Topic Maps extractor for heterogeneous information systems, composed of two engines:
    – Metadata Extractor: collects pieces of information and stores them in an intermediate representation;
    – Ontology Builder: uses a specification to transform the intermediate representation into an ontology according to the Topic Maps standard.


Oveia

[Architecture diagram: Oveia = Metadata Extractor + Ontology Builder]


Metadata Extractor

  • XSDS (XML Specification of Data Sources)
  • Supports different kinds of sources (relational databases, XML files, …)
  • Uses a driver for each data source
  • Creates an intermediary representation (called Dataset)


Extractor Specification

<resources>
  <datasources>
    <datasource extratorDriver="br.uneb.dcet.tmbuilder.drivers.DataBase" name="xata2004">
      <parameter name="connectionURL">jdbc:mysql://localhost/XATA2004</parameter>
      <parameter name="password"/>
      <parameter name="user">root</parameter>
      <parameter name="jdbcDriver">org.gjt.mm.mysql.Driver</parameter>
    </datasource>
  </datasources>
  <datasets>
    ...
  </datasets>
</resources>

<dataset name="Authors" database="xata2004">
  SELECT code, name, url FROM author-table
</dataset>
<dataset name="Papers" database="xata2004">
  SELECT code, title FROM paper-table
</dataset>
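The extraction step driven by such a specification can be sketched in Python as follows. This is a rough illustration under stated assumptions, not Oveia's implementation (which, judging by the driver class name, is written in Java): the function `extract_datasets`, the table name `author_table`, the sample row and the use of an in-memory SQLite database in place of the MySQL source are all hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Simplified XSDS-like spec: element and attribute names follow the slide,
# but the table name is an assumption chosen to be valid unquoted SQL.
SPEC = """
<resources>
  <datasets>
    <dataset name="Authors" database="xata2004">
      SELECT code, name, url FROM author_table
    </dataset>
  </datasets>
</resources>
"""


def extract_datasets(spec_xml, connections):
    """Run each <dataset> query and return {dataset name: list of row dicts}."""
    root = ET.fromstring(spec_xml)
    datasets = {}
    for ds in root.iter("dataset"):
        conn = connections[ds.get("database")]
        cursor = conn.execute(ds.text.strip())
        columns = [col[0] for col in cursor.description]
        datasets[ds.get("name")] = [dict(zip(columns, row)) for row in cursor]
    return datasets


# In-memory database standing in for the MySQL source on the slide.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE author_table (code TEXT, name TEXT, url TEXT)")
conn.execute(
    "INSERT INTO author_table VALUES ('a1', 'Ana', 'http://example.org/ana')")

datasets = extract_datasets(SPEC, {"xata2004": conn})
```

Each dataset ends up as a list of rows keyed by column name, which matches the table-shaped intermediate representation the talk calls a Dataset.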


Datasets

  • An intermediary representation;
  • Contains all data extracted from information resources;
  • Is the input to the XS4TM processor;
  • Data is stored in table format:
    – Line x column


Ontology Builder

  • XS4TM (XML Specification for Topic Maps)
    – Ontology extraction specification
  • XTM becomes a subset of XS4TM
  • XS4TM has 2 parts:
    – Abstract Structure
    – Instances (catalog)


OntoBuilder Specification

<instances>
  <topic dataset="Categorias">
    <instanceOf>
      <topicRef xlink:href="#Categorias"/>
    </instanceOf>
    <baseName>
      <baseNameString>@Categorias.Descricao</baseNameString>
    </baseName>
  </topic>
  ...
</instances>

Reference to the extracted dataset
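One plausible reading of the `@Categorias.Descricao` notation is that the template is instantiated once per row of the named dataset, with each `@dataset.column` reference replaced by that row's value. The Python sketch below illustrates this reading; the function `expand`, the regular expression and the sample rows are assumptions, and the slides do not describe how the XS4TM processor actually resolves references.

```python
import re

# Matches references such as @Categorias.Descricao (dataset name, column name).
REF = re.compile(r"@([\w-]+)\.(\w+)")


def expand(template, dataset_name, rows):
    """Return one expanded string per row, resolving @dataset.column references."""
    def resolve(match, row):
        if match.group(1) != dataset_name:
            return match.group(0)  # leave references to other datasets untouched
        return str(row[match.group(2)])

    return [REF.sub(lambda m, r=row: resolve(m, r), template) for row in rows]


# Hypothetical rows of the "Categorias" dataset.
rows = [{"Descricao": "Livros"}, {"Descricao": "Revistas"}]
names = expand("@Categorias.Descricao", "Categorias", rows)
```

Under this reading, a two-row dataset yields two topics, one per base name.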


XSDS x XS4TM


Generated topic map

  • After the XS4TM processing, Oveia generates a topic map stored in memory;
  • Oveia has two possible output formats:
    – XTM file: an XML document.
    – OntologyDB: a relational database designed according to the Topic Maps standard.
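The XTM output option can be illustrated with a short Python sketch that serialises in-memory topics using the element names seen in the specifications above. The function `to_xtm` and its input format are hypothetical, the namespace URIs are those of XTM 1.0 and XLink, and only topics with a type and a base name are covered; it is not Oveia's serialiser.

```python
import xml.etree.ElementTree as ET

XTM_NS = "http://www.topicmaps.org/xtm/1.0/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("", XTM_NS)
ET.register_namespace("xlink", XLINK_NS)


def to_xtm(topics):
    """Serialise (id, name, type id or None) triples as an XTM 1.0 string."""
    root = ET.Element(f"{{{XTM_NS}}}topicMap")
    for topic_id, name, type_id in topics:
        topic = ET.SubElement(root, f"{{{XTM_NS}}}topic", {"id": topic_id})
        if type_id is not None:
            inst = ET.SubElement(topic, f"{{{XTM_NS}}}instanceOf")
            ET.SubElement(inst, f"{{{XTM_NS}}}topicRef",
                          {f"{{{XLINK_NS}}}href": "#" + type_id})
        base = ET.SubElement(topic, f"{{{XTM_NS}}}baseName")
        ET.SubElement(base, f"{{{XTM_NS}}}baseNameString").text = name
    return ET.tostring(root, encoding="unicode")


# Hypothetical in-memory topics: a topic type and one instance of it.
xtm = to_xtm([("city", "City", None), ("braga", "Braga", "city")])
```

The OntologyDB option would store the same topics in relational tables instead; its schema is not shown in the slides, so it is left out here.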


Case Study: Conferences

[Figure: topic map drawn in three layers.
 Abstract concepts (topic types): City, Capital, Event, Summer School, Conference, Institution, School, University.
 Concrete concepts (topics): Braga, Lisboa, XATA, UMinho.
 Information resources (occurrences): http://www..., mailto:, SQL.
 Association labels: organizes, happens, is placed.]

<xstm>
  <ontologies>
    <topic id="city">
      <baseName>
        <baseNameString>City</baseNameString>
      </baseName>
    </topic>
    <topic id="capital">
      <instanceOf>
        <topicRef xlink:href="#city"/>
      </instanceOf>
      <baseName>
        <baseNameString>Capital</baseNameString>
      </baseName>
    </topic>
    …
  …
  <instances>
    <topic dataset="DS-City">
      <instanceOf>
        <topicRef xlink:href="#city"/>
      </instanceOf>
      <baseName>
        <baseNameString>@DS-City.name</baseNameString>
      </baseName>
      <occurrence>
        <scope>
          <topicRef xlink:href="#country"/>
        </scope>
        <resourceData>@DS-City.country</resourceData>
      </occurrence>
    </topic>
    …
    <association>
      <instanceOf>
        <topicRef xlink:href="#city-instituition"/>
      </instanceOf>
      <member>
        <roleSpec>
          <topicRef xlink:href="#is-placed"/>
        </roleSpec>
        <topicRef xlink:href="@DS-Institution.city"/>
      </member>
      <member>
        <roleSpec>
          <topicRef xlink:href="#places"/>
        </roleSpec>
        <topicRef xlink:href="@DS-City.id"/>
      </member>
    </association>


Conclusion

  • This presentation is set in the context of integrating heterogeneous information systems using the ontology paradigm, and suggests the use of Topic Maps to describe the ontologies.
  • Oveia is an architecture for the automatic construction of Topic Maps with data extracted from information systems.


Future Work

  • Front-end development:
    – XSDS: datasource spec.
    – XS4TM: ontology builder spec.
  • Part of this work is being integrated into a European Eureka project: IKF-P E!2235 “Information Knowledge Fusion”