Building multilingual domain WordNets in a Wiki Way

Building multilingual domain WordNets in a Wiki Way. Andrea Marchetti, Francesco Ronzano, Maurizio Tesconi, Salvatore Minutoli. Web Applications for the Future Internet Group, Institute of Informatics and Telematics, IIT-CNR, Pisa.


slide-1
SLIDE 1

building multilingual domain WordNets in a Wiki Way

Web Applications for the Future Internet Group Institute of Informatics and Telematics IIT-CNR, Pisa

Andrea Marchetti, Francesco Ronzano, Maurizio Tesconi, Salvatore Minutoli

slide-2
SLIDE 2
  • Multilingual Web
  • The Wiki paradigm: collaborative management of knowledge resources
  • Wikyoto Knowledge Editor
  • Wikyoto and the KYOTO System
  • Knowledge editing features of Wikyoto
  • External resources: references to model domain knowledge
  • Architectural overview
  • Wikyoto on-line
  • Ongoing work

Overview

slide-3
SLIDE 3

Source: Ethnologue

slide-4
SLIDE 4

Source: Netz-Tipp.de, 2002

slide-5
SLIDE 5

Languages used to access Google

Source: http://www.netz-tipp.de/languages.html

slide-6
SLIDE 6

Multilingual Web: some statistics

The languages spoken over the Web (June 2010)

Millions of native-speaker Internet users:

  • English: 537
  • Chinese: 445
  • Spanish: 153
  • Japanese: 99
  • Portuguese: 82
  • German: 75
  • Arabic: 65
  • French: 60
  • Russian: 59
  • Korean: 39
  • Rest: 350

June 30, 2010 - Source: Internet World Stats

slide-7
SLIDE 7

Multilingual Web: some statistics

The growth of language communities between 2000 and 2010

Percentage growth of each language's Internet user community from 2000 to 2010:

  • English: 281%
  • Chinese: 1277%
  • Spanish: 743%
  • Japanese: 110%
  • Portuguese: 989%
  • German: 173%
  • Arabic: 2501%
  • French: 398%
  • Russian: 1825%
  • Korean: 421%
  • Rest: 588%

June 30, 2010 - Source: Internet World Stats

Arabic, Russian, Portuguese and Spanish are among the fastest-growing Web languages, so access to Web content across different languages is becoming fundamental.

slide-8
SLIDE 8

KYOTO Overview

Pipeline: HTML, Syntactic & Semantic Annotation, Fact Extraction, Cross-lingual Semantic Search (supported by the Multilingual Knowledge Base)

Example queries:

  • ¿Cuál es el impacto del cambio climático sobre la biodiversidad?
  • Qual è l’impatto del cambiamento climatico sulla biodiversità?
  • What is the impact of climate change on biodiversity?

How it works:

  • Web documents in 7 languages are uploaded
  • They are annotated by a pipeline of linguistic tools
  • Language-independent facts are extracted
  • Users can perform queries in one of the 7 languages
  • The Multilingual Knowledge Base provides the background knowledge necessary for each step

slide-9
SLIDE 9

Multilingual Knowledge Base Architecture

Domain model: cat "is a" animal

Linguistic Information:

  • ENGLISH: [cat, true cat] NOUN Feline mammal usually having soft fur
  • ITALIAN: [gatto, micio] NOUN Mammifero carnivoro
  • SPANISH: [gata, gato] NOUN Mamifero felino que normalmente tiene...
  • .......

The Multilingual Knowledge Base consists of:

  • Domain Model: language-independent, describing a specific domain with a set of concepts and relations (i.e. an ontology). This is the KYOTO Central Ontology
  • Linguistic Information: specific to each considered language. These are the WordNets
  • Mapping of the Linguistic Information onto the Domain Model

Multilingual Knowledge Base
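The three parts of this architecture (domain model, per-language linguistic information, and the mapping between them) can be sketched as a small data structure. This is only an illustration: the class names `Synset` and `KnowledgeBase`, the field names, and the language codes are assumptions made for this example, not the KYOTO data model.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Synset:
    """Language-specific linguistic information (one WordNet entry)."""
    language: str
    lemmas: tuple
    pos: str
    gloss: str


@dataclass
class KnowledgeBase:
    """Hypothetical container: a domain model plus synsets mapped onto it."""
    is_a: dict = field(default_factory=dict)     # concept -> parent concept
    mapping: dict = field(default_factory=dict)  # concept -> list of Synset

    def add_synset(self, concept, synset):
        # Map a language-specific synset onto a language-independent concept.
        self.mapping.setdefault(concept, []).append(synset)

    def lemmas_for(self, concept, language):
        """All lemmas that express `concept` in `language`."""
        return [lemma
                for s in self.mapping.get(concept, [])
                if s.language == language
                for lemma in s.lemmas]


kb = KnowledgeBase(is_a={"cat": "animal"})
kb.add_synset("cat", Synset("en", ("cat", "true cat"), "NOUN",
                            "Feline mammal usually having soft fur"))
kb.add_synset("cat", Synset("it", ("gatto", "micio"), "NOUN",
                            "Mammifero carnivoro"))
kb.add_synset("cat", Synset("es", ("gata", "gato"), "NOUN",
                            "Mamifero felino que normalmente tiene..."))

print(kb.lemmas_for("cat", "it"))
```

Keeping the "is a" relation on concepts rather than on words is what lets a fact extracted from an Italian document be found by an English or Spanish query in this sketch.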

slide-10
SLIDE 10

Extend linguistic information

Multilingual Knowledge Base: frog, amphibian

Document excerpt: "...as we can notice from the figure. In the southern part of the island tree frogs and gopher frogs are widely diffused; when in 1994 the great fire destroyed the most of the wood that..."

slide-11
SLIDE 11

Extend linguistic information

Multilingual Knowledge Base: frog, amphibian

Document excerpt: "...as we can notice from the figure. In the southern part of the island tree frogs and gopher frogs are widely diffused; when in 1994 the great fire destroyed the most of the wood that..."

Domain terms: poison frog, tree frog, gopher frog

To improve KYOTO's performance, the Multilingual Knowledge Base has to be extended with linguistic information belonging to the environment domain.

slide-12
SLIDE 12

Generic & Domain WordNets

Generic WordNet:

  • frog, toad, toad frog, anuran: Any of various tailless stout-bodied amphibians
  • True frog, ranid: Insectivorous usually semiaquatic web-footed

Domain WordNet:

  • Gopher frog: The Gopher Frog (Rana capito) is a species of frog in the...

KYOTO Central Ontology

Relations shown in the figure: hyperonym, equivalence

slide-13
SLIDE 13

Building the Domain WordNet

Multilingual Knowledge Base: frog, amphibian, poison frog, tree frog, robber frog

  • IT'S IMPORTANT: a richer Knowledge Base improves the semantic analysis
  • IT'S ONEROUS
  • EXPERIENCE OF THE SOCIAL WEB, THE WIKI PARADIGM: involving domain experts to extend, customize and maintain the Multilingual Knowledge Base

slide-14
SLIDE 14

The Wiki paradigm in KYOTO

Knowledge Resources Editing Environments Survey

  • Wikipedia-like applications: difficult editing of complex knowledge structures
  • Rich Web applications: limited editing possibilities, mainly editors of taxonomies of concepts
  • Desktop applications: full editing features, used for complex resources, only for knowledge engineers

slide-15
SLIDE 15

The Wiki paradigm in KYOTO

[Figure: Desktop applications, Wikipedia-like applications and Rich Web applications placed by knowledge structuring and complexity of use]

Wikyoto is a balance between complexity of use and formalization of the edited knowledge

slide-16
SLIDE 16

Wikyoto: The Knowledge Editing Flow

  • Steps: Create, Edit, Link
  • External resources: KYOTO Terminology, SKOS Thesauri
  • KYOTO Central Ontology
  • Example: Gopher Frog

slide-17
SLIDE 17

The Wikyoto Knowledge Editor

Global architecture

  • Wikyoto User
  • External resources: KYOTO Terminology, SKOS Thesauri
  • KYOTO Central Ontology
  • Example terms: frog, poison frog, tree frog, gopher frog
  • Example terms: pollution, air pollution, water pollution, nutrient pollution

slide-18
SLIDE 18

The Wikyoto Knowledge Editor

System architecture

[Figure labels: User; Concept; INTERNET; KYOTO Web API (KYOTO Terminology, Generic & Domain WordNets); Web SPARQL Queries (SKOS Thesauri, KYOTO Ontology, DBpedia)]

slide-19
SLIDE 19

The Wikyoto Knowledge Editor: Main Features

  • Versioning (like MediaWiki)
  • Concurrency Management (synset lock)
  • Statistical Data
  • Exploiting External Resources
  • Simplify linking to the Ontology

– TMEKO Procedure
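The "synset lock" item above names a common way to handle concurrent edits: only one user at a time may edit a given synset. The slides do not say how Wikyoto implements it, so the following is a minimal sketch under stated assumptions: the class `SynsetLocks`, the timeout that frees abandoned locks, and the in-memory dictionary are all hypothetical. A real multi-user server would also need the check-and-set to be atomic in its storage layer.

```python
import time


class SynsetLocks:
    """In-memory table of per-synset edit locks with a timeout (hypothetical)."""

    def __init__(self, timeout_seconds=300.0, clock=time.monotonic):
        self._timeout = timeout_seconds
        self._clock = clock
        self._locks = {}  # synset_id -> (user, time the lock was acquired)

    def acquire(self, synset_id, user):
        """Return True if `user` now holds the lock on `synset_id`."""
        now = self._clock()
        holder = self._locks.get(synset_id)
        if holder is not None:
            owner, acquired_at = holder
            still_valid = now - acquired_at < self._timeout
            if still_valid and owner != user:
                return False  # someone else is editing this synset
        # Free, expired, or already ours: (re)take the lock.
        self._locks[synset_id] = (user, now)
        return True

    def release(self, synset_id, user):
        """Drop the lock, but only if `user` is the one holding it."""
        holder = self._locks.get(synset_id)
        if holder is not None and holder[0] == user:
            del self._locks[synset_id]
            return True
        return False
```

Locking at the level of a single synset, rather than the whole WordNet, lets several domain experts work in parallel as long as they edit different concepts.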

slide-20
SLIDE 20

More information at: http://www.kyoto-project.eu/

Section: System Architecture and Demo

http://www.wikyoto.net/

DEMO

slide-21
SLIDE 21

External Resources – KYOTO Terminology

The KYOTO Terminology is:

  • automatically extracted by mining KYOTO parsed documents
  • terms are organized in taxonomies
  • each term has one or more document occurrences

Example terms: frog, endemic frog, poison frog, gopher frog, golden poison frog

Example occurrences:

  • ...frogs represent the most diffused...
  • ...habitat of many frog species...
  • ... with endemic frogs that are...
  • The golden poison frog typically...
  • ...with gopher frogs represent...
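A term taxonomy with document occurrences can be illustrated with a toy example. The slides do not describe the extraction method KYOTO uses, so the head-word heuristic below (a term's parent is the longest known term obtained by dropping its leading modifiers) and the plain substring matching are assumptions made only for illustration; the function names are hypothetical.

```python
def build_taxonomy(terms):
    """Map each term to its parent term, or None for a root.

    Assumed heuristic: the parent is the longest known term that remains
    after dropping one or more leading words of the term.
    """
    known = set(terms)
    parents = {}
    for term in terms:
        words = term.split()
        parent = None
        # Drop leading modifiers one at a time, longest candidate first.
        for start in range(1, len(words)):
            candidate = " ".join(words[start:])
            if candidate in known:
                parent = candidate
                break
        parents[term] = parent
    return parents


def find_occurrences(terms, sentences):
    """Attach to each term the sentences that mention it (naive substring match)."""
    return {t: [s for s in sentences if t in s.lower()] for t in terms}


terms = ["frog", "endemic frog", "poison frog", "gopher frog", "golden poison frog"]
sentences = ["...with endemic frogs that are...",
             "The golden poison frog typically..."]

taxonomy = build_taxonomy(terms)
occurrences = find_occurrences(terms, sentences)
print(taxonomy["golden poison frog"])
print(occurrences["endemic frog"])
```

Substring matching conveniently catches plurals such as "frogs", but it would also match inside unrelated words, so a real system would work on parsed and lemmatized text instead.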

slide-22
SLIDE 22

External Resources – SKOS Thesauri

Simple Knowledge Organization System (SKOS):

  • data model for thesauri, taxonomies, classification schemes
  • W3C standard based on RDF
  • widely exploited by the Semantic Web community
  • organized on the basis of:

– concepts (with labels, descriptions)

– relations: broader / narrower / related

Example concepts (each with a skos:definition):

  • amphibian: A class of vertebrate animals characterized by a moist, glandular skin, gills at some stage of development...
  • frog: Any insectivorous anuran amphibian of the family Ranidae, such as Rana temporaria of Europe, having...
  • poison dart frog: Poison dart frog is the common name of a group of frogs in the family Dendrobatidae which are native to Central and South America.
  • dew pond: A Dew pond is an artificial pond usually sited on the top of a hill

Relations shown in the figure: skos:broader, skos:narrower, skos:related
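The broader / narrower relations can be navigated with very little code once the thesaurus is held as RDF-style triples. The sketch below uses plain Python tuples rather than an RDF library; the `ex:` identifiers and the exact arrangement of the three concepts are assumptions for illustration, and the code assumes each concept has at most one broader concept, which SKOS itself does not require.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"

# (subject, predicate, object) triples; the `ex:` identifiers are hypothetical.
triples = [
    ("ex:frog", SKOS + "prefLabel", "frog"),
    ("ex:amphibian", SKOS + "prefLabel", "amphibian"),
    ("ex:poison_dart_frog", SKOS + "prefLabel", "poison dart frog"),
    ("ex:frog", SKOS + "broader", "ex:amphibian"),
    ("ex:poison_dart_frog", SKOS + "broader", "ex:frog"),
]


def broader_chain(concept, triples):
    """Follow skos:broader links upward from `concept` until none is left."""
    broader = {s: o for s, p, o in triples if p == SKOS + "broader"}
    chain = []
    seen = {concept}
    while concept in broader:
        concept = broader[concept]
        if concept in seen:  # guard against cycles in malformed data
            break
        seen.add(concept)
        chain.append(concept)
    return chain


def narrower(concept, triples):
    """skos:narrower is the inverse of skos:broader, so it can be derived."""
    return sorted(s for s, p, o in triples
                  if p == SKOS + "broader" and o == concept)


print(broader_chain("ex:poison_dart_frog", triples))
print(narrower("ex:amphibian", triples))
```

Storing only skos:broader and deriving skos:narrower on demand avoids keeping the two directions in sync by hand.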

slide-23
SLIDE 23

SKOS Thesauri: Simple Knowledge Organization System (SKOS)

Thesauri converted to SKOS format:

  • General Multilingual Environmental Thesaurus (GEMET): 2K concepts
  • Species 2000: 2M concepts
  • Habitat types from EUNIS Biodiversity Database: 1K concepts
  • WWF Ecoregions Database: 1K concepts


External Resources – Skos Thesauri

slide-24
SLIDE 24

DBpedia – Wikipedia for the Semantic Web

A community effort to extract structured semantic information from Wikipedia and to make this information available on the Web.

You can access and query:

  • 2.6 million things (213K persons, 328K places, …) in 30 different languages
  • 609,000 links to images
  • 3,150,000 links to external Web pages
  • 415,000 Wikipedia categories
  • ...

External Resources – DBpedia
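DBpedia is queried with SPARQL over HTTP. The sketch below only builds a query and the corresponding request URL; it does not reproduce the queries Wikyoto actually sends, which the slides do not show. The helper names are hypothetical, the endpoint address is the public DBpedia one (its availability is not guaranteed), and the `format` parameter is an assumption about what that endpoint accepts.

```python
from urllib.parse import urlencode

# Public DBpedia SPARQL endpoint; treated as an assumption in this sketch.
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"


def abstract_query(label, lang="en"):
    """Build a SPARQL query for the abstract of the resource labelled `label`."""
    # Escape backslashes and double quotes so the label is a valid literal.
    safe = label.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
        "SELECT ?resource ?abstract WHERE {\n"
        f'  ?resource rdfs:label "{safe}"@{lang} ;\n'
        "            dbo:abstract ?abstract .\n"
        f'  FILTER (lang(?abstract) = "{lang}")\n'
        "} LIMIT 1"
    )


def query_url(query):
    """Build a GET URL asking the endpoint for SPARQL JSON results."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return DBPEDIA_ENDPOINT + "?" + urlencode(params)


query = abstract_query("Gopher frog")
print(query)
print(query_url(query))
```

The resulting URL can be fetched with any HTTP client, for example `urllib.request.urlopen`, and the JSON response parsed with the standard `json` module.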

slide-25
SLIDE 25

building multilingual domain WordNets in a Wiki Way

Web Applications for the Future Internet Group Institute of Informatics and Telematics IIT-CNR, Pisa

Thank You!