SLIDE 1

LAB OF SOFTWARE ARCHITECTURES AND INFORMATION SYSTEMS FACULTY OF INFORMATICS MASARYK UNIVERSITY

DIALOGUE-BASED INFORMATION RETRIEVAL FROM IMAGES

  • P. Hamřík, I. Kopeček, R. Ošlejšek, J. Plhák
SLIDE 2
  • R. Ošlejšek, ICCHP'14, Paris

Motivation – Communicative Images

  • Communicative image

An image enabling users to explore its content by means of dialogues.

Window to the depicted world fully accessible through natural language.

SLIDE 3

Key Principles – Annotated Pictures

  • Semantics: a system of OWL/RDF ontologies for picture annotation and shared multilingual knowledge. It also defines the grammar of the dialogue system.

  • Graphic format: SVG as a flexible XML wrapper that embeds the original raster image together with structured semantics.
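
The wrapper idea can be illustrated with a minimal sketch that uses only Python's standard `xml.etree.ElementTree`. The file name, the `ex:` namespace, and the `depicts` property are hypothetical placeholders, not the ontologies used by the authors. The raster image is referenced through `href` here; a data URI could be used instead to embed it inline.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
XLINK_NS = "http://www.w3.org/1999/xlink"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX_NS = "http://example.org/picture#"  # hypothetical annotation vocabulary

ET.register_namespace("", SVG_NS)
ET.register_namespace("xlink", XLINK_NS)
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("ex", EX_NS)


def wrap_image(href, width, height, depicted):
    """Return SVG text that wraps a raster image plus RDF annotations."""
    size = {"width": str(width), "height": str(height)}
    svg = ET.Element(f"{{{SVG_NS}}}svg", size)
    # Semantics go into <metadata>, which ordinary SVG viewers do not render.
    metadata = ET.SubElement(svg, f"{{{SVG_NS}}}metadata")
    rdf = ET.SubElement(metadata, f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(
        rdf, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": href}
    )
    for name in depicted:
        ET.SubElement(desc, f"{{{EX_NS}}}depicts").text = name
    # The original raster image is referenced unchanged.
    ET.SubElement(
        svg, f"{{{SVG_NS}}}image", {f"{{{XLINK_NS}}}href": href, **size}
    )
    return ET.tostring(svg, encoding="unicode")


svg_text = wrap_image("last_supper.jpg", 800, 450, ["Jesus", "Judas", "Peter"])
```

Keeping the annotations inside `<metadata>` means the file still displays as a normal picture, while a dialogue system can read the semantics from the same document.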

SLIDE 4

Key Principles – Dialogue Subsystem

  • Restricted grammars (only a small fragment of natural language):

– Generic grammar: “Describe picture.”, “What is in the picture?”, etc.

– What-Where Language: “Where is object?”, “What is in the upper-left corner?”

– Experimental domain-specific grammars: fine-tuned for a concrete picture.

  • Dialogue frames: templates for questions with slots that can be filled by specific entries from ontologies.

– “How far is it from SLOT1 to SLOT2?”
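
A dialogue frame of this kind can be sketched as a template compiled into a regular expression whose slots accept only known entries. This is an illustrative assumption, not the authors' implementation; the `PLACES` list and both function names are hypothetical, and in the real system the slot entries would come from the ontologies.

```python
import re

# Hypothetical slot vocabulary standing in for ontology entries.
PLACES = ["Jesus", "Judas", "Peter"]


def compile_frame(template, slot_values):
    """Turn a frame such as 'How far is it from SLOT1 to SLOT2?' into a regex."""
    alternatives = "|".join(re.escape(v) for v in slot_values)
    pattern = re.escape(template)
    # Each SLOTn placeholder becomes a named group limited to known entries.
    pattern = re.sub(
        r"SLOT(\d+)",
        lambda m: f"(?P<slot{m.group(1)}>{alternatives})",
        pattern,
    )
    return re.compile(pattern, re.IGNORECASE)


def match_frame(frame, question):
    """Return the slot fillers as a dict, or None if the question does not fit."""
    m = frame.fullmatch(question.strip())
    return None if m is None else m.groupdict()


frame = compile_frame("How far is it from SLOT1 to SLOT2?", PLACES)
fillers = match_frame(frame, "How far is it from Judas to Peter?")
```

Restricting each slot to known entries is what keeps the grammar small: a question that mentions an unknown object simply does not match the frame.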

SLIDE 5

Workflow

  • Client (e.g. a plug-in to a web browser) sends an image to our server

– The image can be communicative or not

– JavaEE server providing REST services

  • Server embeds the image in SVG and does additional preprocessing

– Auto-detection and image-recognition techniques would help to gather initial semantics [in development]

  • Client sends questions (sentences) to the server; the dialogue module parses the questions, inspects the ontology and composes an answer

– No intelligence on the client side
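
The last step (parse the question, inspect the annotations, compose the answer) can be sketched for a single What-Where question. This is a simplified Python illustration, not the JavaEE/REST service described above; the bounding boxes, the picture size, and the 3x3 grid rule are all hypothetical assumptions.

```python
import re

# Hypothetical annotations: object name -> bounding box (x, y, width, height)
# inside a picture of PICTURE_W x PICTURE_H pixels.
PICTURE_W, PICTURE_H = 900, 450
ANNOTATIONS = {
    "Judas": (330, 180, 80, 150),
    "Bartholomew": (20, 150, 70, 200),
}

COLUMNS = ["left", "middle", "right"]
ROWS = ["upper", "middle", "lower"]


def region_of(box):
    """Name the cell of a 3x3 grid that contains the centre of the box."""
    x, y, w, h = box
    col = COLUMNS[min(2, int(3 * (x + w / 2) / PICTURE_W))]
    row = ROWS[min(2, int(3 * (y + h / 2) / PICTURE_H))]
    if row == col == "middle":
        return "in the middle"
    if row == "middle":
        return f"on the {col}"
    if col == "middle":
        return f"in the {row} part"
    return f"in the {row}-{col} corner"


def answer(question):
    """Parse the question, look up the annotation, and compose a reply."""
    m = re.fullmatch(r"where is (.+?)\?", question.strip(), re.IGNORECASE)
    if m is None:
        return "I do not understand the question."
    name = m.group(1)
    for known, box in ANNOTATIONS.items():
        if known.lower() == name.lower():
            return f"{known} is {region_of(box)} of the picture."
    return f"I do not know {name}."
```

Because all three steps run in `answer`, a client only needs to send the sentence and display the reply, which matches the "no intelligence on the client side" design.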

SLIDE 6

Demo

  • Exact transcription of our experimental system
  • Adjusted for screen readers:

– http://lsd.fi.muni.cz/~xplhak/gate/

  • With styles:

– http://lsd.fi.muni.cz/~xplhak/virtualgate/

SLIDE 7

Goal 1: Identify (hidden) image

– What is in the middle?
  There are Philip, Jesus, Table, John, Peter, Thomas, Judas and James Major in the central part.
– What is Jesus doing?
  In this picture, Jesus is Sitting.
– Who is author?
  This painting was created by Leonardo da Vinci.
– When did Leonardo create the painting?
  Painting has been finished in 1498.

SLIDE 8

Goal 2: Find Judas

– Where is Judas?
  Judas is in the middle of the picture.
– How Judas looks?
  In this picture, Judas is Withdrawn.
– Is Judas next to Jesus?
  From your point of view, Judas is not next to Jesus. Judas is on the left of Jesus.

Judas?

SLIDE 9

Goal 2: Find Judas (cont.)

– What is Peter doing?
  In this picture, Peter is Talking and Standing.
– Is Judas in front of Peter?
  From your point of view, Judas is in front of Peter.

Judas and Peter

SLIDE 10

User evaluation

  • 4 blind users and 4 sighted users
  • Testing scenarios

– Start the interaction with the picture in any way you like, and end it at any point you like.

– If the user hasn’t done so in the previous scenario, then:

  • Obtain general information about the picture.
  • Learn who painted the painting in the picture.
  • List all people in the picture.
  • ...

  • Evaluation: quantitative and qualitative questionnaire

SLIDE 11

Current Limits and Future Goals

  • Manual annotation

– Tedious and exhausting, and prone to errors even with supporting tools like Protégé.

  • Auto-learning dialogue strategy

– The user question “What is the castle behind Jane?” indicates that the picture contains some castle and some object called Jane.

– The communicative picture takes over the initiative to learn more about these two things, asking the user “Who or what is Jane?” and then extending the ontology with these new facts.
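
The auto-learning idea can be sketched as follows: objects mentioned in a question but missing from the ontology are recorded and trigger a follow-up question. This is only a toy illustration of a future goal, not an existing component; the dictionary-based `ontology`, the function name, and the single recognised question shape are hypothetical.

```python
import re


def follow_up_questions(question, ontology):
    """Return clarifying questions for objects the ontology lacks."""
    # Only this one hypothetical question shape is recognised here.
    m = re.fullmatch(r"What is (the \w+) behind (\w+)\?", question.strip())
    if m is None:
        return []
    questions = []
    for name in m.groups():
        if name not in ontology:
            # Record that the object exists; nothing else is known yet.
            ontology[name] = None
            questions.append(f"Who or what is {name}?")
    return questions


ontology = {"the castle": "Building"}  # hypothetical prior knowledge
pending = follow_up_questions("What is the castle behind Jane?", ontology)
```

Once the user answers the follow-up, the placeholder entry would be replaced with the new fact, so the same question is not asked twice.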

SLIDE 12

Current Limits and Future Goals (cont.)

  • Manually configured dialogues

– Carefully prepared and fine-tuned grammars and dialogue frames for a concrete domain (picture content).

  • Dialogues generated from ontologies

– Frames driven by ontology structure

– Object and data properties = frames (utterances).

– Classes and datatypes involved in properties = slots.

– Individuals = slot values.
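
The mapping from ontology structure to frames can be made concrete with a small sketch. The toy ontology below is plain Python data with hypothetical names; a real system would read classes, properties and individuals from the OWL/RDF ontologies instead.

```python
# Hypothetical toy ontology expressed as plain Python data.
CLASSES = {"Person": ["Jesus", "Judas", "Peter"]}
OBJECT_PROPERTIES = {
    # property label: (domain class, range class)
    "next to": ("Person", "Person"),
    "in front of": ("Person", "Person"),
}


def generate_frames(properties):
    """One question frame per property; its classes become the slots."""
    return {
        label: (f"Is SLOT1 {label} SLOT2?", [domain, range_])
        for label, (domain, range_) in properties.items()
    }


def slot_values(frame, classes):
    """Individuals of each slot's class are the admissible slot values."""
    _template, slot_classes = frame
    return [classes[c] for c in slot_classes]


frames = generate_frames(OBJECT_PROPERTIES)
values = slot_values(frames["next to"], CLASSES)
```

With this mapping, adding a property or an individual to the ontology extends the set of questions the picture can answer without any hand-written grammar changes.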

SLIDE 13

Thank you for your attention

Questions?