
SLIDE 1

Coordination via Dialogue Interaction

Raquel Fernández Institute for Logic, Language & Computation University of Amsterdam

SLIDE 2

Dialogue Modelling

My area of research falls under the heading of Dialogue Modelling:

a fairly new field at the interface of (computational) linguistics, artificial intelligence, psychology, cognitive science, ...

concerned with language as it is used in conversation.

In particular, my interests focus on the semantic, pragmatic, and coordination-related aspects of dialogue. Methodologically:

interest in empirical evidence (from corpora or experiments);
interest in computational methods of enquiry and evaluation.

Research area connected to both the Logic & Language and Language & Computation groups at the ILLC.

Raquel Fernández LoLaCo 2012 2 / 31

SLIDE 3

Outline

Two examples of research projects connected to dialogue interaction and coordination:

Colour terms in collaborative reference tasks
Adaptation in child-adult dialogues


SLIDE 4

Interpretation is Flexible

Speakers do not always share identical semantic representations or identical lexicons, yet they are able to communicate successfully most of the time.


SLIDE 5

Sometimes interlocutors negotiate expressions explicitly

A: A docksider.
B: A what?
A: Um.
B: Is that a kind of dog?
A: No, it’s a kind of um leather shoe, kinda pennyloafer.
B: Okay, okay, got it.

⇒ Thereafter “the pennyloafer”

Susan Brennan & Herbert Clark (1996). Conceptual Pacts and Lexical Choice. Journal of Experimental Psychology, 22(6):1482–1493.

Herbert Clark & Donna Wilkes-Gibbs (1986). Referring as a collaborative process. Cognition, 22:1–39.


SLIDE 6

Sometimes they implicitly guess their partners’ intentions

They relax the interpretation of their partner’s utterances and look for the referent that best matches this looser interpretation.

A: a diamond
B: ok

[A must mean the tilted square]

A: the salmon shoes
B: ok

[A must mean those pink shoes]


SLIDE 7

Can we implement an artificial dialogue agent that is capable of implicit coordination?

Bert Baumgaertner, Raquel Fernández, and Matthew Stone (2012). Towards a Flexible Semantics: Colour Terms in Collaborative Reference Tasks. In Proceedings of the First Joint Conference on Lexical and Computational Semantics (*SEM), Montreal, Canada.


SLIDE 8

Aims

We are interested in modelling implicit adaptation computationally

to get a better understanding of this process
to contribute to the development of dialogue systems that are able to coordinate better with their human partners.

Our focus is on collaborative referential tasks, taking colour terms as a case study. Our aim is to develop dialogue agents that employ flexible semantic representations.


SLIDE 9

Intuitions

Our view of how colour terms are used in referential tasks follows basic pragmatic principles: speakers and addressees tend to maximise the success of their joint task while minimising costs.

Gricean maxims of conversation: say enough but not more than is required (quantity).

Clark & colleagues’ principle of least collaborative effort: minimise the joint effort of the interlocutors.


SLIDE 10

In the domain of colours we take this to mean:

Addressees

are able to relax the interpretation of the speaker’s terms and look for the referent that best matches this looser interpretation.

Speakers

tend to use a basic colour term whenever this is enough,
but resort to alternative terms (e.g., ‘bordeaux’ or ‘navy blue’) in contexts where the basic term is deemed insufficient because there are “competitors”.


SLIDE 11

Our Agent’s Lexicon

Data: publicly available database of RGB codes and colour terms created by Randall Munroe (author of the webcomic xkcd.com)

colour naming survey taken by around two hundred thousand participants

954 colour terms (the most frequently used by the participants), each paired with a unique RGB code (the location in the RGB colour space most frequently named with the colour term in question).
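Before the survey data can serve as a lexicon, each hex code has to become a point in RGB space. The sketch below assumes (the slides do not say) a plain-text source with one `term<TAB>#rrggbb` entry per line; `load_lexicon` and `hex_to_rgb` are hypothetical helper names, not part of any published code.

```python
def hex_to_rgb(code):
    """Convert a hex colour code such as '#653700' to an (R, G, B) tuple."""
    code = code.strip().lstrip("#")
    if len(code) != 6:
        raise ValueError(f"expected 6 hex digits, got {code!r}")
    return tuple(int(code[i:i + 2], 16) for i in (0, 2, 4))


def load_lexicon(lines):
    """Map each colour term to its RGB prototype.

    Assumed (hypothetical) input format: 'term<TAB>#rrggbb' on each line;
    blank lines and lines starting with '#' are skipped.
    """
    lexicon = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        term, code = line.split("\t")[:2]
        lexicon[term.strip()] = hex_to_rgb(code)
    return lexicon
```

Each entry in the resulting dictionary is one prototype colour, i.e. one point in the conceptual space described on the next slides.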


SLIDE 12

http://blog.xkcd.com/2010/05/03/color-survey-results/


SLIDE 13

Colour Model and Algorithms

We treat colours as points in a conceptual space:

RGB dimensions (each ranging from 0 to 255)
each RGB code in the lexicon is considered a prototype colour
amongst the 954 colour terms in the lexicon, we select 10 which we consider basic colours
we measure colour proximity in terms of Euclidean distances between RGB values.

Gärdenfors (2000). Conceptual Spaces. MIT Press, Cambridge.
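As a minimal illustration of this conceptual-space view, assuming plain Euclidean distance over raw RGB triples, a colour can be categorised by its nearest prototype. The values in `BASIC` are placeholders rather than entries from the xkcd lexicon, and the slides do not list which 10 terms were treated as basic; `categorise` is a hypothetical name.

```python
import math


def rgb_distance(c1, c2):
    """Euclidean distance between two RGB points (each axis 0-255).

    The largest possible value, black to white, is 255 * sqrt(3), about 441.7.
    """
    return math.dist(c1, c2)


def categorise(rgb, prototypes):
    """Return the term whose prototype lies closest to `rgb`.

    Nearest-prototype assignment partitions the space into Voronoi cells,
    one convex region per term.
    """
    return min(prototypes, key=lambda term: rgb_distance(rgb, prototypes[term]))


# Placeholder prototypes for three basic colours (illustrative values only).
BASIC = {"brown": (101, 55, 0), "pink": (255, 129, 192), "blue": (3, 67, 223)}
```

Nearest-prototype assignment is one common reading of prototypes in conceptual spaces; the slides do not state that the agent categorises colours in exactly this way.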

Our algorithms make use of three thresholds:

min: minimum distance required for two colours to be considered different
max: maximum range within which to search for alternative colours
compdist: distance range within which a colour is considered a competitor
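How the three thresholds could interact is sketched below. This is a hypothetical reconstruction, not the published algorithms: the threshold values are invented, a competitor is read as a distractor at least `min` away from the target but within `compdist`, relaxed resolution picks the nearest scene colour within `max` of a term's prototype, and generation switches from the basic to the full lexicon only when competitors are present.

```python
import math

# Hypothetical settings: the slides name these thresholds but not their values.
MIN_DIST = 10.0    # below this distance two colours count as the same
MAX_DIST = 150.0   # how far from a term's prototype resolution may search
COMP_DIST = 60.0   # distractors within this distance compete with the target


def competitors(target, scene, min_dist=MIN_DIST, comp_dist=COMP_DIST):
    """Scene colours that differ from the target but lie within comp_dist of it."""
    return [c for c in scene if min_dist <= math.dist(c, target) <= comp_dist]


def resolve(prototype, scene, max_dist=MAX_DIST):
    """Relaxed resolution: the scene colour nearest the term's prototype,
    provided it lies within max_dist; None if nothing is close enough."""
    in_range = [c for c in scene if math.dist(c, prototype) <= max_dist]
    return min(in_range, key=lambda c: math.dist(c, prototype), default=None)


def generate(target, scene, basic, extended):
    """Use the nearest basic term unless the target has competitors, in which
    case choose the nearest term from the extended lexicon."""
    lexicon = extended if competitors(target, scene) else basic
    return min(lexicon, key=lambda term: math.dist(target, lexicon[term]))
```

With competitors present, `generate` may still return the basic term if it happens to be the nearest entry in the full lexicon; a fuller model would also check that the chosen term actually separates the target from its competitors.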


SLIDE 14

What do people actually do?

We conducted two small experiments to collect data about how speakers and addressees use colour terms in referential tasks. Both experiments were run online with 36 native English speakers: 19 in ExpA and 17 in ExpB.

Generation (ExpA):

∗ participants were shown a series of scenes, each with a target
∗ they were asked to refer to the target with a colour term that would allow a potential addressee to identify it in the current context

Resolution (ExpB):

∗ participants were shown a series of scenes, each with a colour term
∗ they were asked to pick out the intended referent
∗ the colour terms used were selected from those produced in ExpA

Scenes were generated according to two parameters:

∗ basic vs. non-basic target colour (brown or magenta vs. rose or blue)
∗ with or without competitors (colours within a distance threshold of the target)
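The slides specify only the two scene parameters. One hypothetical way to build a four-object scene for each competitor condition, reusing a `compdist`-style radius, is sketched below; four objects matches the four potential targets assumed by the random baseline later in the talk, while `make_scene` and its arguments are illustrative assumptions.

```python
import math
import random


def make_scene(target, pool, with_competitor, comp_dist, n_objects=4, seed=None):
    """Return a shuffled scene of n_objects colours containing the target.

    With a competitor: exactly one distractor lies within comp_dist of the
    target and the rest lie beyond it. Without: every distractor lies beyond it.
    """
    rng = random.Random(seed)
    near = [c for c in pool if c != target and math.dist(c, target) <= comp_dist]
    far = [c for c in pool if math.dist(c, target) > comp_dist]
    n_far = n_objects - 1 - (1 if with_competitor else 0)
    if len(far) < n_far or (with_competitor and not near):
        raise ValueError("pool cannot satisfy the requested condition")
    scene = rng.sample(far, n_far)
    if with_competitor:
        scene.append(rng.choice(near))
    scene.append(target)
    rng.shuffle(scene)
    return scene
```

The basic vs. non-basic parameter would simply be the choice of `target`.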


SLIDE 15

[Figure: relative frequency of the colour terms produced in ExpA, one panel per condition]
basic colour w/o competitors: brown, chocolate brown, dark brown, earthy brown, poop brown, same as mud
basic colour with competitors: blueberry, brown, chocolate brown, colour of mud, dark brown
non-basic colour w/o competitors: dark pink, dusty rose, magenta, mauve, pink, red, rose, rose pink, salmon, salmon pink
non-basic colour with competitors: bright pink, dull light fuchsia, dull salmon pink, dusty rose, light mauve, light pink, light red, light salmon, lightish pink, magenta, mauve, medium pink, orangish pink, pastel pink, pink, red, rose, rose pink, salmon, salmon pink, terra cotta

SLIDE 16

Main Experimental Results

ExpA showed that speakers attempt to adapt their colour descriptions to the context and that there is high variability in the terms they choose to do this. ExpB showed that reference resolution is almost always successful despite the high variation of terms observed in ExpA.

Basic colours:

∗ without competitors: participants successfully identified the targets in all cases (100% success rate)
∗ with competitors: 98% success rate
∗ the same results for terms produced with proportionally high and low frequency

Non-basic colours:

∗ without competitors: 100% success in all cases (low and high frequency)
∗ with competitors: differences as an effect of frequency
◮ terms produced with high frequency: no resolution errors
◮ low-frequency terms: resolution success rate dropped to 78%

SLIDE 17

Comparing Our Model to the Human Data

The experimental data allows us to make informative comparisons between humans and our model.

The data is not sufficient for a proper evaluation,
but the comparison illuminates how the model can be refined and what setup a proper evaluation would require.


SLIDE 18

Comparing resolution: success rate

Resolution success rate (%); nc = no competitors, c = competitors

                        Basic colours                  Non-basic colours
                        high freq.     low freq.       high freq.     low freq.
                        nc     c       nc     c        nc     c       nc     c
Humans (ExpB)           100    98      100    98       100    100     100    78
Resolution algorithm    100    71      100    71       50     100     75     71

An agent that rigidly associates colours and terms would have successfully resolved only 4 of the 29 cases, 3 of which were basic colours with no distractors – a 7.25% success rate.

A random algorithm would have an average success rate of 25% (one of four potential targets).

Our algorithm comes closer to human performance than either baseline.
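The gap to human performance can be summarised with the numbers in the success-rate table on this slide. The sketch computes the mean absolute difference, in percentage points, across the eight conditions for the resolution algorithm and for a 25% chance baseline. This summary statistic is an illustration, not one reported in the study, and the rigid agent is left out because its per-condition rates are not given.

```python
# Success rates (%) from the slide, in the order
# basic/high freq. (nc, c), basic/low freq. (nc, c),
# non-basic/high freq. (nc, c), non-basic/low freq. (nc, c)
HUMANS = [100, 98, 100, 98, 100, 100, 100, 78]
ALGORITHM = [100, 71, 100, 71, 50, 100, 75, 71]
RANDOM = [25] * 8  # chance level with four potential targets


def mean_abs_gap(model, reference):
    """Mean absolute difference in percentage points across conditions."""
    return sum(abs(m - r) for m, r in zip(model, reference)) / len(reference)


print(mean_abs_gap(ALGORITHM, HUMANS))  # 17.0
print(mean_abs_gap(RANDOM, HUMANS))     # 71.75
```

Most of the algorithm's gap comes from the cells with competitors and from the high-frequency non-basic case without competitors (50% vs. 100%).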


SLIDE 19

Summary of Results and Open Issues

Our aim has been to model implicit processes of adaptation in referring tasks, focusing on the specific case of colours. The experiments show that speakers differ greatly in the expressions they use, but addressees are nevertheless able to coordinate. Some open issues:

Euclidean distances over RGB values seem too crude; a better approach would be closer to human perception (the Lab model with Delta-E values?)

We need a more systematic and empirically motivated way to set the thresholds used by the algorithms.

How can automatic generation be evaluated, given the amount of variation observed?
The performance of the artificial agent should be evaluated in interaction (integration with a dialogue system).

Can the approach be extended to other types of expressions?
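One way to act on the first open issue, assuming the survey's RGB codes are sRGB-encoded, is to convert them to CIE L*a*b* (D65 white point) and measure the CIE76 ΔE, which is simply Euclidean distance in Lab. Later formulas such as CIEDE2000 track perceived differences more closely; CIE76 is used here only because it is the simplest.

```python
import math


def _linearise(channel):
    """Undo the sRGB transfer curve for one 0-255 channel (result in [0, 1])."""
    c = channel / 255
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def _f(t):
    """CIE L*a*b* companding function."""
    delta = 6 / 29
    return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29


def srgb_to_lab(rgb):
    """Convert an sRGB triple (0-255) to CIE L*a*b* under a D65 white point."""
    r, g, b = (_linearise(v) for v in rgb)
    x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b
    xn, yn, zn = 0.95047, 1.00000, 1.08883  # D65 reference white
    fx, fy, fz = _f(x / xn), _f(y / yn), _f(z / zn)
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))


def delta_e76(rgb1, rgb2):
    """CIE76 colour difference: Euclidean distance in L*a*b*."""
    return math.dist(srgb_to_lab(rgb1), srgb_to_lab(rgb2))
```

Swapping `delta_e76` in for the RGB distance would leave threshold-based logic unchanged, although the thresholds would need re-tuning: black to white is about 100 in ΔE rather than about 442 in RGB.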


SLIDE 20

Outline

Two examples of research projects connected to dialogue interaction and coordination:

Colour terms in collaborative reference tasks
Adaptation in child-adult dialogues


SLIDE 21

Convergence in Dialogue

Humans have a strong tendency to align with their interlocutors when they are engaged in conversation:

convergence on the same vocabulary
adaptation of speech rate, accent, pronunciation
syntactic structures
posture, gestures, facial expressions

A variety of terms are used in the literature: convergence, alignment, accommodation, tuning, adaptation, chameleon effect, ...


SLIDE 22

Convergence in Asymmetric Interaction

Convergence has been attested mostly in symmetric situations: dialogue between speakers with equivalent linguistic abilities. However, there is also evidence of convergence / adaptation in asymmetric situations:

Human–computer interaction: humans adapt features of their language to the production of dialogue systems or virtual characters

Native–non-native speakers: native speakers adapt features of their speech (articulation, speech rate, lexical choice) when talking to non-natives

Child–adult interaction: our focus

it is well known that child-directed speech (CDS) exhibits distinct features at many levels of linguistic processing


SLIDE 23

Main Aims of Our Study

Contribute to current research on the role of adaptation in CDS.
Quantitatively corroborate the dynamic character of CDS:

∗ by examining real corpus data
∗ by developing quantitative measures that are easy to derive

Study the scope of the adaptation process by looking at different levels of language processing.

Kunert, Fernández, and Zuidema (2011). Adaptation in Child Directed Speech: Evidence from Corpora, in Proceedings of SemDial 2011, the 15th Workshop on the Semantics and Pragmatics of Dialogue, pp. 112–119, Los Angeles, California.


SLIDE 24

Data

We use the Brown Corpus from the CHILDES database:

3 children: Adam (2;3–5;2), Sarah (2;3–5;1), and Eve (1;6–2;3)
214 transcribed longitudinal conversations (one per corpus file)

An excerpt from the CHILDES Corpus (Adam sub-corpus):

CHI: Why it got a little tire?
MOT: Because it’s a little truck.
CHI: can’t it be a bigger truck?
MOT: that one can’t be a bigger truck but there are bigger trucks.
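To compute the measures separately for child and caretaker, utterances must first be grouped by speaker. In CHAT transcripts, main tiers start with `*` plus a speaker code (`*CHI:`, `*MOT:`); the excerpt above omits the asterisk, so this sketch treats it as optional. It assumes three-letter uppercase speaker codes and ignores headers, dependent tiers and wrapped continuation lines, so it is a simplification rather than a full CHAT parser.

```python
import re
from collections import defaultdict

# Main tier: optional '*', a three-letter speaker code, a colon, the utterance.
MAIN_TIER = re.compile(r"^\*?([A-Z]{3}):\s*(.*)$")


def utterances_by_speaker(lines):
    """Group main-tier utterances by speaker code (e.g. 'CHI', 'MOT').

    Header lines ('@...') and dependent tiers ('%mor:', '%com:', ...) do not
    match the pattern and are skipped.
    """
    by_speaker = defaultdict(list)
    for line in lines:
        match = MAIN_TIER.match(line.strip())
        if match:
            by_speaker[match.group(1)].append(match.group(2))
    return dict(by_speaker)
```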


SLIDE 25

Measures of Speech Complexity

Four simple measures to quantify speech complexity:

Mean Utterance Length (UL): ∼ syntactic complexity
Mean Word Length (WL): ∼ morphological complexity
Mean Number of Word Types (WT): ∼ lexical complexity
Mean Number of Consonant Triples (CT): ∼ phonological complexity
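A rough sketch of the four measures over plain orthographic transcripts follows. The normalisations are assumptions, since the slides give only the measure names: UL is words per utterance, WL letters per word, WT the number of distinct word types divided by the number of utterances, and CT overlapping runs of three consonant letters per word (with 'y' counted as a consonant).

```python
import re

VOWELS = set("aeiou")  # assumption: 'y' counts as a consonant


def tokenize(utterance):
    """Lower-cased word tokens; punctuation is dropped, apostrophes kept."""
    return re.findall(r"[a-z']+", utterance.lower())


def consonant_triples(word):
    """Number of (overlapping) windows of three consecutive consonant letters."""
    letters = [ch for ch in word if ch.isalpha()]
    return sum(1 for i in range(len(letters) - 2)
               if not VOWELS & set(letters[i:i + 3]))


def complexity_measures(utterances):
    """UL, WL, WT and CT for a non-empty list of one speaker's utterances."""
    words = [w for u in utterances for w in tokenize(u)]
    n_utt, n_words = len(utterances), len(words)
    return {
        "UL": n_words / n_utt,                                         # words per utterance
        "WL": sum(len(w.replace("'", "")) for w in words) / n_words,   # letters per word
        "WT": len(set(words)) / n_utt,                                 # types per utterance
        "CT": sum(consonant_triples(w) for w in words) / n_words,      # triples per word
    }
```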

These are combined to obtain a measure of overall language complexity that acts as a kind of average of the basic measures:

General Complexity (GC): the sum of UL, WL, WT and CT after each has been z-score transformed (to put them on a common scale).
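The GC combination can be sketched as follows, assuming each sample (e.g. one transcript) already has its four measure values and that z-scores use the population standard deviation over all samples. A measure that is constant across samples would make its z-score undefined.

```python
from statistics import mean, pstdev

MEASURES = ("UL", "WL", "WT", "CT")


def zscores(values):
    """Standardise values to mean 0 and (population) standard deviation 1."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


def general_complexity(samples):
    """samples: one dict per sample mapping 'UL', 'WL', 'WT', 'CT' to values.

    Returns one GC score per sample: the sum of its four z-scores.
    """
    z = {m: zscores([s[m] for s in samples]) for m in MEASURES}
    return [sum(z[m][i] for m in MEASURES) for i in range(len(samples))]
```

Because every measure is centred before summing, GC scores average to zero across the samples; a positive score means above-average complexity overall.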


SLIDE 26

Complexity against Age

Correlation between WT and UL complexity (vertical axis) and the age of the child in months (horizontal axis) in the Adam corpus.

[Figure: scatter plots for the Adam corpus; panels: Child-WT vs. age (r = 0.82***), Child-UL vs. age (r = 0.93***), Mother-WT vs. age (r = 0.65***), Mother-UL vs. age (r = 0.60***)]


SLIDE 27

Measuring Correlations

Our interest is in investigating whether the child’s and caretaker’s utterances are correlated and what the possible causes for these correlations are.

We use the Pearson product-moment correlation coefficient: we calculate Pearson’s r for each measure X and each pair of dialogue participants (DPs) j, k.
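A direct implementation of Pearson's r follows, plus a helper that applies it per measure to one pair of dialogue participants. Treating each transcript as one paired observation (child value, mother value) is an assumption about how the pairs were formed; `correlate_participants` is a hypothetical helper name.

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Pearson's r = sum((x_i - mx)(y_i - my)) / sqrt(sum((x_i - mx)^2) * sum((y_i - my)^2))."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series with at least 2 values")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlate_participants(child, mother, measures=("UL", "WL", "WT", "CT")):
    """child, mother: one dict of measure values per transcript, aligned by file.

    Returns {measure: r} for this pair of dialogue participants.
    """
    return {m: pearson_r([c[m] for c in child], [p[m] for p in mother])
            for m in measures}
```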


SLIDE 28

Correlation between complexity of child utterances (horizontal axis) and the mother’s utterances (vertical axis) in the Adam corpus:

[Figure: scatter plots for the Adam corpus; panels: utterance length (r = 0.67***), word length (r = 0.57***), word types (r = 0.55***), consonant triples (r = 0.62***)]


SLIDE 29

Results across Corpora

[Figure: Pearson r (vertical axis) between child and mother complexity for CT, WT, WL, UL and general complexity, one panel per child–mother pair: ADAM–MOT (N=55), SARAH–MOT (N=132), EVE–MOT (N=20); significance marked with ** or ***]

Correlations are robust across measures and child-mother pairs.


SLIDE 30

Summary of Results and Open Issues

We have investigated the dynamics of CDS by quantifying linguistic complexity with simple corpus-based measures.

There are strong correlations between the complexity of child and mother utterances:

∗ there is dynamic adaptation between mother and child
∗ these correlations are not entirely explained by the child’s age and the repetitions in the dialogue
∗ they seem to depend on adaptation at the micro-level of dialogue interaction

Some issues that need further investigation:

What dialogue mechanisms may explain the observed correlations (beyond age and repetition factors)?

Does convergence / adaptation enhance language acquisition?
Challenge: model dynamic alignment (coordination + change)


SLIDE 31

Conclusions

Some of the most interesting aspects of language are tied to dialogue.
Speakers do not always share identical semantic representations or identical lexicons: coordination is critical.

Our linguistic theories should be able to account for (or at least be compatible with) dialogue coordination mechanisms.

One of the key challenges is to model coordination and learning by exploiting processes at the micro-level of dialogue interaction.

The most fun and scientifically interesting way to investigate these issues is to combine formal theories with the processing of actual data and the implementation of artificial agents.

Raquel Fernández (in press). Dialogue, in Oxford Handbook of Computational Linguistics, 2nd Ed., Oxford University Press.
