[PPT] - Representing a concept by the distribution of names of its instances PowerPoint Presentation

SLIDE 1

Representing a concept by the distribution of names of its instances

Matthijs Westera, Gemma Boleda and Sebastian Padó

SLIDE 2

Representing a concept by the distribution of names of its instances

Matthijs Westera, Gemma Boleda and Sebastian Padó

Abhijeet Gupta &

SLIDE 3

Interest in Distributional Semantics (etc.)

SLIDE 4

Interest in Distributional Semantics (etc.)

Relation to formal semantics;

SLIDE 5

Interest in Distributional Semantics (etc.)

Relation to formal semantics;
Relevance to experimental linguistics;

SLIDE 6

Interest in Distributional Semantics (etc.)

Relation to formal semantics;
Relevance to experimental linguistics;
Relation between language and the world.

SLIDE 7

Interest in Distributional Semantics (etc.)

Relation to formal semantics;
Relevance to experimental linguistics;
Relation between language and the world.

SLIDE 8

8

Language and the world

SLIDE 9

9

Language and the world

… that dog ate my shoe …

SLIDE 10

10

Language and the world

… that dog ate my shoe … … a young dog is called a puppy …

SLIDE 11

11

Language and the world

… that dog ate my shoe … … a young dog is called a puppy … … every cat ate too much …

SLIDE 12

12

Language and the world

… that dog ate my shoe … … a young dog is called a puppy … … every cat ate too much … … when my cat was young she …

SLIDE 13

13

Language and the world

… that dog ate my shoe … … a young dog is called a puppy … … every cat ate too much … … when my cat was young she …

SLIDE 14

14

Language and the world

… that dog ate my shoe … … a young dog is called a puppy … … every cat ate too much … … when my cat was young she …

SLIDE 15

15

Language and the world

… that dog ate my shoe … … a young dog is called a puppy … … every cat ate too much … … when my cat was young she …

SLIDE 16

16

Language and the world

… that dog ate my shoe … … a young dog is called a puppy … … every cat ate too much … … when my cat was young she …

SLIDE 17

17

… that dog ate my shoe … … a young dog is called a puppy … … every cat ate too much … … when my cat was young she …

SLIDE 18

18

Distributional Semantics (DS)

… that dog ate my shoe … … a young dog is called a puppy … … every cat ate too much … … when my cat was young she …

SLIDE 19

19

Distributional Semantics (DS)

… that dog ate my shoe … … a young dog is called a puppy … … every cat ate too much … … when my cat was young she …

SLIDE 20

20

Distributional Semantics (DS)

… that dog ate my shoe … … a young dog is called a puppy … … every cat ate too much … … when my cat was young she …

dog cat

house flat

animal red

SLIDE 21

21

Westera & Boleda (2019, IWCS):

Distributional Semantics as a model of concepts?

dog cat

house flat

animal red

SLIDE 22

22

Westera & Boleda (2019, IWCS):

Distributional Semantics as a model of concepts?

The vectors of DS are abstractions over occurrences.

dog cat

house flat

animal red

SLIDE 23

23

Westera & Boleda (2019, IWCS):

Distributional Semantics as a model of concepts?

The vectors of DS are abstractions over occurrences.
And so are concepts (e.g., Piaget).

dog cat

house flat

animal red

SLIDE 24

24

Westera & Boleda (2019, IWCS):

Distributional Semantics as a model of concepts?

The vectors of DS are abstractions over occurrences.
And so are concepts (e.g., Piaget).

But what sort of concepts does DS model?

dog cat

house flat

animal red

SLIDE 25

25

Westera & Boleda (2019, IWCS):

Distributional Semantics as a model of concepts?

The vectors of DS are abstractions over occurrences.
And so are concepts (e.g., Piaget).

But what sort of concepts does DS model?

dog cat

house flat

animal red

SLIDE 26

26

Westera & Boleda (2019, IWCS):

Distributional Semantics as a model of concepts?

The vectors of DS are abstractions over occurrences.
And so are concepts (e.g., Piaget).

But what sort of concepts does DS model?

dog cat

house flat

animal red

SLIDE 27

27

Westera & Boleda (2019, IWCS):

Distributional Semantics as a model of concepts?

The vectors of DS are abstractions over occurrences.
And so are concepts (e.g., Piaget).

But what sort of concepts does DS model?

dog cat

house flat

animal red

“cat”

SLIDE 28

28

Westera & Boleda (2019, IWCS):

Distributional Semantics as a model of concepts?

The vectors of DS are abstractions over occurrences.
And so are concepts (e.g., Piaget).

But what sort of concepts does DS model?

dog cat

house flat

animal red

“cat”

SLIDE 29

29

Westera & Boleda (2019, IWCS):

Should Distributional Semantics account for entailment?

“cat”

SLIDE 30

30

Westera & Boleda (2019, IWCS):

Should Distributional Semantics account for entailment?

“cat”

SLIDE 31

31

Westera & Boleda (2019, IWCS):

Should Distributional Semantics account for entailment?

“cat”

“animal” “cat”

SLIDE 32

32

Westera & Boleda (2019, IWCS):

Should Distributional Semantics account for entailment?

“cat”

“animal” “cat”

SLIDE 33

33

Westera & Boleda (2019, IWCS):

Should Distributional Semantics account for entailment?

“cat”

“animal” “cat”

No.

SLIDE 34

34

“cat”

“animal” “cat”

SLIDE 35

35

Language and the world are not perfectly aligned

“cat”

“animal” “cat”

SLIDE 36

36

Language and the world are not perfectly aligned

“cat”

“animal” “cat”

~ ~

SLIDE 37

37

Language and the world are not perfectly aligned

SLIDE 38

38

Language and the world are not perfectly aligned

This is not (just) a technical challenge, but interesting.

SLIDE 39

39

Language and the world are not perfectly aligned

This is not (just) a technical challenge, but interesting.
Are some parts of language closer to the world than other parts?

Does this show in DS? Can we exploit this?

SLIDE 40

40

Language and the world are not perfectly aligned

This is not (just) a technical challenge, but interesting.
Are some parts of language closer to the world than other parts?

Does this show in DS? Can we exploit this?

Some expressions are used more rigidly than others...

SLIDE 41

41

Language and the world are not perfectly aligned

This is not (just) a technical challenge, but interesting.
Are some parts of language closer to the world than other parts?

Does this show in DS? Can we exploit this?

Some expressions are used more rigidly than others... (Kripke, ‘80)

SLIDE 42

42

Approach

SLIDE 43

43

Approach

Let’s compare two kinds of representations of category concepts:

SLIDE 44

44

Approach

Let’s compare two kinds of representations of category concepts:

– Predicate-based:

Word vector of a predicate that is used to denote the category.

SLIDE 45

45

Approach

Let’s compare two kinds of representations of category concepts:

– Predicate-based:

Word vector of a predicate that is used to denote the category.

– Name-based:

Centroid of the word vectors of names of instances of the category.

SLIDE 46

46

Approach

Let’s compare two kinds of representations of category concepts:

– Predicate-based:

Word vector of a predicate that is used to denote the category.

– Name-based:

Centroid of the word vectors of names of instances of the category.

E.g., for Scientist, the word vector of “scientist”

SLIDE 47

47

Approach

Let’s compare two kinds of representations of category concepts:

– Predicate-based:

Word vector of a predicate that is used to denote the category.

– Name-based:

Centroid of the word vectors of names of instances of the category.

E.g., for Scientist, the word vector of “scientist”

E.g., the mean of vectors for “Albert Einstein”, “Emmy Noether”, …

SLIDE 48

48

Approach

Let’s compare two kinds of representations of category concepts:

– Predicate-based:

Word vector of a predicate that is used to denote the category.

– Name-based:

Centroid of the word vectors of names of instances of the category.

Evaluation against human judgments of category relatedness.

E.g., for Scientist, the word vector of “scientist”

E.g., the mean of vectors for “Albert Einstein”, “Emmy Noether”, …
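The two representations can be sketched in a few lines. This is a minimal illustration, not the authors' code: the 3-dimensional vectors are invented toy values, and the helper names (`centroid`, `cosine`, `predicate_based`, `name_based`) are hypothetical. Plain Python lists are used to keep the sketch dependency-free.

```python
import math

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy vectors, invented for illustration only (not real embeddings).
toy_vectors = {
    "scientist": [0.9, 0.1, 0.0],
    "Albert_Einstein": [0.8, 0.2, 0.1],
    "Emmy_Noether": [0.6, 0.4, 0.1],
}

def predicate_based(predicate, vectors):
    """Predicate-based: the word vector of the predicate itself."""
    return vectors[predicate]

def name_based(names, vectors):
    """Name-based: the centroid of the vectors of the instances' names."""
    return centroid([vectors[name] for name in names])

scientist_pred = predicate_based("scientist", toy_vectors)
scientist_name = name_based(["Albert_Einstein", "Emmy_Noether"], toy_vectors)
```

Either representation can then be compared across categories with `cosine`, which is how the two approaches are put on the same footing for evaluation.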

SLIDE 49

Representing a concept by the distribution of names of its instances

Matthijs Westera, Gemma Boleda and Sebastian Padó

Abhijeet Gupta &

SLIDE 50

50

Existing data/model we use

SLIDE 51

51

Existing data/model we use

The Instantiation dataset (Boleda, Gupta, and Padó, 2017, EACL):

– e.g., <Emmy Noether, scientist>, <Edinburgh, capital>

SLIDE 52

52

Existing data/model we use

The Instantiation dataset (Boleda, Gupta, and Padó, 2017, EACL):

– e.g., <Emmy Noether, scientist>, <Edinburgh, capital>
– derived from WordNet’s ‘instance hyponym’ relation.

SLIDE 53

53

Existing data/model we use

The Instantiation dataset (Boleda, Gupta, and Padó, 2017, EACL):

– e.g., <Emmy Noether, scientist>, <Edinburgh, capital>
– derived from WordNet’s ‘instance hyponym’ relation.

We focus on the 159 categories that have at least 5 entities.

SLIDE 54

54

Existing data/model we use

The Instantiation dataset (Boleda, Gupta, and Padó, 2017, EACL):

– e.g., <Emmy Noether, scientist>, <Edinburgh, capital>
– derived from WordNet’s ‘instance hyponym’ relation.

We focus on the 159 categories that have at least 5 entities.
As DS representations of the entities’ names and categories’ predicates we use the Google News embeddings (Mikolov, Sutskever, et al., 2013, ANIPS).
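The category filter can be illustrated as follows. This is a sketch under assumptions: the dataset is taken to be a list of (entity, category) pairs, the function names are hypothetical, and the toy pairs only reuse examples mentioned in the talk.

```python
from collections import defaultdict

def group_by_category(pairs):
    """Map each category to the list of entities that instantiate it."""
    instances = defaultdict(list)
    for entity, category in pairs:
        instances[category].append(entity)
    return dict(instances)

def categories_with_min_entities(instances, min_entities=5):
    """Keep only categories with at least `min_entities` instances."""
    return {
        category: names
        for category, names in instances.items()
        if len(names) >= min_entities
    }

# Toy pairs for illustration; the real dataset is far larger.
toy_pairs = [
    ("Emmy Noether", "scientist"),
    ("Albert Einstein", "scientist"),
    ("Edinburgh", "capital"),
]

grouped = group_by_category(toy_pairs)
# The talk uses a threshold of 5; 2 is used here only so the toy data passes.
kept = categories_with_min_entities(grouped, min_entities=2)
```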

SLIDE 55

55

Evaluation: gathering human judgments

SLIDE 56

56

Evaluation: gathering human judgments

Following Bruni, Tran and Baroni’s MEN benchmark (2012, JAIR):

SLIDE 57

57

Evaluation: gathering human judgments

Following Bruni, Tran and Baroni’s MEN benchmark (2012, JAIR):

We semi-randomly sampled 1000 category pairs (out of 12.5K).

SLIDE 58

58

Evaluation: gathering human judgments

Following Bruni, Tran and Baroni’s MEN benchmark (2012, JAIR):

We semi-randomly sampled 1000 category pairs (out of 12.5K).
‘Comparative’ task: which pair of categories is more related to each other?

SLIDE 59

59

Evaluation: gathering human judgments

Following Bruni, Tran and Baroni’s MEN benchmark (2012, JAIR):

We semi-randomly sampled 1000 category pairs (out of 12.5K).
‘Comparative’ task: which pair of categories is more related to each other?

Also same way of computing aggregated ‘relatedness’ scores.
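Two pieces of arithmetic behind this slide can be sketched. The count of about 12.5K is consistent with all unordered pairs of the 159 categories. The aggregation below is an assumption: it takes the MEN-style score of a pair to be the number of comparisons in which that pair was judged the more related one. The function name and the judgment format are hypothetical.

```python
import math

# All unordered pairs of 159 categories: 159 * 158 / 2.
n_pairs = math.comb(159, 2)

def aggregate_scores(judgments):
    """Count, for each category pair, how often it was judged more related.

    `judgments` is an iterable of (pair_a, pair_b, winner) triples, where
    `winner` is whichever of pair_a / pair_b the annotator picked.
    """
    scores = {}
    for pair_a, pair_b, winner in judgments:
        scores.setdefault(pair_a, 0)
        scores.setdefault(pair_b, 0)
        scores[winner] += 1
    return scores

# Toy judgments, invented for illustration.
p1 = ("scientist", "surgeon")
p2 = ("scientist", "capital")
toy_judgments = [(p1, p2, p1), (p2, p1, p1), (p1, p2, p2)]
toy_scores = aggregate_scores(toy_judgments)
```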

SLIDE 60

60

Crowdsource task

SLIDE 61

61

Main result

SLIDE 62

62

Main result

Spearman (ranking) correlations between:

SLIDE 63

63

Main result

Spearman (ranking) correlations between:

– cosine similarities from Name-based / Predicate-based

and

– aggregate scores from our human judgments

SLIDE 64

64

Main result

Spearman (ranking) correlations between:

– cosine similarities from Name-based / Predicate-based

and

– aggregate scores from our human judgments

Result:

– Predicate-based: 0.56

SLIDE 65

65

Main result

Spearman (ranking) correlations between:

– cosine similarities from Name-based / Predicate-based

and

– aggregate scores from our human judgments

Result:

– Predicate-based: 0.56
– Name-based: 0.74
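For reference, a Spearman correlation is a Pearson correlation computed on ranks. The sketch below is a dependency-free illustration with invented toy numbers, not the evaluation script behind the reported values; in practice a library routine such as `scipy.stats.spearmanr` would normally be used.

```python
import math

def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Toy values: model cosine similarities vs. aggregated human scores.
model_cosines = [0.10, 0.40, 0.90]
human_scores = [1, 5, 20]
rho = spearman(model_cosines, human_scores)
```

Because only the ranking matters, the cosine similarities and the human scores do not need to be on the same scale.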

SLIDE 66

66

Artist’s impression

SLIDE 67

67

Artist’s impression

SLIDE 68

68

How many names do we need?

SLIDE 69

69

How many names do we need?

SLIDE 70

70

How many names do we need?

Surprisingly few!
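One simple way to probe this question is to build the centroid from only k sampled names and see how close it stays to the centroid of all names. This is an illustrative simplification, not the analysis from the talk (which evaluates against human judgments); the toy vectors and the function `centroid_stability` are hypothetical.

```python
import math
import random

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def centroid_stability(name_vectors, k, seed=0):
    """Cosine between the centroid of k sampled names and the full centroid."""
    rng = random.Random(seed)
    names = sorted(name_vectors)
    sample = rng.sample(names, k)
    full = centroid([name_vectors[name] for name in names])
    partial = centroid([name_vectors[name] for name in sample])
    return cosine(partial, full)

# Toy 2-d name vectors, invented for illustration.
toy_names = {
    "name_a": [1.0, 0.0],
    "name_b": [0.8, 0.2],
    "name_c": [0.9, 0.1],
    "name_d": [0.7, 0.3],
}

stability_by_k = {k: centroid_stability(toy_names, k) for k in range(1, 5)}
```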

SLIDE 71

71

Entities need to be representative

SLIDE 72

72

Entities need to be representative

E.g., the Name-based model overestimates surgeon ~ siege...

SLIDE 73

73

Entities need to be representative

E.g., the Name-based model overestimates surgeon ~ siege...
Instances of surgeon in the Instantiation dataset:

– William Cowper
– James Parkinson
– Alexis Carrel
– Walter Reed
– William Beaumont
– Joseph Lister

SLIDE 74

74

Entities need to be representative

E.g., the Name-based model overestimates surgeon ~ siege...
Instances of surgeon in the Instantiation dataset:

– William Cowper
– James Parkinson
– Alexis Carrel
– Walter Reed
– William Beaumont
– Joseph Lister

Involved in WW1

SLIDE 75

75

Entities need to be representative

E.g., the Name-based model overestimates surgeon ~ siege...
Instances of surgeon in the Instantiation dataset:

– William Cowper
– James Parkinson
– Alexis Carrel
– Walter Reed
– William Beaumont
– Joseph Lister

Involved in WW1
Members of US military corps

SLIDE 76

76

Entities need to be representative

E.g., the Name-based model overestimates surgeon ~ siege...
Instances of surgeon in the Instantiation dataset:

– William Cowper
– James Parkinson
– Alexis Carrel
– Walter Reed
– William Beaumont
– Joseph Lister

Wrote “the siege of chester” (?)
Involved in WW1
Members of US military corps

SLIDE 77

77

Discussion

SLIDE 78

78

Discussion

SLIDE 79

79

Discussion

Main finding:

SLIDE 80

80

Discussion

Main finding:

– Name-based representations of category concepts align better with ‘the world’ than Predicate-based representations.

SLIDE 81

81

Discussion

Main finding:

– Name-based representations of category concepts align better with ‘the world’ than Predicate-based representations.

– Even a small number of (representative) names can be enough.

SLIDE 82

82

Discussion

Main finding:

– Name-based representations of category concepts align better with ‘the world’ than Predicate-based representations.

– Even a small number of (representative) names can be enough.

Outlook:

– Not every category has named instances...

SLIDE 83

83

Discussion

Main finding:

– Name-based representations of category concepts align better with ‘the world’ than Predicate-based representations.

– Even a small number of (representative) names can be enough.

Outlook:

– Not every category has named instances...
– NLP relevance? Vs. sense disambiguation? Contextualized word embeddings (ELMo, BERT, …)?

SLIDE 84

84

Discussion

Main finding:

– Name-based representations of category concepts align better with ‘the world’ than Predicate-based representations.

– Even a small number of (representative) names can be enough.

Outlook:

– Not every category has named instances...
– NLP relevance? Vs. sense disambiguation? Contextualized word embeddings (ELMo, BERT, …)?

– Cognitive relevance? E.g., prototype theory?

SLIDE 85

85

Acknowledgments

This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 715154). This paper reflects the authors’ view only, and the EU is not responsible for any use that may be made of the information it contains.

SLIDE 86

86

Image sources

https://ui-ex.com/explore/whale-transparent-dark/
https://commons.wikimedia.org/wiki/File:Cowicon.svg
https://commons.wikimedia.org/wiki/File:Bird_1010720_drawing.svg
https://commons.wikimedia.org/wiki/File:Dog_silhouette.svg
https://commons.wikimedia.org/wiki/File:Cat_silhouette_darkgray.svg
https://commons.wikimedia.org/wiki/File:Frog_(example).svg
https://commons.wikimedia.org/wiki/File:PeregrineFalconSilhouettes.svg
https://commons.wikimedia.org/wiki/File:Common_goldfish_silhouette.svg
https://commons.wikimedia.org/wiki/File:Six_weeks_old_cat_(aka).jpg
https://nl.m.wikipedia.org/wiki/Bestand:Kooikerhondje_puppy.jpg
https://nl.m.wikipedia.org/wiki/Bestand:Golden_Retriever_eating_crust_of_pizza.jpg
https://commons.wikimedia.org/wiki/File:Cat-eating-prey.jpg

SLIDE 87

87

Where are predicates and names, anyway?

predicate name

SLIDE 88

88

Where are predicates and names, anyway?

predicate name

SLIDE 89

89

Crowdsource task

SLIDE 90

90

Crowdsource task

SLIDE 91

91

Crowdsource task instructions

SLIDE 92

92

Crowdsource task instructions

SLIDE 93

93

Why definitions?

SLIDE 94

94

Why definitions?

The same words can often be used to denote various categories.

SLIDE 95

95

Why definitions?

The same words can often be used to denote various categories.
To properly evaluate the Name-based approach, the human judgments should be about the categories as intended by the Instantiation dataset we use.

SLIDE 96

96

Why definitions?

The same words can often be used to denote various categories.
To properly evaluate the Name-based approach, the human judgments should be about the categories as intended by the Instantiation dataset we use.

(Would be good practice more generally – e.g., vs. the good subject effect.)

SLIDE 97

97

Why definitions?

The same words can often be used to denote various categories.
To properly evaluate the Name-based approach, the human judgments should be about the categories as intended by the Instantiation dataset we use.

(Would be good practice more generally – e.g., vs. the good subject effect.)

This may give the Predicate-based approach a disadvantage…

SLIDE 98

98

Why definitions?

The same words can often be used to denote various categories.
To properly evaluate the Name-based approach, the human judgments should be about the categories as intended by the Instantiation dataset we use.

(Would be good practice more generally – e.g., vs. the good subject effect.)

This may give the Predicate-based approach a disadvantage…

– but this disadvantage is not an unfair one.

SLIDE 99

99

A closer look per ontological domain

SLIDE 100

100

A closer look per ontological domain

Predicate-based:

SLIDE 101

101

A closer look per ontological domain

Predicate-based:

SLIDE 102

102

A closer look per ontological domain

Predicate-based:

Name-based:

SLIDE 103

103

A closer look per ontological domain

Predicate-based:

Name-based:

SLIDE 104

104

Non-representative instances of ‘object’ categories

SLIDE 105

105

A closer look per ontological domain

Predicate-based:

Name-based:

SLIDE 106

106

A closer look per ontological domain

Predicate-based:

Name-based:

.54 .64