The big question: How do we infer and reason about meanings of sentences? Conceptual importance: Discovering the process of cognition and intelligence. Applications: Automating language-related tasks, such as document search.
The big challenge:

  Meaning of a sentence = collection of the meanings of its words?
  John likes Mary = {John, likes, Mary}

[Diagram: sentence S decomposed into word_1, word_2, ..., word_n with meanings A, B, ..., Z.]

  Meaning of a sentence = a function of the meanings of its words.

[Diagram: the meanings A, B, ..., Z combined into S by a process depending on the grammatical structure of the sentence.]
Two complementary approaches to meaning

1. The logical or symbolic model:
   Meaning of a sentence = a truth function of its words; the words themselves carry no meaning (words = ∅).

2. The vector space or distributional model:
   Words = vectors built from context; there is no composition function (function = ∅).

[Diagram: words word_1, word_2, ..., word_n with vector meanings A, B, ..., Z.]
Logical vs Vector Space Models

(I) Logical models
Pros: compositional; model-theoretic semantics (Montague); automated inference.
Cons: qualitative (true/false); not very suitable for real-world text; says very little about lexical semantics; forgets some of the syntactic structure.

(II) Vector space models
Pros: quantitative; all about lexical semantics.
Cons: non-compositional.
A formalism with the best of the two: Compositional & Distributional

  Meaning of a sentence = a function of the vectors of its words.

[Diagram: the word vectors A, B, ..., Z combined into the sentence vector S by a process depending on the grammatical structure.]
Compositional Distributional Models of Meaning
Clark, Coecke, Grefenstette, Pulman, Sadrzadeh
Computing and Computer Laboratories, Oxford and Cambridge
Aim: understanding this model.

Theoretical preliminaries:
0. Some category theory
1. Pregroup grammars
2. Vector space models
3. Pregroups and vector spaces, categorically
4. Combining the two: categorical semantics for compositional distributional models
End: concrete matters: implementation, evaluation, experiments.
Some Category Theory

A category has:
- Objects: A, B, C, ...
- Morphisms: f: A → B, g: B → C, ...
- Composition: if f: A → B and g: B → C, then there exists h: A → C such that h = f; g.
- Identities: each object has an identity morphism, 1_A: A → A, 1_B: B → B. This is the unit of composition, i.e. for f: A → B we have
    1_A; f = f; 1_B = f
Examples

  Objects               Morphisms
  systems               processes
  sets                  relations
  sets                  functions
  formulas              proofs
  grammatical types     grammatical reductions
  vector spaces         linear maps
Sets and Relations

Objects: sets A = {x, y}, B = {z, w}, C = {s, t}.
Morphisms: a relation f: A → B is defined by f ⊆ {(a, b) | a ∈ A, b ∈ B}.
For instance:
  f: A → B given by f = {(x, z), (x, w), (y, z)}
  g: B → C given by g = {(z, s), (w, s)}
Sets and Relations

Composition: composing relations f: A → B and g: B → C gives h: A → C with h = f; g, where in general
  f; g = {(a, c) | ∃b, (a, b) ∈ f and (b, c) ∈ g}
For instance, in our example, f; g = ?

Identity: the diagonal relation
  1_A = {(a, a) | a ∈ A}
For our example, 1_A = {(x, x), (y, y)} and 1_B = {(z, z), (w, w)}. These must satisfy 1_A; f = f; 1_B = f. For instance, compute
  1_A; f = {(x, x), (y, y)}; {(x, z), (x, w), (y, z)}
and verify that it equals f.
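The Rel example can be checked mechanically. Below is a small sketch (my own encoding, not from the slides) representing relations as Python sets of pairs, with composition and the diagonal identity:

```python
# Relations as sets of pairs; f: A -> B, g: B -> C from the running example.
A = {"x", "y"}
B = {"z", "w"}
C = {"s", "t"}

f = {("x", "z"), ("x", "w"), ("y", "z")}
g = {("z", "s"), ("w", "s")}

def compose(f, g):
    """Relational composition f;g = {(a, c) | exists b with (a,b) in f and (b,c) in g}."""
    return {(a, c) for (a, b1) in f for (b2, c) in g if b1 == b2}

def identity(A):
    """Diagonal relation 1_A = {(a, a) | a in A}."""
    return {(a, a) for a in A}

print(compose(f, g))                 # answers the slide's exercise: f;g = {(x,s), (y,s)}
assert compose(identity(A), f) == f  # 1_A ; f = f
assert compose(f, identity(B)) == f  # f ; 1_B = f
```

Running this confirms the identity laws and answers the exercise: f; g = {(x, s), (y, s)}.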
Monoidal Category

A category with a binary operation called tensor, denoted ⊗. This operator acts on two objects and returns their composite A ⊗ B. It also acts on morphisms and puts them in parallel: if f: A → Q and g: B → W, then
  f ⊗ g: A ⊗ B → Q ⊗ W
The tensor has a unit I, that is, A ⊗ I = I ⊗ A = A.
Sets and Relations

There is more than one ⊗ here, but for our purposes, given two sets A and B, we take their tensor to be the cartesian product
  A ⊗ B = {(a, b) | a ∈ A, b ∈ B}
For our previous example we have
  A ⊗ B = {(x, z), (x, w), (y, z), (y, w)}
The unit is the singleton set I = {∗}:
  A ⊗ I = A × I = {(a, ∗) | a ∈ A} ≅ {a | a ∈ A} = A
The tensor on morphisms is the cartesian product of relations.
Diagrammatic Calculus

The objects and morphisms of a monoidal category are usually depicted as string diagrams: objects are wires, morphisms are boxes on wires, the composition g; f stacks boxes vertically, and the tensor f ⊗ g places them side by side.

[Diagrams for 1_A, f, g; f, 1_A ⊗ 1_B, f ⊗ 1_C, f ⊗ g, and (f ⊗ g); h omitted.]
Diagrammatic Calculus

The elements within the objects (e.g. elements of a set) can be depicted using the unit I, as morphisms
  ψ: I → A    π: A → I    π ∘ ψ: I → I
[Diagrams omitted.] For instance, a morphism I → A can be the element x of A = {x, y}, written x: I → {x, y}.
Compact Category

A monoidal category where each object A has a left adjoint A^l and a right adjoint A^r. This means that for each object A, we have four morphisms in the category:
  ε^l: A^l ⊗ A → I    η^l: I → A ⊗ A^l
  ε^r: A ⊗ A^r → I    η^r: I → A^r ⊗ A
Diagrammatically, the ε maps are depicted as cups and the η maps as caps. [Diagrams omitted.]
Compact Category

These morphisms must satisfy the yanking equations:
  (η^l ⊗ 1_A); (1_A ⊗ ε^l) = 1_A
  (1_A ⊗ η^r); (ε^r ⊗ 1_A) = 1_A
  (1_{A^l} ⊗ η^l); (ε^l ⊗ 1_{A^l}) = 1_{A^l}
  (η^r ⊗ 1_{A^r}); (1_{A^r} ⊗ ε^r) = 1_{A^r}
Diagrammatically, each equation straightens a zig-zag of wires into the identity wire. [Diagrams omitted.]
Pregroups

A pregroup (P, ≤, •, 1, (−)^l, (−)^r) is a partially ordered monoid in which every element has a left and a right adjoint:
  ∀p ∈ P, ∃p^l, p^r ∈ P such that
  p^l • p ≤ 1 ≤ p • p^l
  p • p^r ≤ 1 ≤ p^r • p
Adjoints are unique and anti-tone: p ≤ q implies q^l ≤ p^l and q^r ≤ p^r.
The unit is self-adjoint: 1^l = 1^r = 1.
So is multiplication, contravariantly: (p • q)^l = q^l • p^l and (p • q)^r = q^r • p^r.
Same-side adjoints do not cancel out: in general (p^r)^r ≠ p ≠ (p^l)^l.
But opposite adjoints do: (p^l)^r = p = (p^r)^l.
Example of a Proof: adjoints are unique.

Suppose p has another left adjoint, call it x. This means x • p ≤ 1 ≤ p • x. Now we have
  x = x • 1 ≤ x • (p • p^l) = (x • p) • p^l ≤ 1 • p^l = p^l
hence x ≤ p^l. Similarly,
  p^l = p^l • 1 ≤ p^l • (p • x) = (p^l • p) • x ≤ 1 • x = x
hence p^l ≤ x. So x = p^l.
Example of a Proof: (p • q)^l = q^l • p^l (and similarly for the right adjoint).

Compute
  (q^l • p^l) • (p • q) = q^l • (p^l • p) • q ≤ q^l • 1 • q = q^l • q ≤ 1
Also
  (p • q) • (q^l • p^l) = p • (q • q^l) • p^l ≥ p • 1 • p^l = p • p^l ≥ 1
Hence we have
  (q^l • p^l) • (p • q) ≤ 1 ≤ (p • q) • (q^l • p^l)
So q^l • p^l is a left adjoint to p • q, but so is (p • q)^l. Since adjoints are unique, we get q^l • p^l = (p • q)^l.
Examples of a Pregroup

(0) A pregroup in which p^l = p^r = p^{-1} is a (partially ordered) group.
(1) The set of all unbounded monotone functions on the integers:
  f: Z → Z with m ≤ n ⟹ f(m) ≤ f(n), and m → ∞ ⟹ f(m) → ∞
The order is defined pointwise: f ≤ g iff f(n) ≤ g(n) for all n ∈ Z.
The • is function composition, and its unit is the identity:
  (f • g)(n) = f(g(n))    I(n) = n
Adjoints are defined canonically (∨ is max, ∧ is min):
  f^r(x) = ∨{y ∈ Z | f(y) ≤ x}
  f^l(x) = ∧{y ∈ Z | x ≤ f(y)}
Example

1) Take f(x) = 2x. The adjoints work out to
  f^r(x) = ∨{y ∈ Z | 2y ≤ x} = ⌊x/2⌋
  f^l(x) = ∧{y ∈ Z | x ≤ 2y} = ⌊(x + 1)/2⌋
where ⌊x⌋ is the biggest integer less than or equal to x.
2) Restricting to N, a nice example is π(x) = the x'th prime. What is π^r(x)? For instance, π(5) = 11 and π^r(5) = 3.
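The adjoint formulas can be checked by brute force. A hedged sketch (the finite search window and the encoding of the prime function are my own choices, not from the slides):

```python
# Canonical adjoints of a monotone function, computed by searching a finite range.
def right_adjoint(f, x, lo=-1000, hi=1000):
    """f^r(x) = max{y | f(y) <= x}."""
    return max(y for y in range(lo, hi) if f(y) <= x)

def left_adjoint(f, x, lo=-1000, hi=1000):
    """f^l(x) = min{y | x <= f(y)}."""
    return min(y for y in range(lo, hi) if x <= f(y))

double = lambda x: 2 * x
assert right_adjoint(double, 5) == 5 // 2        # floor(5/2) = 2
assert left_adjoint(double, 5) == (5 + 1) // 2   # floor((5+1)/2) = 3

# The prime example, restricted to a small initial segment of N:
primes = [2, 3, 5, 7, 11, 13]
prime = lambda x: primes[x - 1]
assert prime(5) == 11
assert right_adjoint(prime, 5, lo=1, hi=len(primes) + 1) == 3  # largest x with prime(x) <= 5
```

This reproduces the slide's values: f^r(5) = 2, f^l(5) = 3 for doubling, and π^r(5) = 3 for the prime function.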
Application to Linguistics

Let Σ be the set of words of a natural language and B their basic types.

Def. A pregroup dictionary for Σ based on B is a binary relation
  D ⊆ Σ × T(B)
where T(B) is the free pregroup generated over the partial order B.

Def. A pregroup grammar is a pair G = (D, s), for D a pregroup dictionary and s ∈ B a distinguished element.

Def. A string of words w_1 ... w_n of Σ is a grammatical sentence if and only if
  t_1 • ... • t_n ≤ s
for (w_i, t_i) an element of D.
Example

A simple dictionary has basic types B = {π, o, w, s, q, q̄, j, σ}:
  π, o, w stand for subject, direct object, indirect object;
  s, j stand for statement and the infinitive of a verb;
  q, q̄ stand for yes-no and wh-questions;
  σ is an index type.
Partial order: π ≤ n, o ≤ n.

Dictionary:
  John: π    likes: π^r s o^l    does: π^r s j^l σ
  Mary: o    like: σ^r j o^l     not: σ^r j j^l σ
Examples

Compose the types of the constituents.

John likes Mary → statement:
  π (π^r s o^l) o ≤ s
Compute: π π^r s o^l o ≤ 1 s o^l o ≤ 1 s 1 = s.

John does not like Mary → statement:
  π (π^r s j^l σ) (σ^r j j^l σ) (σ^r j o^l) o ≤ s
Compute: π π^r s j^l σ σ^r j j^l σ σ^r j o^l o ≤ 1 s j^l 1 j j^l 1 j 1 = s j^l j j^l j ≤ s 1 1 = s.

Can you think of a simpler way to compute the above?
Depicting the Reduction

Each reduction corresponds to a diagram, with a cup connecting each cancelling pair of types.

  John likes Mary:          π (π^r s o^l) o
  John does not like Mary:  π (π^r s j^l σ) (σ^r j j^l σ) (σ^r j o^l) o

[Reduction diagrams omitted.]
Di-transitive Sentences

John gave Mary an apple. ⟹ statement:
  π (π^r s o^l w^l) w o → s

Adverbs

John saw Mary yesterday. ⟹ statement:
  π (π^r s o^l) o (s^r s) → s

Yes-no questions

Does John like Mary? ⟹ question:
  (q i^l π^l) π (i o^l) o → q

Wh-questions

Who does John like? ⟹ question:
  (q o^{ll} q^l) (q i^l π^l) π (i o^l) → q
A Pregroup forms a Compact Closed Category.

1. Elements of the pregroup are objects: p, q ∈ P.
2. The partial order gives the morphisms: p → q iff p ≤ q.
3. Tensor is the monoid multiplication, p ⊗ q = p • q = pq; the unit is 1.
4. Adjoints are adjoints: p^l, p^r.
5. Epsilon maps cancel out types: ε^r = p p^r ≤ 1, ε^l = p^l p ≤ 1.
6. Eta maps generate types: η^r = 1 ≤ p^r p, η^l = 1 ≤ p p^l.
The reductions of types become morphisms of the category.

John likes Mary:  π π^r s o^l o
  (ε^r_π ⊗ 1_s ⊗ ε^l_o) : π π^r s o^l o → s
The reductions of types become morphisms of the category.

John does not like Mary:  π π^r s j^l σ σ^r j j^l σ σ^r j o^l o
  (ε^r_π ⊗ 1_{s j^l} ⊗ ε^r_σ ⊗ 1_{j j^l} ⊗ ε^r_σ ⊗ 1_j ⊗ ε^l_o) ; (1_s ⊗ ε^l_j ⊗ ε^l_j)
  : π π^r s j^l σ σ^r j j^l σ σ^r j o^l o → s
Great! We have our glue now. But we have said nothing about the meanings of words. What do 'John', 'Mary' and 'like' mean? Sets of humans and actions? Problem: ?
Distributional Model of Meaning

Firth: "You shall know a word by the company it keeps."

Intuition: the meanings of cat and dog are similar (in some way) because they are both pets, often furry, and are frequently stroked. These facts are reflected in text: cat and dog both appear close to the words pet, furry, stroke. In the same way, there is a similarity between the words lion and tiger, and also between ship and boat, etc.
Vector Spaces

Build a high-dimensional vector space whose basis vectors are, in principle, all the words of a language. Fix a window of n words; form the vector of each word you want to learn from the frequency of its co-occurrence with each basis word within the window.

Application: automatic thesaurus construction. Curran (2003), From Distributional to Semantic Similarity, created context vectors from 2 billion words of text and compared them to find pairs of synonyms.

Example:
  Introduction: launch, implementation, advent, addition, adoption, arrival, absence, inclusion, creation
  Methods: technique, procedure, means, approach, strategy, tool, concept, practice, formula, tactic
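The windowed counting above can be sketched in a few lines. A toy illustration (the corpus and the window size are made-up assumptions, not Curran's data):

```python
# Count co-occurrences within a +/- window around each occurrence of a word.
from collections import Counter

corpus = ("the cat is a furry pet you stroke "
          "the dog is a furry pet you stroke "
          "the boat sails like a ship").split()

def context_vector(word, corpus, window=4):
    counts = Counter()
    for i, w in enumerate(corpus):
        if w == word:
            for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
                if j != i:
                    counts[corpus[j]] += 1
    return counts

cat = context_vector("cat", corpus)
dog = context_vector("dog", corpus)
shared = set(cat) & set(dog)
print(shared)   # cat and dog keep similar company: includes 'furry' and 'pet'
```

Even on this tiny corpus, cat and dog share the contexts furry and pet, which is exactly the similarity the distributional hypothesis exploits.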
Linear Algebra: Vectors and Tensors

Any vector can be written as a weighted sum of basis vectors:
  a = C_1 v_1 + C_2 v_2 + ... + C_n v_n = Σ_i C_i v_i
for C_i ∈ R a weight, (v_1, ..., v_n) a basis of A, and a ∈ A.

The tensor A ⊗ B has as a basis the cartesian product of a basis of A with a basis of B, so a typical vector c ∈ A ⊗ B is written
  c = Σ_ij C_ij (v_i ⊗ v'_j) = Σ_ij C_ij (v_i, v'_j)
for (v_1, ..., v_n) a basis of A and (v'_1, ..., v'_m) a basis of B.
Recalling Linear Algebra: Entangled Vectors and Inner Product

In general, c ∈ A ⊗ B cannot be written as the tensor of two vectors; it can exactly when c is not entangled, in which case we obtain
  c = a ⊗ b = Σ_ij C_i × C'_j (v_i ⊗ v'_j)
for a = Σ_i C_i v_i and b = Σ_j C'_j v'_j. The separable ⊗ is referred to as the Kronecker product.

The inner product of two vectors is denoted and defined by
  ⟨a | b⟩ = Σ_i C_i × C'_i ∈ R
Examples

Vector space A with basis {furry, pet, stroke}:
  dog = 2 furry + 5 pet + 7 stroke
  cat = 3 furry + 3 pet + 8 stroke
  dog ⊗ cat = 2×3 (furry ⊗ furry) + 2×3 (furry ⊗ pet) + 2×8 (furry ⊗ stroke) + ···
But A ⊗ A has many more elements in it.
  ⟨dog | cat⟩ = 2×3 + 5×3 + 7×8
  cos(dog, cat) = ⟨dog | cat⟩ / (|dog| × |cat|)
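These numbers are easy to verify directly (plain Python, no linear-algebra library assumed):

```python
# dog and cat as coordinate vectors over the basis (furry, pet, stroke).
import math

dog = [2, 5, 7]   # 2 furry + 5 pet + 7 stroke
cat = [3, 3, 8]   # 3 furry + 3 pet + 8 stroke

inner = sum(a * b for a, b in zip(dog, cat))
norm = lambda v: math.sqrt(sum(x * x for x in v))
cos = inner / (norm(dog) * norm(cat))

assert inner == 2 * 3 + 5 * 3 + 7 * 8   # = 77
print(round(cos, 3))                    # high similarity, as expected for pets
```

The inner product is 77 and the cosine is close to 1, reflecting that dog and cat keep similar company.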
FVect, finite-dimensional vector spaces, form a Compact Closed Category.

1. Vector spaces are objects: V, W.
2. Linear maps are morphisms: f: V → W.
3. Tensor is the tensor product V ⊗ W; the unit is R.
4. Adjoints are the identity: V^l = V* = V^r.
5. Epsilon maps are inner products, ε^l = ε^r : V* ⊗ V → R:
  Σ_ij C_ij (v_i ⊗ v_j) ↦ Σ_ij C_ij ⟨v_i | v_j⟩ = Σ_i C_ii
6. Eta maps create entangled states, η^l = η^r : R → V ⊗ V*:
  1 ↦ Σ_i (v_i ⊗ v_i)
Great! We have our meanings of words now, but no glue! How can we form a vector for the meaning of a sentence?

Summation:
  John likes Mary = John + likes + Mary
Multiplication (pointwise):
  John likes Mary = John × likes × Mary

Problem: ? Can you suggest a solution?
Combining Vector Spaces and Pregroups: Categorical Semantics

Work in the product category FVect × Pregroup:
  Objects: (w ∈ W, p)    Morphisms: (f: V → W, α)

Syntax/semantics dictionary:
- To each word w_i, assign an object (w_i ∈ W_i, p_i).
- To each sentence w_1 ... w_n, with pregroup reduction α: p_1 ... p_n → s, assign an object too: (w_1 ... w_n ∈ S, s).
Compositional Meaning for Sentences

Def (Compositional Distributional Meaning). Given a sentence w_1 ... w_n, define its meaning as
  w_1 ... w_n := f(w_1 ⊗ ... ⊗ w_n)
The linear map f: W_1 ⊗ ... ⊗ W_n → S is the pregroup reduction α: p_1 ... p_n → s of the sentence, expressed in FVect via the substitution [α](p_i ↦ W_i).
We have both our glue and something to glue with!
  glue = the pregroup grammatical structure
  things to glue = the meaning vectors of words: John, likes, beer
The reduction diagram can be written as a map f standing for the grammatical structure, yet expressible in the language of vector spaces:
  John likes beer := f(John ⊗ likes ⊗ beer)
Positive Transitive Sentence: "John likes beer"

Syntax/semantics dictionary:
  (John ∈ V, π)    (likes ∈ V ⊗ S ⊗ W, π^r s o^l)    (beer ∈ W, o)

Grammar:
  α = ε^r_π ⊗ 1_s ⊗ ε^l_o  ⟹  f = ε^r_V ⊗ 1_S ⊗ ε^l_W

Meaning:
  John likes beer = (ε^r_V ⊗ 1_S ⊗ ε^l_W)(John ⊗ likes ⊗ beer)

[Diagram omitted.]
Towards a Concrete Setting

likes is a non-separable vector in V ⊗ S ⊗ W:
  likes = Σ_ikj C_ikj (v_i ⊗ s_k ⊗ w_j)
for v_i, w_j, s_k bases of V, W, S. The meaning of the sentence becomes
  (ε^r_V ⊗ 1_S ⊗ ε^l_W)(John ⊗ Σ_ikj C_ikj (v_i ⊗ s_k ⊗ w_j) ⊗ beer)
  = Σ_k Σ_ij C_ikj ⟨John | v_i⟩ ⟨w_j | beer⟩ s_k
Use the resulting sentence vectors Σ_k C_ikj s_k to define concrete meaning.
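The contraction formula can be run concretely. A minimal sketch with tiny dimensions; the verb weights C_ikj and the coordinates of John and beer are made-up illustrative numbers, not learned data:

```python
# sum_ikj C_ikj <John|v_i> <w_j|beer> s_k, as explicit loops over coordinates.
dimV, dimS, dimW = 2, 2, 2
C = {(i, k, j): (i + 1) * (k + 2) * (j + 3)          # toy verb tensor
     for i in range(dimV) for k in range(dimS) for j in range(dimW)}

john = [1.0, 0.0]    # coordinates of John in V (picks out basis vector v_0)
beer = [0.0, 1.0]    # coordinates of beer in W (picks out basis vector w_1)

# Contract V against John and W against beer; a vector in S remains.
sentence = [sum(C[(i, k, j)] * john[i] * beer[j]
                for i in range(dimV) for j in range(dimW))
            for k in range(dimS)]
print(sentence)   # [8.0, 12.0]
```

Because John and beer are basis vectors here, the inner products act as Kronecker deltas and the sentence vector is just the slice C_{0,k,1} of the verb tensor, exactly as in the Boolean example that follows.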
Example: Boolean Meaning

V has basis {m_i}_i encoding all men; assume John is m_3.
W has basis {d_j}_j encoding all drinks; assume beer is d_4.
S is the two-dimensional space with basis {true, false}.

Define:
  Σ_k C_ikj s_k =: s_ij = true if m_i likes d_j, false otherwise.

Compute:
  Σ_ij ⟨m_3 | m_i⟩ ⊗ s_ij ⊗ ⟨d_j | d_4⟩ = Σ_ij δ_3i s_ij δ_j4 = s_34
which is true if "John likes beer" and false otherwise.
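In code, the deltas collapse the double sum to a single table lookup. A sketch (the likes table is made-up example data; the encoding of true and false as basis vectors of S is an assumption):

```python
# Boolean sentence space: s_ij = true iff man i likes drink j.
TRUE, FALSE = (1, 0), (0, 1)   # basis vectors of the two-dimensional space S

likes = {(3, 4), (1, 2)}       # pairs (i, j) with m_i likes d_j (assumed data)

def s(i, j):
    return TRUE if (i, j) in likes else FALSE

# <m3|m_i> and <d_j|d4> are Kronecker deltas, so the big sum collapses to s_34:
meaning = s(3, 4)
assert meaning == TRUE   # "John likes beer" comes out true in this model
```

The whole categorical computation reduces to looking up one entry of the truth table, which is reassuring: the formalism reproduces the model-theoretic answer.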
Example: Negative Transitive Sentence: "John does not like beer"

[Diagrams omitted.] In the diagrams, does is depicted as a cap whose middle wires carry the identity map, and not as a cap whose middle wires carry the not map. Yanking the wires straightens the diagram, and the meaning becomes equivalent to
  (ε_V ⊗ [0 1; 1 0] ⊗ ε_W)(v ⊗ Ψ ⊗ w)
In the Boolean example above, the middle matrix is the not (swap) matrix, so
  [0 1; 1 0] s_34 = false if s_34 = true, and true if s_34 = false.
Example: Negative Transitive Sentence: "John does not like beer"

Syntax/semantics dictionary: John and beer as before, and
  (does ∈ V ⊗ S ⊗ J ⊗ V, π^r s j^l σ)
  (not ∈ V ⊗ J ⊗ J ⊗ V, σ^r j j^l σ)
  (like ∈ V ⊗ J ⊗ W, σ^r j o^l)

Logical words:
  not = Σ_k m_k ⊗ (|10⟩ + |01⟩) ⊗ m_k
  does = Σ_l m_l ⊗ (|11⟩ + |00⟩) ⊗ m_l

Grammar:
  α = (1_s ⊗ ε^l_j ⊗ ε^l_j) ∘ (ε^r_π ⊗ 1_{s j^l} ⊗ ε^r_σ ⊗ 1_{j j^l} ⊗ ε^r_σ ⊗ 1_j ⊗ ε^l_o)

Meaning:
  f = (1_S ⊗ ε^l_J ⊗ ε^l_J) ∘ (ε^r_V ⊗ 1_{SJ} ⊗ ε^r_V ⊗ 1_{JJ} ⊗ ε^r_V ⊗ 1_J ⊗ ε^l_W)
John does not like beer = f(John ⊗ does ⊗ not ⊗ like ⊗ beer)

Writing does and not for the middle components (|11⟩ + |00⟩) and (|10⟩ + |01⟩), the computation proceeds by contracting the ε-maps one at a time:

f(m_3 ⊗ (Σ_l m_l ⊗ does ⊗ m_l) ⊗ (Σ_k m_k ⊗ not ⊗ m_k) ⊗ (Σ_ij m_i ⊗ s_ij ⊗ d_j) ⊗ d_4)
= (Σ_l ⟨m_3 | m_l⟩ does ⊗ m_l) ⊗ (Σ_k m_k ⊗ not ⊗ m_k) ⊗ (Σ_ij m_i ⊗ s_ij ⊗ ⟨d_j | d_4⟩)
= (Σ_l δ_3l does ⊗ m_l) ⊗ (Σ_k m_k ⊗ not ⊗ m_k) ⊗ (Σ_i m_i ⊗ s_i4)
= does ⊗ m_3 ⊗ (Σ_k m_k ⊗ not ⊗ m_k) ⊗ (Σ_i m_i ⊗ s_i4)
= does ⊗ (Σ_k ⟨m_3 | m_k⟩ not ⊗ m_k) ⊗ (Σ_i m_i ⊗ s_i4)
= does ⊗ (Σ_k δ_3k not ⊗ m_k) ⊗ (Σ_i m_i ⊗ s_i4)
= does ⊗ not ⊗ m_3 ⊗ (Σ_i m_i ⊗ s_i4)
= does ⊗ not ⊗ (Σ_i ⟨m_3 | m_i⟩ s_i4)
= does ⊗ not ⊗ (Σ_i δ_3i s_i4)
= does ⊗ not ⊗ s_34
Finally, contract the remaining does and not components:
  does ⊗ not ⊗ s_34 = (|00⟩ + |11⟩) ⊗ (|10⟩ + |01⟩) ⊗ s_34
Contracting the inner wires (⟨0|1⟩ = ⟨1|0⟩ = 0, ⟨0|0⟩ = ⟨1|1⟩ = 1) leaves
  (|0⟩⟨1| + |1⟩⟨0|) s_34
which is the not matrix applied to s_34:
  = false if s_34 = true
  = true if s_34 = false
Since "John likes beer" was true, "John does not like beer" comes out false.
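The surviving operator |0⟩⟨1| + |1⟩⟨0| is just the 2×2 swap matrix, and its effect is a two-line check (the encoding true = (1, 0), false = (0, 1) is the assumption used above):

```python
# The effect of "does not" in the Boolean model: the swap (not) matrix
# |0><1| + |1><0| applied to the truth value s_34.
TRUE, FALSE = (1, 0), (0, 1)
NOT = [[0, 1], [1, 0]]

def apply(M, v):
    """Multiply a 2x2 matrix by a 2-vector."""
    return tuple(sum(M[r][c] * v[c] for c in range(2)) for r in range(2))

assert apply(NOT, TRUE) == FALSE   # John likes beer, so the negation is false
assert apply(NOT, FALSE) == TRUE
```

The grammar-driven contraction thus composes the lexical not with the truth value of the positive sentence, giving exactly classical negation.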
Needs More Work

We have a good case for the meanings of:
- verbs
- nouns
- adjectives
- adverbs

We do not yet fully understand the meanings of:
- quantifiers
- relative clauses
- conjunctions

We have a parser for pregroups; it remains to add the vector space part to it.