The big question: How do we infer and reason about meanings of sentences? Conceptual importance: Discovering the process of cognition and intelligence. Applications: Automating language-related tasks, such as document search.
The big challenge:

  Meaning of a sentence = collection of the meanings of its words?
  John likes Mary = {John, likes, Mary}

[Diagram: sentence S decomposed into word_1, word_2, ..., word_n with meanings A, B, ..., Z.]

  Meaning of a sentence = a function of the meanings of its words.

[Diagram: the meanings A, B, ..., Z combined into S by a process depending on the grammatical structure of the sentence.]
Two complementary approaches to meaning

1. The logical or symbolic model:
   Meaning of a sentence = a truth function of its words; the words themselves carry no meaning (words = ∅).

2. The vector space or distributional model:
   Words = vectors built from context; there is no composition function (function = ∅).

[Diagram: words word_1, word_2, ..., word_n with vector meanings A, B, ..., Z.]
Logical vs Vector Space Models

(I) Logical models
Pros: compositional; model-theoretic semantics (Montague); automated inference.
Cons: qualitative (true/false); not very suitable for real-world text; says very little about lexical semantics; forgets some of the syntactic structure.

(II) Vector space models
Pros: quantitative; all about lexical semantics.
Cons: non-compositional.
A formalism with the best of the two: Compositional & Distributional

  Meaning of a sentence = a function of the vectors of its words.

[Diagram: the word vectors A, B, ..., Z combined into the sentence vector S by a process depending on the grammatical structure.]
Compositional Distributional Models of Meaning
Clark, Coecke, Grefenstette, Pulman, Sadrzadeh
Computing and Computer Laboratories, Oxford and Cambridge
Aim: understanding this model.

Theoretical preliminaries:
0. Some category theory
1. Pregroup grammars
2. Vector space models
3. Pregroups and vector spaces, categorically
4. Combining the two: categorical semantics for compositional distributional models
End: concrete matters: implementation, evaluation, experiments.
Some Category Theory

A category has:
- Objects: A, B, C, ...
- Morphisms: f: A → B, g: B → C, ...
- Composition: if f: A → B and g: B → C, then there exists h: A → C such that h = f; g.
- Identities: each object has an identity morphism, 1_A: A → A, 1_B: B → B. This is the unit of composition, i.e. for f: A → B we have
    1_A; f = f; 1_B = f
Examples

  Objects               Morphisms
  systems               processes
  sets                  relations
  sets                  functions
  formulas              proofs
  grammatical types     grammatical reductions
  vector spaces         linear maps
Sets and Relations

Objects: sets A = {x, y}, B = {z, w}, C = {s, t}.
Morphisms: a relation f: A → B is defined by f ⊆ {(a, b) | a ∈ A, b ∈ B}.
For instance:
  f: A → B given by f = {(x, z), (x, w), (y, z)}
  g: B → C given by g = {(z, s), (w, s)}
Sets and Relations

Composition: composing relations f: A → B and g: B → C gives h: A → C with h = f; g, where in general
  f; g = {(a, c) | ∃b, (a, b) ∈ f and (b, c) ∈ g}
For instance, in our example, f; g = ?

Identity: the diagonal relation
  1_A = {(a, a) | a ∈ A}
For our example, 1_A = {(x, x), (y, y)} and 1_B = {(z, z), (w, w)}. These must satisfy 1_A; f = f; 1_B = f. For instance, compute
  1_A; f = {(x, x), (y, y)}; {(x, z), (x, w), (y, z)}
and verify that it equals f.
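The Rel example can be checked mechanically. Below is a small sketch (my own encoding, not from the slides) representing relations as Python sets of pairs, with composition and the diagonal identity:

```python
# Relations as sets of pairs; f: A -> B, g: B -> C from the running example.
A = {"x", "y"}
B = {"z", "w"}
C = {"s", "t"}

f = {("x", "z"), ("x", "w"), ("y", "z")}
g = {("z", "s"), ("w", "s")}

def compose(f, g):
    """Relational composition f;g = {(a, c) | exists b with (a,b) in f and (b,c) in g}."""
    return {(a, c) for (a, b1) in f for (b2, c) in g if b1 == b2}

def identity(A):
    """Diagonal relation 1_A = {(a, a) | a in A}."""
    return {(a, a) for a in A}

print(compose(f, g))                 # answers the slide's exercise: f;g = {(x,s), (y,s)}
assert compose(identity(A), f) == f  # 1_A ; f = f
assert compose(f, identity(B)) == f  # f ; 1_B = f
```

Running this confirms the identity laws and answers the exercise: f; g = {(x, s), (y, s)}.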
Monoidal Category

A category with a binary operation called tensor, denoted ⊗. This operator acts on two objects and returns their composite A ⊗ B. It also acts on morphisms and puts them in parallel: if f: A → Q and g: B → W, then
  f ⊗ g: A ⊗ B → Q ⊗ W
The tensor has a unit I, that is, A ⊗ I = I ⊗ A = A.
Sets and Relations

There is more than one ⊗ here, but for our purposes, given two sets A and B, we take their tensor to be the cartesian product
  A ⊗ B = {(a, b) | a ∈ A, b ∈ B}
For our previous example we have
  A ⊗ B = {(x, z), (x, w), (y, z), (y, w)}
The unit is the singleton set I = {∗}:
  A ⊗ I = A × I = {(a, ∗) | a ∈ A} ≅ {a | a ∈ A} = A
The tensor on morphisms is the cartesian product of relations.
Diagrammatic Calculus

The objects and morphisms of a monoidal category are usually depicted as string diagrams: objects are wires, morphisms are boxes on wires, the composition g; f stacks boxes vertically, and the tensor f ⊗ g places them side by side.

[Diagrams for 1_A, f, g; f, 1_A ⊗ 1_B, f ⊗ 1_C, f ⊗ g, and (f ⊗ g); h omitted.]
Diagrammatic Calculus

The elements within the objects (e.g. elements of a set) can be depicted using the unit I, as morphisms
  ψ: I → A    π: A → I    π ∘ ψ: I → I
[Diagrams omitted.] For instance, a morphism I → A can be the element x of A = {x, y}, written x: I → {x, y}.
Compact Category

A monoidal category where each object A has a left adjoint A^l and a right adjoint A^r. This means that for each object A, we have four morphisms in the category:
  ε^l: A^l ⊗ A → I    η^l: I → A ⊗ A^l
  ε^r: A ⊗ A^r → I    η^r: I → A^r ⊗ A
Diagrammatically, the ε maps are depicted as cups and the η maps as caps. [Diagrams omitted.]
Compact Category

These morphisms must satisfy the yanking equations:
  (η^l ⊗ 1_A); (1_A ⊗ ε^l) = 1_A
  (1_A ⊗ η^r); (ε^r ⊗ 1_A) = 1_A
  (1_{A^l} ⊗ η^l); (ε^l ⊗ 1_{A^l}) = 1_{A^l}
  (η^r ⊗ 1_{A^r}); (1_{A^r} ⊗ ε^r) = 1_{A^r}
Diagrammatically, each equation straightens a zig-zag of wires into the identity wire. [Diagrams omitted.]
Pregroups

A pregroup (P, ≤, •, 1, (−)^l, (−)^r) is a partially ordered monoid in which every element has a left and a right adjoint:
  ∀p ∈ P, ∃p^l, p^r ∈ P such that
  p^l • p ≤ 1 ≤ p • p^l
  p • p^r ≤ 1 ≤ p^r • p
Adjoints are unique and anti-tone: p ≤ q implies q^l ≤ p^l and q^r ≤ p^r.
The unit is self-adjoint: 1^l = 1^r = 1.
So is multiplication, contravariantly: (p • q)^l = q^l • p^l and (p • q)^r = q^r • p^r.
Same-side adjoints do not cancel out: in general (p^r)^r ≠ p ≠ (p^l)^l.
But opposite adjoints do: (p^l)^r = p = (p^r)^l.
Example of a Proof: adjoints are unique.

Suppose p has another left adjoint, call it x. This means x • p ≤ 1 ≤ p • x. Now we have
  x = x • 1 ≤ x • (p • p^l) = (x • p) • p^l ≤ 1 • p^l = p^l
hence x ≤ p^l. Similarly,
  p^l = p^l • 1 ≤ p^l • (p • x) = (p^l • p) • x ≤ 1 • x = x
hence p^l ≤ x. So x = p^l.
Example of a Proof: (p • q)^l = q^l • p^l (and similarly for the right adjoint).

Compute
  (q^l • p^l) • (p • q) = q^l • (p^l • p) • q ≤ q^l • 1 • q = q^l • q ≤ 1
Also
  (p • q) • (q^l • p^l) = p • (q • q^l) • p^l ≥ p • 1 • p^l = p • p^l ≥ 1
Hence we have
  (q^l • p^l) • (p • q) ≤ 1 ≤ (p • q) • (q^l • p^l)
So q^l • p^l is a left adjoint to p • q, but so is (p • q)^l. Since adjoints are unique, we get q^l • p^l = (p • q)^l.
Examples of a Pregroup

(0) A pregroup in which p^l = p^r = p^{-1} is a (partially ordered) group.
(1) The set of all unbounded monotone functions on the integers:
  f: Z → Z with m ≤ n ⟹ f(m) ≤ f(n), and m → ∞ ⟹ f(m) → ∞
The order is defined pointwise: f ≤ g iff f(n) ≤ g(n) for all n ∈ Z.
The • is function composition, and its unit is the identity:
  (f • g)(n) = f(g(n))    I(n) = n
Adjoints are defined canonically (∨ is max, ∧ is min):
  f^r(x) = ∨{y ∈ Z | f(y) ≤ x}
  f^l(x) = ∧{y ∈ Z | x ≤ f(y)}
Example

1) Take f(x) = 2x. The adjoints work out to
  f^r(x) = ∨{y ∈ Z | 2y ≤ x} = ⌊x/2⌋
  f^l(x) = ∧{y ∈ Z | x ≤ 2y} = ⌊(x + 1)/2⌋
where ⌊x⌋ is the biggest integer less than or equal to x.
2) Restricting to N, a nice example is π(x) = the x'th prime. What is π^r(x)? For instance, π(5) = 11 and π^r(5) = 3.
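The adjoint formulas can be checked by brute force. A hedged sketch (the finite search window and the encoding of the prime function are my own choices, not from the slides):

```python
# Canonical adjoints of a monotone function, computed by searching a finite range.
def right_adjoint(f, x, lo=-1000, hi=1000):
    """f^r(x) = max{y | f(y) <= x}."""
    return max(y for y in range(lo, hi) if f(y) <= x)

def left_adjoint(f, x, lo=-1000, hi=1000):
    """f^l(x) = min{y | x <= f(y)}."""
    return min(y for y in range(lo, hi) if x <= f(y))

double = lambda x: 2 * x
assert right_adjoint(double, 5) == 5 // 2        # floor(5/2) = 2
assert left_adjoint(double, 5) == (5 + 1) // 2   # floor((5+1)/2) = 3

# The prime example, restricted to a small initial segment of N:
primes = [2, 3, 5, 7, 11, 13]
prime = lambda x: primes[x - 1]
assert prime(5) == 11
assert right_adjoint(prime, 5, lo=1, hi=len(primes) + 1) == 3  # largest x with prime(x) <= 5
```

This reproduces the slide's values: f^r(5) = 2, f^l(5) = 3 for doubling, and π^r(5) = 3 for the prime function.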
Application to Linguistics

Let Σ be the set of words of a natural language and B their basic types.

Def. A pregroup dictionary for Σ based on B is a binary relation
  D ⊆ Σ × T(B)
where T(B) is the free pregroup generated over the partial order B.

Def. A pregroup grammar is a pair G = (D, s), for D a pregroup dictionary and s ∈ B a distinguished element.

Def. A string of words w_1 ... w_n of Σ is a grammatical sentence if and only if
  t_1 • ... • t_n ≤ s
for (w_i, t_i) an element of D.
Example

A simple dictionary has basic types B = {π, o, w, s, q, q̄, j, σ}:
  π, o, w stand for subject, direct object, indirect object;
  s, j stand for statement and the infinitive of a verb;
  q, q̄ stand for yes-no and wh-questions;
  σ is an index type.
Partial order: π ≤ n, o ≤ n.

Dictionary:
  John: π    likes: π^r s o^l    does: π^r s j^l σ
  Mary: o    like: σ^r j o^l     not: σ^r j j^l σ
Examples

Compose the types of the constituents.

John likes Mary → statement:
  π (π^r s o^l) o ≤ s
Compute: π π^r s o^l o ≤ 1 s o^l o ≤ 1 s 1 = s.

John does not like Mary → statement:
  π (π^r s j^l σ) (σ^r j j^l σ) (σ^r j o^l) o ≤ s
Compute: π π^r s j^l σ σ^r j j^l σ σ^r j o^l o ≤ 1 s j^l 1 j j^l 1 j 1 = s j^l j j^l j ≤ s 1 1 = s.

Can you think of a simpler way to compute the above?
Depicting the Reduction

Each reduction corresponds to a diagram, with a cup connecting each cancelling pair of types.

  John likes Mary:          π (π^r s o^l) o
  John does not like Mary:  π (π^r s j^l σ) (σ^r j j^l σ) (σ^r j o^l) o

[Reduction diagrams omitted.]
Di-transitive Sentences

John gave Mary an apple. ⟹ statement:
  π (π^r s o^l w^l) w o → s

Adverbs

John saw Mary yesterday. ⟹ statement:
  π (π^r s o^l) o (s^r s) → s

Yes-no questions

Does John like Mary? ⟹ question:
  (q i^l π^l) π (i o^l) o → q

Wh-questions

Who does John like? ⟹ question:
  (q o^{ll} q^l) (q i^l π^l) π (i o^l) → q
A Pregroup forms a Compact Closed Category.

1. Elements of the pregroup are objects: p, q ∈ P.
2. The partial order gives the morphisms: p → q iff p ≤ q.
3. Tensor is the monoid multiplication, p ⊗ q = p • q = pq; the unit is 1.
4. Adjoints are adjoints: p^l, p^r.
5. Epsilon maps cancel out types: ε^r = p p^r ≤ 1, ε^l = p^l p ≤ 1.
6. Eta maps generate types: η^r = 1 ≤ p^r p, η^l = 1 ≤ p p^l.
The reductions of types become morphisms of the category.

John likes Mary:  π π^r s o^l o
  (ε^r_π ⊗ 1_s ⊗ ε^l_o) : π π^r s o^l o → s
The reductions of types become morphisms of the category.

John does not like Mary:  π π^r s j^l σ σ^r j j^l σ σ^r j o^l o
  (ε^r_π ⊗ 1_{s j^l} ⊗ ε^r_σ ⊗ 1_{j j^l} ⊗ ε^r_σ ⊗ 1_j ⊗ ε^l_o) ; (1_s ⊗ ε^l_j ⊗ ε^l_j)
  : π π^r s j^l σ σ^r j j^l σ σ^r j o^l o → s
Great! We have our glue now. But we have said nothing about the meanings of words. What do 'John', 'Mary' and 'like' mean? Sets of humans and actions? Problem: ?
Distributional Model of Meaning

Firth: "You shall know a word by the company it keeps."

Intuition: the meanings of cat and dog are similar (in some way) because they are both pets, often furry, and are frequently stroked. These facts are reflected in text: cat and dog both appear close to the words pet, furry, stroke. In the same way, there is a similarity between the words lion and tiger, and also between ship and boat, etc.
Vector Spaces

Build a high-dimensional vector space whose basis vectors are, in principle, all the words of a language. Fix a window of n words; form the vector of each word you want to learn from the frequency of its co-occurrence with each basis word within the window.

Application: automatic thesaurus construction. Curran (2003), From Distributional to Semantic Similarity, created context vectors from 2 billion words of text and compared them to find pairs of synonyms.

Example:
  Introduction: launch, implementation, advent, addition, adoption, arrival, absence, inclusion, creation
  Methods: technique, procedure, means, approach, strategy, tool, concept, practice, formula, tactic
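The windowed counting above can be sketched in a few lines. A toy illustration (the corpus and the window size are made-up assumptions, not Curran's data):

```python
# Count co-occurrences within a +/- window around each occurrence of a word.
from collections import Counter

corpus = ("the cat is a furry pet you stroke "
          "the dog is a furry pet you stroke "
          "the boat sails like a ship").split()

def context_vector(word, corpus, window=4):
    counts = Counter()
    for i, w in enumerate(corpus):
        if w == word:
            for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
                if j != i:
                    counts[corpus[j]] += 1
    return counts

cat = context_vector("cat", corpus)
dog = context_vector("dog", corpus)
shared = set(cat) & set(dog)
print(shared)   # cat and dog keep similar company: includes 'furry' and 'pet'
```

Even on this tiny corpus, cat and dog share the contexts furry and pet, which is exactly the similarity the distributional hypothesis exploits.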
Linear Algebra: Vectors and Tensors

Any vector can be written as a weighted sum of basis vectors:
  a = C_1 v_1 + C_2 v_2 + ... + C_n v_n = Σ_i C_i v_i
for C_i ∈ R a weight, (v_1, ..., v_n) a basis of A, and a ∈ A.

The tensor A ⊗ B has as a basis the cartesian product of a basis of A with a basis of B, so a typical vector c ∈ A ⊗ B is written
  c = Σ_ij C_ij (v_i ⊗ v'_j) = Σ_ij C_ij (v_i, v'_j)
for (v_1, ..., v_n) a basis of A and (v'_1, ..., v'_m) a basis of B.
Recalling Linear Algebra: Entangled Vectors and Inner Product

In general, c ∈ A ⊗ B cannot be written as the tensor of two vectors; it can exactly when c is not entangled, in which case we obtain
  c = a ⊗ b = Σ_ij C_i × C'_j (v_i ⊗ v'_j)
for a = Σ_i C_i v_i and b = Σ_j C'_j v'_j. The separable ⊗ is referred to as the Kronecker product.

The inner product of two vectors is denoted and defined by
  ⟨a | b⟩ = Σ_i C_i × C'_i ∈ R
Examples

Vector space A with basis {furry, pet, stroke}:
  dog = 2 furry + 5 pet + 7 stroke
  cat = 3 furry + 3 pet + 8 stroke
  dog ⊗ cat = 2×3 (furry ⊗ furry) + 2×3 (furry ⊗ pet) + 2×8 (furry ⊗ stroke) + ···
But A ⊗ A has many more elements in it.
  ⟨dog | cat⟩ = 2×3 + 5×3 + 7×8
  cos(dog, cat) = ⟨dog | cat⟩ / (|dog| × |cat|)
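These numbers are easy to verify directly (plain Python, no linear-algebra library assumed):

```python
# dog and cat as coordinate vectors over the basis (furry, pet, stroke).
import math

dog = [2, 5, 7]   # 2 furry + 5 pet + 7 stroke
cat = [3, 3, 8]   # 3 furry + 3 pet + 8 stroke

inner = sum(a * b for a, b in zip(dog, cat))
norm = lambda v: math.sqrt(sum(x * x for x in v))
cos = inner / (norm(dog) * norm(cat))

assert inner == 2 * 3 + 5 * 3 + 7 * 8   # = 77
print(round(cos, 3))                    # high similarity, as expected for pets
```

The inner product is 77 and the cosine is close to 1, reflecting that dog and cat keep similar company.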
FVect, finite-dimensional vector spaces, form a Compact Closed Category.

1. Vector spaces are objects: V, W.
2. Linear maps are morphisms: f: V → W.
3. Tensor is the tensor product V ⊗ W; the unit is R.
4. Adjoints are the identity: V^l = V* = V^r.
5. Epsilon maps are inner products, ε^l = ε^r : V* ⊗ V → R:
  Σ_ij C_ij (v_i ⊗ v_j) ↦ Σ_ij C_ij ⟨v_i | v_j⟩ = Σ_i C_ii
6. Eta maps create entangled states, η^l = η^r : R → V ⊗ V*:
  1 ↦ Σ_i (v_i ⊗ v_i)
Great! We have our meanings of words now, but no glue! How can we form a vector for the meaning of a sentence?

Summation:
  John likes Mary = John + likes + Mary
Multiplication (pointwise):
  John likes Mary = John × likes × Mary

Problem: ? Can you suggest a solution?
Combining Vector Spaces and Pregroups: Categorical Semantics

Work in the product category FVect × Pregroup:
  Objects: (w ∈ W, p)    Morphisms: (f: V → W, α)

Syntax/semantics dictionary:
- To each word w_i, assign an object (w_i ∈ W_i, p_i).
- To each sentence w_1 ... w_n, with pregroup reduction α: p_1 ... p_n → s, assign an object too: (w_1 ... w_n ∈ S, s).
Compositional Meaning for Sentences

Def (Compositional Distributional Meaning). Given a sentence w_1 ... w_n, define its meaning as
  w_1 ... w_n := f(w_1 ⊗ ... ⊗ w_n)
The linear map f: W_1 ⊗ ... ⊗ W_n → S is the pregroup reduction α: p_1 ... p_n → s of the sentence, expressed in FVect via the substitution [α](p_i ↦ W_i).
We have both our glue and something to glue with!
  glue = the pregroup grammatical structure
  things to glue = the meaning vectors of words: John, likes, beer
The reduction diagram can be written as a map f standing for the grammatical structure, yet expressible in the language of vector spaces:
  John likes beer := f(John ⊗ likes ⊗ beer)
Positive Transitive Sentence: "John likes beer"

Syntax/semantics dictionary:
  (John ∈ V, π)    (likes ∈ V ⊗ S ⊗ W, π^r s o^l)    (beer ∈ W, o)

Grammar:
  α = ε^r_π ⊗ 1_s ⊗ ε^l_o  ⟹  f = ε^r_V ⊗ 1_S ⊗ ε^l_W

Meaning:
  John likes beer = (ε^r_V ⊗ 1_S ⊗ ε^l_W)(John ⊗ likes ⊗ beer)

[Diagram omitted.]
Towards a Concrete Setting

likes is a non-separable vector in V ⊗ S ⊗ W:
  likes = Σ_ikj C_ikj (v_i ⊗ s_k ⊗ w_j)
for v_i, w_j, s_k bases of V, W, S. The meaning of the sentence becomes
  (ε^r_V ⊗ 1_S ⊗ ε^l_W)(John ⊗ Σ_ikj C_ikj (v_i ⊗ s_k ⊗ w_j) ⊗ beer)
  = Σ_k Σ_ij C_ikj ⟨John | v_i⟩ ⟨w_j | beer⟩ s_k
Use the resulting sentence vectors Σ_k C_ikj s_k to define concrete meaning.
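The contraction formula can be run concretely. A minimal sketch with tiny dimensions; the verb weights C_ikj and the coordinates of John and beer are made-up illustrative numbers, not learned data:

```python
# sum_ikj C_ikj <John|v_i> <w_j|beer> s_k, as explicit loops over coordinates.
dimV, dimS, dimW = 2, 2, 2
C = {(i, k, j): (i + 1) * (k + 2) * (j + 3)          # toy verb tensor
     for i in range(dimV) for k in range(dimS) for j in range(dimW)}

john = [1.0, 0.0]    # coordinates of John in V (picks out basis vector v_0)
beer = [0.0, 1.0]    # coordinates of beer in W (picks out basis vector w_1)

# Contract V against John and W against beer; a vector in S remains.
sentence = [sum(C[(i, k, j)] * john[i] * beer[j]
                for i in range(dimV) for j in range(dimW))
            for k in range(dimS)]
print(sentence)   # [8.0, 12.0]
```

Because John and beer are basis vectors here, the inner products act as Kronecker deltas and the sentence vector is just the slice C_{0,k,1} of the verb tensor, exactly as in the Boolean example that follows.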
Example: Boolean Meaning

V has basis {m_i}_i encoding all men; assume John is m_3.
W has basis {d_j}_j encoding all drinks; assume beer is d_4.
S is the two-dimensional space with basis {true, false}.

Define:
  Σ_k C_ikj s_k =: s_ij = true if m_i likes d_j, false otherwise.

Compute:
  Σ_ij ⟨m_3 | m_i⟩ ⊗ s_ij ⊗ ⟨d_j | d_4⟩ = Σ_ij δ_3i s_ij δ_j4 = s_34
which is true if "John likes beer" and false otherwise.
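In code, the deltas collapse the double sum to a single table lookup. A sketch (the likes table is made-up example data; the encoding of true and false as basis vectors of S is an assumption):

```python
# Boolean sentence space: s_ij = true iff man i likes drink j.
TRUE, FALSE = (1, 0), (0, 1)   # basis vectors of the two-dimensional space S

likes = {(3, 4), (1, 2)}       # pairs (i, j) with m_i likes d_j (assumed data)

def s(i, j):
    return TRUE if (i, j) in likes else FALSE

# <m3|m_i> and <d_j|d4> are Kronecker deltas, so the big sum collapses to s_34:
meaning = s(3, 4)
assert meaning == TRUE   # "John likes beer" comes out true in this model
```

The whole categorical computation reduces to looking up one entry of the truth table, which is reassuring: the formalism reproduces the model-theoretic answer.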
Example: Negative Transitive Sentence: "John does not like beer"

[Diagrams omitted.] In the diagrams, does is depicted as a cap whose middle wires carry the identity map, and not as a cap whose middle wires carry the not map. Yanking the wires straightens the diagram, and the meaning becomes equivalent to
  (ε_V ⊗ [0 1; 1 0] ⊗ ε_W)(v ⊗ Ψ ⊗ w)
In the Boolean example above, the middle matrix is the not (swap) matrix, so
  [0 1; 1 0] s_34 = false if s_34 = true, and true if s_34 = false.
Example: Negative Transitive Sentence: "John does not like beer"

Syntax/semantics dictionary: John and beer as before, and
  (does ∈ V ⊗ S ⊗ J ⊗ V, π^r s j^l σ)
  (not ∈ V ⊗ J ⊗ J ⊗ V, σ^r j j^l σ)
  (like ∈ V ⊗ J ⊗ W, σ^r j o^l)

Logical words:
  not = Σ_k m_k ⊗ (|10⟩ + |01⟩) ⊗ m_k
  does = Σ_l m_l ⊗ (|11⟩ + |00⟩) ⊗ m_l

Grammar:
  α = (1_s ⊗ ε^l_j ⊗ ε^l_j) ∘ (ε^r_π ⊗ 1_{s j^l} ⊗ ε^r_σ ⊗ 1_{j j^l} ⊗ ε^r_σ ⊗ 1_j ⊗ ε^l_o)

Meaning:
  f = (1_S ⊗ ε^l_J ⊗ ε^l_J) ∘ (ε^r_V ⊗ 1_{SJ} ⊗ ε^r_V ⊗ 1_{JJ} ⊗ ε^r_V ⊗ 1_J ⊗ ε^l_W)
John does not like beer = f(John ⊗ does ⊗ not ⊗ like ⊗ beer)

Writing does and not for the middle components (|11⟩ + |00⟩) and (|10⟩ + |01⟩), the computation proceeds by contracting the ε-maps one at a time:

f(m_3 ⊗ (Σ_l m_l ⊗ does ⊗ m_l) ⊗ (Σ_k m_k ⊗ not ⊗ m_k) ⊗ (Σ_ij m_i ⊗ s_ij ⊗ d_j) ⊗ d_4)
= (Σ_l ⟨m_3 | m_l⟩ does ⊗ m_l) ⊗ (Σ_k m_k ⊗ not ⊗ m_k) ⊗ (Σ_ij m_i ⊗ s_ij ⊗ ⟨d_j | d_4⟩)
= (Σ_l δ_3l does ⊗ m_l) ⊗ (Σ_k m_k ⊗ not ⊗ m_k) ⊗ (Σ_i m_i ⊗ s_i4)
= does ⊗ m_3 ⊗ (Σ_k m_k ⊗ not ⊗ m_k) ⊗ (Σ_i m_i ⊗ s_i4)
= does ⊗ (Σ_k ⟨m_3 | m_k⟩ not ⊗ m_k) ⊗ (Σ_i m_i ⊗ s_i4)
= does ⊗ (Σ_k δ_3k not ⊗ m_k) ⊗ (Σ_i m_i ⊗ s_i4)
= does ⊗ not ⊗ m_3 ⊗ (Σ_i m_i ⊗ s_i4)
= does ⊗ not ⊗ (Σ_i ⟨m_3 | m_i⟩ s_i4)
= does ⊗ not ⊗ (Σ_i δ_3i s_i4)
= does ⊗ not ⊗ s_34
Finally, contract the remaining does and not components:
  does ⊗ not ⊗ s_34 = (|00⟩ + |11⟩) ⊗ (|10⟩ + |01⟩) ⊗ s_34
Contracting the inner wires (⟨0|1⟩ = ⟨1|0⟩ = 0, ⟨0|0⟩ = ⟨1|1⟩ = 1) leaves
  (|0⟩⟨1| + |1⟩⟨0|) s_34
which is the not matrix applied to s_34:
  = false if s_34 = true
  = true if s_34 = false
Since "John likes beer" was true, "John does not like beer" comes out false.
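The surviving operator |0⟩⟨1| + |1⟩⟨0| is just the 2×2 swap matrix, and its effect is a two-line check (the encoding true = (1, 0), false = (0, 1) is the assumption used above):

```python
# The effect of "does not" in the Boolean model: the swap (not) matrix
# |0><1| + |1><0| applied to the truth value s_34.
TRUE, FALSE = (1, 0), (0, 1)
NOT = [[0, 1], [1, 0]]

def apply(M, v):
    """Multiply a 2x2 matrix by a 2-vector."""
    return tuple(sum(M[r][c] * v[c] for c in range(2)) for r in range(2))

assert apply(NOT, TRUE) == FALSE   # John likes beer, so the negation is false
assert apply(NOT, FALSE) == TRUE
```

The grammar-driven contraction thus composes the lexical not with the truth value of the positive sentence, giving exactly classical negation.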
Needs More Work

We have a good case for the meanings of:
- verbs
- nouns
- adjectives
- adverbs

We do not yet fully understand the meanings of:
- quantifiers
- relative clauses
- conjunctions

We have a parser for pregroups; it remains to add the vector space part to it.