Phonological Patterns and Phonological Learners, Jeffrey Heinz


Jeffrey Heinz (heinz@udel.edu), University of Delaware. Cornell University Grammar Induction Workshop, May 15, 2010.


SLIDE 1

Phonology Formal Language Theory Formal Learning Theories Phonological Learners

Phonological Patterns and Phonological Learners

Jeffrey Heinz heinz@udel.edu

University of Delaware

Cornell University Grammar Induction Workshop May 15, 2010


SLIDE 2

Collaborators

James Rogers (Earlham College)
Bill Idsardi (University of Maryland)
Cesar Koirala, Regine Lai, Darrell Larsen, Dan Blanchard, Tim O’Neill, Jane Chandlee, Robert Wilder, Evan Bradley (University of Delaware)

SLIDE 3

How can something learn?

  • 1. How do people generalize beyond their experience?
  • 2. How can anything that computes generalize beyond its experience?

  • Linguistics / Language Acquisition
  • Computer Science
  • Psychology
  • Artificial Intelligence
  • Philosophy
  • Natural Language Processing
  • . . .

SLIDE 6

Phonological Patterns and Phonological Learners

  • 1. Different phonological patterns are learned by different learning mechanisms
  • 2. Illustrate with a learner for long-distance patterns (Strictly k-Piecewise languages and distributions)

Hypothesis: Phonological learning is modular. There is more than one highly specialized learning mechanism for learning phonology.

The debate isn’t likely to be settled soon. Not all the empirical evidence is in yet, nor have all models been fully compared.

SLIDE 7

Phonotactics - Knowledge of word well-formedness

ptak thole hlad plast sram mgla vlas flitch dnom rtut

Halle, M. 1978. In Linguistic Theory and Psychological Reality. MIT Press.

SLIDE 8

Phonotactics - Knowledge of word well-formedness

possible English words    impossible English words
thole                     ptak
plast                     hlad
flitch                    sram
                          mgla
                          vlas
                          dnom
                          rtut

  • 1. Question: How do English speakers know which column each of these words belongs in?

SLIDE 10

Phonotactics - Knowledge of word well-formedness (Chumash Version)

StoyonowonowaS stoyonowonowaS stoyonowonowas Stoyonowonowas pisotonosikiwat pisotonoSikiwat

SLIDE 11

Phonotactics - Knowledge of word well-formedness (Chumash Version)

possible Chumash words    impossible Chumash words
StoyonowonowaS            stoyonowonowaS
stoyonowonowas            Stoyonowonowas
pisotonosikiwat           pisotonoSikiwat

  • 1. Question: How do Chumash speakers know which column each of these words belongs in?
  • 2. By the way, StoyonowonowaS means ‘it stood upright’ (Applegate 1972)

SLIDE 13

Limits on the variation of segmental phonotactics

  • 1. Local sound patterns; e.g. consonant clusters
      • tendencies: sonority sequencing, n-long clusters can be resolved into two (n − 1)-long clusters, . . .
        (Greenberg 1978, Clements and Keyser 1983, . . . Albright today)
  • 2. Long-distance sound patterns; e.g. consonantal and vowel harmony
      • Similar segments are involved in long-distance patterns
      • Consonantal harmony patterns do not exhibit blocking: e.g. no pattern of the form “*s. . . S unless [z] intervenes” is attested.
        (Hansson 2001, Rose and Walker 2004)
      • No harmony pattern applies only to the first and last sounds.
  • 3. Logically possible but unattested segmental phonotactic patterns:
      • The nth sound after x must be y.
      • Words must contain an even number of sounds of type x.
      • . . .
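The unattested patterns in item 3 are not hard to compute, which is what makes their absence interesting. As a rough illustration (a sketch only: the function name and the choice of "s" as the sound type x are hypothetical, not from the talk), the even-number pattern needs just one bit of memory, so it is a regular pattern:

```python
def has_even_count(word: str, x: str = "s") -> bool:
    """Accept a word iff it contains an even number of occurrences of x.

    This behaves like a two-state finite automaton: the only memory is a
    parity bit. The default x = "s" is an arbitrary, hypothetical choice.
    """
    even = True  # state: an even number of x's has been seen so far
    for sound in word:
        if sound == x:
            even = not even  # flip the parity on each occurrence of x
    return even


for w in ["sotos", "toto", "sot"]:
    print(w, has_even_count(w))
```

Under these assumptions "sotos" and "toto" are accepted and "sot" is rejected, even though no known language draws the line this way.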

SLIDE 14

Formal Language Theory

[Figure: nested language classes, from outermost to innermost: Recursively Enumerable, Context-Sensitive, Mildly Context-Sensitive, Context-Free, Regular, Finite]

Figure: The Chomsky hierarchy classifies logically possible patterns.

(Chomsky 1956, 1959, Harrison 1978)

SLIDE 15

Formal Language Theory

[Figure: the Chomsky hierarchy (Recursively Enumerable, Context-Sensitive, Mildly Context-Sensitive, Context-Free, Regular, Finite) with these natural language patterns placed in it:
  • Yoruba copying (Kobele 2006)
  • Swiss German (Shieber 1985)
  • English nested embedding (Chomsky 1957)
  • English consonant clusters (Clements and Keyser 1983)
  • Kwakiutl stress (Bach 1975)
  • Chumash sibilant harmony (Applegate 1972)]

Figure: Natural language patterns in the Chomsky hierarchy.

SLIDE 16

Formal Language Theory

[Figure: the same Chomsky hierarchy diagram, with the same natural language patterns (Yoruba copying, Swiss German, English nested embedding, English consonant clusters, Kwakiutl stress, Chumash sibilant harmony)]

Figure: Possible theories of natural language.

SLIDE 18

Formal Learning Theories

[Diagram: Experience → Learner → Languages]

Figure: Learners are functions φ from experience to languages.

(Gold 1967, Horning 1969, Angluin 1980, Osherson et al. 1984, Angluin 1988, Anthony and Biggs 1991, Kearns and Vazirani 1994, Vapnik 1994, 1998, Jain et al. 1999, Niyogi 2006, de la Higuera 2010)

SLIDE 19

The Experience

  • 1. It is a sequence.
  • 2. It is finite.

w0
w1
w2
. . .
wn

↓ time

SLIDE 20

Types of Experience

  • 1. Positive evidence
  • 2. Positive and negative evidence
  • 3. Noisy evidence
  • 4. Queried Evidence

w0 ∈ L
w1 ∈ L
w2 ∈ L
. . .
wn ∈ L

↓ time

SLIDE 22

Types of Experience

  • 1. Positive evidence
  • 2. Positive and negative evidence
  • 3. Noisy evidence
  • 4. Queried Evidence

w0 ∈ L
w1 ∈ L
w2 ∈ L (but in fact w2 ∉ L)
. . .
wn ∈ L

↓ time

SLIDE 23

Types of Experience

  • 1. Positive evidence
  • 2. Positive and negative evidence
  • 3. Noisy evidence
  • 4. Queried Evidence

w0 ∈ L
w1 ∈ L
w2 ∈ L (because the learner specifically asked about w2)
. . .
wn ∈ L

↓ time

SLIDE 26

The Languages

  • 1. They can be sets of words or distributions over words.
  • 2. They are computable. I.e. they are describable with grammars. I.e. they are recursively enumerable (r.e.) languages.

[Diagram: Experience → Learner → Languages]

Figure: Learners are functions φ from experience to languages.

SLIDE 27

The Languages

  • 1. They can be sets of words or distributions over words.
  • 2. They are computable. I.e. they are describable with grammars. I.e. they are recursively enumerable (r.e.) languages.

[Diagram: Experience → Learner → Grammars]

Figure: Learners are functions φ from experience to grammars.

SLIDE 28

Learning Criteria

  • 1. What does it mean to learn a language?
  • 2. What kind of experience is required for success?
  • 3. What counts as success?

SLIDE 35

What does it mean to learn a language?

  • 1. Convergence.
  • 2. Imagine an infinite sequence. Is there some point n after which the learner’s hypothesis doesn’t change (much)?

datum    Learner’s Hypothesis
w0       φ(w0) = G0
w1       φ(w0, w1) = G1
w2       φ(w0, w1, w2) = G2
. . .
wn       φ(w0, w1, w2, . . . , wn) = Gn
. . .
wm       φ(w0, w1, w2, . . . , wm) = Gm

↓ time

Does Gm ≃ Gn?
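The table of hypotheses G0, G1, . . . can be made concrete with a toy learner. The sketch below is only an illustration under stated assumptions: a "grammar" is simply the set of words seen so far, and the names phi, hypotheses, and last_change are hypothetical. It records the hypothesis after each datum and reports the last point at which the hypothesis changed on a finite prefix of a text.

```python
from typing import FrozenSet, List, Sequence

# Assumption for illustration: a grammar is just a finite set of words.
Grammar = FrozenSet[str]


def phi(experience: Sequence[str]) -> Grammar:
    """Toy learner: hypothesize exactly the words observed so far."""
    return frozenset(experience)


def hypotheses(text: Sequence[str]) -> List[Grammar]:
    """Return G_0, ..., G_n where G_i = phi(w_0, ..., w_i)."""
    return [phi(text[: i + 1]) for i in range(len(text))]


def last_change(text: Sequence[str]) -> int:
    """Index of the last datum at which the hypothesis changed (0 if never)."""
    gs = hypotheses(text)
    changes = [i for i in range(1, len(gs)) if gs[i] != gs[i - 1]]
    return changes[-1] if changes else 0


text = ["ba", "ab", "ba", "aa", "ab", "ba", "aa"]
n = last_change(text)
print(n)  # 3: from w3 onward the hypothesis stays {"aa", "ab", "ba"}
```

A finite prefix can only suggest convergence; the criterion on the slide concerns the whole infinite sequence, so a check like this is evidence, not proof.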

SLIDE 36

What kind of experience is required for success?

Types of Experience

  • 1. Positive-only or positive and negative evidence.
  • 2. Noiseless or noisy evidence.
  • 3. Queries allowed or not?

Which infinite sequences require convergence?

  • 1. only complete ones? I.e. where every piece of information occurs at some finite point
  • 2. only computable ones? I.e. the infinite sequence itself is describable by some grammar

SLIDE 37

What kind of experience is required for success?

Makes learning easier             Makes learning harder
positive and negative evidence    positive evidence only
noiseless evidence                noisy evidence
queries permitted                 queries not permitted
approximate convergence           exact convergence
complete infinite sequences       any infinite sequence
computable infinite sequences     any infinite sequence

SLIDE 38

What kind of experience is required for success?

  • 1. Identification in the limit from positive data (Gold 1967)
  • 2. Identification in the limit from positive and negative data (Gold 1967)
  • 3. Identification in the limit from positive data from r.e. texts (Gold 1967)
  • 4. Learning context-free and r.e. distributions (Horning 1969, Angluin 1988)
  • 5. Probably Approximately Correct learning (Valiant 1984, Anthony and Biggs 1991, Kearns and Vazirani 1994)

SLIDE 43

What counts as success?

We are interested in learners of classes of languages and not just a single language. Why?

Because every language can be learned by a constant function!

[Diagram: Experience → Learner → G, a single grammar in the space of Grammars]

Figure: Learners are functions φ from experience to grammars.
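The point about constant functions can be seen in a few lines. This is a minimal sketch, assuming (for illustration only) that a grammar is a finite set of words; constant_learner is a hypothetical name. The learner ignores its experience entirely, so it trivially "learns" its one target and fails on everything else.

```python
from typing import Callable, FrozenSet, Sequence

# Assumption for illustration: a grammar is just a finite set of words.
Grammar = FrozenSet[str]


def constant_learner(g: Grammar) -> Callable[[Sequence[str]], Grammar]:
    """Build a learner phi that returns g no matter what it observes."""
    def phi(experience: Sequence[str]) -> Grammar:
        return g  # the experience is never consulted
    return phi


G = frozenset({"ab", "ba"})
phi = constant_learner(G)

text_from_G = ["ab", "ba", "ab"]
text_from_other = ["aa", "bb", "aa"]  # drawn from a different language

print(phi(text_from_G) == G)                          # True
print(phi(text_from_other) == frozenset({"aa", "bb"}))  # False
```

Success on a single language therefore says nothing about generalization, which is why the criterion is stated over classes of languages.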

SLIDE 44

Formal Learning Theory

Learning requires a structured hypothesis space, which excludes at least some finite-list hypotheses.

Gleitman 1990, p. 12: ‘The trouble is that an observer who notices everything can learn nothing for there is no end of categories known and constructable to describe a situation [emphasis in original].’

SLIDE 46

Results of Formal Learning Theories: Do feasible learners exist?

Makes learning easier             Makes learning harder
positive and negative evidence    positive evidence only
noiseless evidence                noisy evidence
queries permitted                 queries not permitted
approximate convergence           exact convergence
complete infinite sequences       any infinite sequence
computable infinite sequences     any infinite sequence

[Figure: the Chomsky hierarchy (Recursively Enumerable, Context-Sensitive, Mildly Context-Sensitive, Context-Free, Regular, Finite)]

SLIDE 47

Results of Formal Learning Theories: Do feasible learners exist?

  • 1. Identification in the limit from positive data (Gold 1967)

[Figure: the Chomsky hierarchy (Recursively Enumerable, Context-Sensitive, Mildly Context-Sensitive, Context-Free, Regular, Finite)]

SLIDE 48

Results of Formal Learning Theories: Do feasible learners exist?

  • 2. Identification in the limit from positive and negative data (Gold 1967)

[Figure: the Chomsky hierarchy; labels shown: Context-Sensitive, Mildly Context-Sensitive, Context-Free, Regular, Finite]

SLIDE 49

Results of Formal Learning Theories: Do feasible learners exist?

  • 3. Identification in the limit from positive data from r.e. texts (Gold 1967)
  • 4. Learning context-free and r.e. distributions (Horning 1969, Angluin 1988)

(See Clark and Thollard 2004 and other refs in Clark’s earlier talk today.)

[Figure: the Chomsky hierarchy; labels shown: Context-Sensitive, Mildly Context-Sensitive, Context-Free, Regular, Finite]

SLIDE 50

Results of Formal Learning Theories: Do feasible learners exist?

  • 5. Probably Approximately Correct learning (Valiant 1984, Anthony and Biggs 1991, Kearns and Vazirani 1994)

[Figure: the Chomsky hierarchy; labels shown: Context-Sensitive, Mildly Context-Sensitive, Context-Free, Regular, Finite]

SLIDE 51

Formal Learning Theory: Positive Results

Many classes which cross-cut the Chomsky hierarchy and exclude some finite languages are feasibly learnable in the senses discussed (and others).

[Figure: the Chomsky hierarchy (Recursively Enumerable, Context-Sensitive, Mildly Context-Sensitive, Context-Free, Regular, Finite)]

(Angluin 1980, 1982, Garcia et al. 1990, Muggleton 1990, Denis et al. 2002, Fernau 2003, Yokomori 2003, Clark and Thollard 2004, Oates et al. 2006, Niyogi 2006, Clark and Eyraud 2007, Heinz 2008, to appear, Yoshinaka 2008, Case et al. 2009, de la Higuera 2010)

SLIDE 52

Summary

  • 1. Natural language patterns are not arbitrary: there are limits to the variation.
  • 2. Structured, restricted hypothesis spaces, which crucially exclude some finite languages, can be feasibly learned.
  • 3. The positive learning results are proven results, and the proofs are often constructive.

SLIDE 53

What is the space of possible phonological patterns?

Wilson (earlier today): What is the space of possible constraints?

  • 1. I am not claiming the following learners are the full story.
  • 2. I am claiming that they are good approximations to the full story and that the full story will incorporate their key elements.
  • 3. Research on the role of phonological features, prosody, similarity, sonority, and phonetic factors more generally is ongoing and fully compatible with the present proposals. (Wilson 2006, Hayes and Wilson 2008, Moreton 2008, Albright 2009, and their talks at this event)

SLIDE 54

Local sound patterns

Distinctions are made on the basis of contiguous subsequences.

possible English words    impossible English words
thole                     ptak
plast                     hlad
flitch                    sram
                          mgla
                          vlas
                          dnom
                          rtut

SLIDE 62

Local sound patterns and formal language theory

  • 1. The formal languages which make distinctions on the basis of k-long contiguous subsequences are called Strictly k-Local (McNaughton and Papert 1971, Rogers and Pullum 2007)
  • 2. They are subregular and exclude some finite languages.
  • 3. If every k-long contiguous subsequence is licensed by the grammar, the word belongs to the language.

stip    ptip ×
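Item 3 can be sketched directly in code. The example below is an illustration under assumptions not stated on the slide: words are padded with a boundary symbol "#", the grammar is just the set of k-long contiguous subsequences seen in a sample, and the names k_factors, learn_sl, and accepts_sl, along with the four training words, are hypothetical.

```python
from typing import Iterable, Set

BOUNDARY = "#"  # assumed word-edge marker, not taken from the slides


def k_factors(word: str, k: int) -> Set[str]:
    """All k-long contiguous subsequences of the boundary-padded word."""
    padded = BOUNDARY * (k - 1) + word + BOUNDARY * (k - 1)
    return {padded[i:i + k] for i in range(len(padded) - k + 1)}


def learn_sl(sample: Iterable[str], k: int) -> Set[str]:
    """Grammar = every k-factor observed in the positive sample."""
    grammar: Set[str] = set()
    for word in sample:
        grammar |= k_factors(word, k)
    return grammar


def accepts_sl(grammar: Set[str], word: str, k: int) -> bool:
    """A word is in the language iff all of its k-factors are licensed."""
    return k_factors(word, k) <= grammar


sample = ["stop", "tip", "spit", "pits"]  # hypothetical training words
grammar = learn_sl(sample, k=2)

print(accepts_sl(grammar, "stip", 2))  # True: #s, st, ti, ip, p# were all seen
print(accepts_sl(grammar, "ptip", 2))  # False: the factor "pt" was never seen
```

Note that "stip" is accepted although it is not in the sample, which is the kind of generalization beyond experience the talk is concerned with.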

SLIDE 63

Long-distance sound patterns

Distinctions are made on the basis of potentially discontiguous subsequences.

possible Chumash words    impossible Chumash words
shtoyonowonowash          stoyonowonowaS
stoyonowonowas            Stoyonowonowas
pisotonosikiwat           pisotonoSikiwat

SLIDE 65

Long-distance sound patterns and formal language theory

  • 1. The formal languages and distributions which make distinctions on the basis of k-long (potentially discontiguous) subsequences are called Strictly k-Piecewise (Heinz 2007, Rogers et al. 2009, Heinz to appear, Heinz and Rogers to appear).
  • 2. They are subregular and exclude some finite languages.
  • 3. Consonantal harmony patterns with blocking are not Strictly Piecewise for any k.
  • 4. Harmony patterns which apply only to the first and last sounds are not Strictly Piecewise for any k.
  • 5. Strictly k-Piecewise models underlie models of reading comprehension (Schoonbaert and Grainger 2004, Grainger and Whitney 2004)
  • 6. If every k-long subsequence is licensed by the grammar, the word belongs to the language.

sotos    sotoS
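Item 6 differs from the Strictly k-Local case only in which subsequences are inspected. The sketch below is an illustration under assumptions: the grammar is the set of subsequences of length at most k seen in a sample (shorter ones are included so that words shorter than k are handled), "S" is the sibilant written S on the slides, and the function names and training words are hypothetical.

```python
from itertools import combinations
from typing import Iterable, Set


def subsequences_up_to(word: str, k: int) -> Set[str]:
    """All (possibly discontiguous) subsequences of length 1..k."""
    return {
        "".join(chars)
        for n in range(1, k + 1)
        for chars in combinations(word, n)  # preserves left-to-right order
    }


def learn_sp(sample: Iterable[str], k: int) -> Set[str]:
    """Grammar = every subsequence of length <= k seen in the sample."""
    grammar: Set[str] = set()
    for word in sample:
        grammar |= subsequences_up_to(word, k)
    return grammar


def accepts_sp(grammar: Set[str], word: str, k: int) -> bool:
    """A word is in the language iff all of its subsequences are licensed."""
    return subsequences_up_to(word, k) <= grammar


# Hypothetical sample in which sibilants always agree within a word.
sample = ["sos", "SoS", "toto", "soto", "otos", "Soto", "otoS"]
grammar = learn_sp(sample, k=2)

print(accepts_sp(grammar, "sotos", 2))  # True
print(accepts_sp(grammar, "SotoS", 2))  # True
print(accepts_sp(grammar, "sotoS", 2))  # False: "sS" was never observed
```

Because the subsequence "sS" is checked regardless of how much material intervenes, the disagreement is caught at any distance, which a Strictly k-Local grammar with fixed k cannot do.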

slide-69
SLIDE 69

Phonology Formal Language Theory Formal Learning Theories Phonological Learners

Long-distance sound patterns and formal language theory

  • 1. The formal languages and distributions which make distinctions on

the basis of k-long (potentially discontiguous) subsequences are called Strictly k-Piecewise (Heinz 2007, Rogers et al. 2009, Heinz to appear, Heinz and Rogers to appear).

  • 2. They are subregular and exclude some finite languages.
  • 3. Consonantal harmony patterns with blocking are not Strictly

Piecewise for any k.

  • 4. Harmony patterns which apply only to the first and last sounds are

not Strictly Piecewise for any k.

  • 5. Strictly k-Piecewise models underlie models of reading comprehension

(Schoonbaert and Grainger2004, Grainger and Whitney2004)

  • 6. If every k-long subsequence is licensed by the grammar, the word

belongs to the language.

sotos sotoS

29 / 62

slide-70
SLIDE 70

Phonology Formal Language Theory Formal Learning Theories Phonological Learners

Long-distance sound patterns and formal language theory

  • 1. The formal languages and distributions which make distinctions on

the basis of k-long (potentially discontiguous) subsequences are called Strictly k-Piecewise (Heinz 2007, Rogers et al. 2009, Heinz to appear, Heinz and Rogers to appear).

  • 2. They are subregular and exclude some finite languages.
  • 3. Consonantal harmony patterns with blocking are not Strictly

Piecewise for any k.

  • 4. Harmony patterns which apply only to the first and last sounds are

not Strictly Piecewise for any k.

  • 5. Strictly k-Piecewise models underlie models of reading comprehension

(Schoonbaert and Grainger2004, Grainger and Whitney2004)

  • 6. If every k-long subsequence is licensed by the grammar, the word

belongs to the language.

sotos sotoS

29 / 62

slide-71
SLIDE 71

Phonology Formal Language Theory Formal Learning Theories Phonological Learners

Long-distance sound patterns and formal language theory

  • 1. The formal languages and distributions which make distinctions on

the basis of k-long (potentially discontiguous) subsequences are called Strictly k-Piecewise (Heinz 2007, Rogers et al. 2009, Heinz to appear, Heinz and Rogers to appear).

  • 2. They are subregular and exclude some finite languages.
  • 3. Consonantal harmony patterns with blocking are not Strictly

Piecewise for any k.

  • 4. Harmony patterns which apply only to the first and last sounds are

not Strictly Piecewise for any k.

  • 5. Strictly k-Piecewise models underlie models of reading comprehension

(Schoonbaert and Grainger2004, Grainger and Whitney2004)

  • 6. If every k-long subsequence is licensed by the grammar, the word

belongs to the language.

sotos sotoS

29 / 62

slide-72
SLIDE 72

Phonology Formal Language Theory Formal Learning Theories Phonological Learners

Long-distance sound patterns and formal language theory

  • 1. The formal languages and distributions which make distinctions on

the basis of k-long (potentially discontiguous) subsequences are called Strictly k-Piecewise (Heinz 2007, Rogers et al. 2009, Heinz to appear, Heinz and Rogers to appear).

  • 2. They are subregular and exclude some finite languages.
  • 3. Consonantal harmony patterns with blocking are not Strictly

Piecewise for any k.

  • 4. Harmony patterns which apply only to the first and last sounds are

not Strictly Piecewise for any k.

  • 5. Strictly k-Piecewise models underlie models of reading comprehension

(Schoonbaert and Grainger2004, Grainger and Whitney2004)

  • 6. If every k-long subsequence is licensed by the grammar, the word

belongs to the language.

sotos sotoS

29 / 62

slide-73
SLIDE 73

Phonology Formal Language Theory Formal Learning Theories Phonological Learners

Long-distance sound patterns and formal language theory

  • 1. The formal languages and distributions which make distinctions on

the basis of k-long (potentially discontiguous) subsequences are called Strictly k-Piecewise (Heinz 2007, Rogers et al. 2009, Heinz to appear, Heinz and Rogers to appear).

  • 2. They are subregular and exclude some finite languages.
  • 3. Consonantal harmony patterns with blocking are not Strictly

Piecewise for any k.

  • 4. Harmony patterns which apply only to the first and last sounds are

not Strictly Piecewise for any k.

  • 5. Strictly k-Piecewise models underlie models of reading comprehension

(Schoonbaert and Grainger2004, Grainger and Whitney2004)

  • 6. If every k-long subsequence is licensed by the grammar, the word

belongs to the language.

sotos sotoS

29 / 62

slide-74
SLIDE 74

Phonology Formal Language Theory Formal Learning Theories Phonological Learners

Long-distance sound patterns and formal language theory

  • 1. The formal languages and distributions which make distinctions on the basis of k-long (potentially discontiguous) subsequences are called Strictly k-Piecewise (Heinz 2007, Rogers et al. 2009, Heinz to appear, Heinz and Rogers to appear).
  • 2. They are subregular and exclude some finite languages.
  • 3. Consonantal harmony patterns with blocking are not Strictly Piecewise for any k.
  • 4. Harmony patterns which apply only to the first and last sounds are not Strictly Piecewise for any k.
  • 5. Strictly k-Piecewise models underlie models of reading comprehension (Schoonbaert and Grainger 2004, Grainger and Whitney 2004).
  • 6. If every k-long subsequence is licensed by the grammar, the word belongs to the language.

sotos sotoS ×

29 / 62

slide-77
SLIDE 77

Phonology Formal Language Theory Formal Learning Theories Phonological Learners

[Figure: the Chomsky hierarchy (Finite, Regular, Context-Free, Mildly Context-Sensitive, Context-Sensitive, Recursively Enumerable), with the SP and SL classes marked.]

30 / 62

slide-79
SLIDE 79

Phonology Formal Language Theory Formal Learning Theories Phonological Learners

Background - Subregular Hierarchies

Regular
NonCounting = Star-Free
Locally Testable                                         Piecewise Testable
Locally Testable in the Strict Sense = Strictly Local    Piecewise Testable in the Strict Sense = Strictly Piecewise
(contiguous subsequences)                                (subsequences)

(McNaughton and Papert 1971, Simon 1975, Rogers and Pullum 2007, Rogers et al. 2009, Heinz and Rogers to appear)

31 / 62

slide-80
SLIDE 80

Phonology Formal Language Theory Formal Learning Theories Phonological Learners

Strictly Local and Strictly Piecewise Models

Strictly 2-Local                    Strictly 2-Piecewise
Contiguous subsequences             Subsequences (discontiguous OK)
Successor (+1)                      Less than (<)
.*ab.*                              .*a.*b.*
Immediate Predecessor               Predecessor

[Figure: a two-state automaton (states 0 and 1) for each model.]

0 = have not just seen an [a]       0 = have never seen an [a]
1 = have just seen an [a]           1 = have seen an [a] earlier

32 / 62

slide-81
SLIDE 81

Phonology Formal Language Theory Formal Learning Theories Phonological Learners

Similar but different functions

Strictly k-Local: The function SLk picks out the k-long contiguous subsequences.
Strictly k-Piecewise: The function SPk picks out the k-long (potentially discontiguous) subsequences.

SL2(stip) = {st, ti, ip}            SP2(stip) = {st, si, sp, ti, tp, ip}
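The two functions can be sketched in a few lines of Python. This is only an illustration of the definitions on this slide; the names `sl` and `sp` are chosen here and do not come from the talk.

```python
from itertools import combinations

def sl(word, k):
    """Return the set of k-long contiguous subsequences (substrings) of word."""
    return {word[i:i + k] for i in range(len(word) - k + 1)}

def sp(word, k):
    """Return the set of k-long subsequences of word, contiguous or not."""
    return {"".join(c) for c in combinations(word, k)}

print(sorted(sl("stip", 2)))  # ['ip', 'st', 'ti']
print(sorted(sp("stip", 2)))  # ['ip', 'si', 'sp', 'st', 'ti', 'tp']
```

Every contiguous factor is also a subsequence, so `sl(w, k)` is always a subset of `sp(w, k)`.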

33 / 62

slide-82
SLIDE 82

Phonology Formal Language Theory Formal Learning Theories Phonological Learners

Similar but different

Strictly k-Local: Grammars are subsets of k-long sequences. Languages are all words w such that SLk(w) ⊆ G.
Strictly k-Piecewise: Grammars are subsets of k-long sequences. Languages are all words w such that SPk(w) ⊆ G.

stip ∈ L(G) iff SL2(stip) ⊆ G        stip ∈ L(G) iff SP2(stip) ⊆ G
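The membership condition is a subset test, which the following sketch makes concrete. The grammars `g_sp` and `g_sl` and the four-symbol alphabet are hypothetical, built here only to contrast the two models on the sibilant example sotos / sotoS used earlier in the talk.

```python
from itertools import combinations, product

def sl2(word):
    """Contiguous 2-long subsequences of word."""
    return {word[i:i + 2] for i in range(len(word) - 1)}

def sp2(word):
    """2-long subsequences of word, contiguous or not."""
    return {"".join(c) for c in combinations(word, 2)}

def in_language(word, grammar, factors):
    """A word belongs to L(G) iff all of its factors are licensed by G."""
    return factors(word) <= grammar

# Hypothetical toy alphabet and grammars (not from the talk).
alphabet = "sSot"
all_pairs = {"".join(p) for p in product(alphabet, repeat=2)}
g_sp = all_pairs - {"sS", "Ss"}  # SP2 grammar: bans s...S and S...s at any distance
g_sl = all_pairs - {"sS", "Ss"}  # SL2 grammar: bans only adjacent sS and Ss

print(in_language("sotos", g_sp, sp2))  # True
print(in_language("sotoS", g_sp, sp2))  # False: the subsequence sS is not licensed
print(in_language("sotoS", g_sl, sl2))  # True: s and S are never adjacent
```

The same set of licensed pairs yields different languages depending on whether it is read as an SL2 or an SP2 grammar.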

34 / 62

slide-83
SLIDE 83

Phonology Formal Language Theory Formal Learning Theories Phonological Learners

Learning is also similar but different.

  • 1. Strictly k-Local languages are identifiable in the limit from positive data (Garcia et al. 1990).
  • 2. Keep track of the observed k-long contiguous subsequences.

time   word w   SL2(w)      Grammar G       L(G)
−1                          ∅               ∅
0      aaaa     {aa}        {aa}            aaa∗
1      aab      {aa, ab}    {aa, ab}        aaa∗ ∪ aaa∗b
2      ba       {ba}        {aa, ab, ba}    Σ∗ \ Σ∗bbΣ∗
. . .

The Strictly 2-Local learner learns *bb
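The table can be reproduced with a short sketch of the learner: the grammar is simply the union of the factors seen so far. The function names are illustrative, and the sketch follows the slide in ignoring word boundaries.

```python
def sl2(word):
    """Return the set of contiguous 2-long subsequences (bigrams) of word."""
    return {word[i:i + 2] for i in range(len(word) - 1)}

def learn(text, factors):
    """Accumulate the factors of each observed word; return the grammar after each step."""
    grammar, history = set(), []
    for word in text:
        grammar |= factors(word)
        history.append(set(grammar))
    return history

history = learn(["aaaa", "aab", "ba"], sl2)
for step, grammar in enumerate(history):
    print(step, sorted(grammar))
# 0 ['aa']
# 1 ['aa', 'ab']
# 2 ['aa', 'ab', 'ba']
print("bb" in history[-1])  # False: bb is never observed, so *bb holds
```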

35 / 62

slide-84
SLIDE 84

Phonology Formal Language Theory Formal Learning Theories Phonological Learners

Learning long-distance sound patterns

  • 1. Strictly k-Piecewise languages are identifiable in the limit from positive data (Heinz 2007, to appear).
  • 2. Keep track of the observed k-long subsequences.

i    t(i)   SP2(t(i))                Grammar G                Language of G
−1                                   ∅                        ∅
0    aaaa   {λ, a, aa}               {λ, a, aa}               a∗
1    aab    {λ, a, b, aa, ab}        {λ, a, aa, b, ab}        a∗ ∪ a∗b
2    baa    {λ, a, b, aa, ba}        {λ, a, b, aa, ab, ba}    Σ∗\(Σ∗bΣ∗bΣ∗)
3    aba    {λ, a, b, aa, ab, ba}    {λ, a, b, aa, ab, ba}    Σ∗\(Σ∗bΣ∗bΣ∗)
. . .

The learner φSP2 learns *b. . . b

36 / 62

slide-85
SLIDE 85

Phonology Formal Language Theory Formal Learning Theories Phonological Learners

What about distributional learning?

  • 1. Strictly k-Local distributions can be efficiently estimated (Jurafsky & Martin 2008) (they are n-gram models).
  • 2. Strictly k-Piecewise distributions can be efficiently estimated (Heinz and Rogers to appear).

37 / 62

slide-86
SLIDE 86

Phonology Formal Language Theory Formal Learning Theories Phonological Learners

Regular Languages and Distributions

[Figure: state diagrams of three automata, M1, M2, and M3.]

Figure: Σ = {a, b, c}. Each FSA is deterministic and accepts Σ∗. Each DFA represents a family of distributions. A particular distribution is given by assigning probabilities to the transitions.

38 / 62

slide-87
SLIDE 87

Phonology Formal Language Theory Formal Learning Theories Phonological Learners

Background - ML Estimation of Subregular Distributions (structure is known)

[Figure: M, a one-state DFA with transitions a, b, c; M′, the same DFA with stopping probability 1/5 and transitions a:1/5, b:1/5, c:2/5.]

M represents a family of distributions with 4 parameters. M′ represents a particular distribution in this family.

Theorem (1)

Let M and M′ be DFAs with the same structure and let DM′ generate a sample S. Then the maximum-likelihood estimate (MLE) of S with respect to M guarantees that DM approaches DM′ as the size of S goes to infinity. (Vidal et al. 2005a, 2005b, de la Higuera 2010)

39 / 62

slide-92
SLIDE 92

Phonology Formal Language Theory Formal Learning Theories Phonological Learners

Background - ML Estimation of Subregular Distributions (structure is known)

[Figure: the sample S = {bc} is parsed through M, giving counts b:1, c:1, and 1 for stopping; normalizing yields stopping probability 1/3 and transitions a:0, b:1/3, c:1/3. M′ has stopping probability 1/5 and transitions a:1/5, b:1/5, c:2/5.]

M represents a family of distributions with 4 parameters. M′ represents a particular distribution in this family.

Theorem (2)

For a sample S and deterministic finite-state acceptor M, counting the parse of S through M and normalizing at each state optimizes the maximum-likelihood estimate. (Vidal et al. 2005a, 2005b, de la Higuera 2010)
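The count-and-normalize procedure of Theorem (2) can be sketched as follows. The encoding is an assumption made for illustration: the DFA is a dictionary from (state, symbol) to state, and the hypothetical marker "#" records stopping at a state.

```python
from collections import Counter
from fractions import Fraction

def mle(delta, start, sample):
    """Count the parse of each word through the DFA, then normalize per state.

    delta maps (state, symbol) -> state; '#' marks stopping at a state.
    States never visited by the sample are left out of the result.
    """
    counts = Counter()
    for word in sample:
        q = start
        for a in word:
            counts[q, a] += 1
            q = delta[q, a]
        counts[q, "#"] += 1
    totals = Counter()
    for (q, _), n in counts.items():
        totals[q] += n
    events = set(delta) | {(q, "#") for (q, _) in delta}
    return {(q, e): Fraction(counts[q, e], totals[q])
            for (q, e) in events if totals[q]}

# One-state DFA over {a, b, c}, as in the slide's machine M.
delta = {(0, "a"): 0, (0, "b"): 0, (0, "c"): 0}
probs = mle(delta, 0, ["bc"])
for event in "abc#":
    print(event, probs[0, event])
# a 0
# b 1/3
# c 1/3
# # 1/3
```

With S = {bc} this reproduces the values on the slide: a:0, b:1/3, c:1/3, and stopping probability 1/3.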

39 / 62

slide-94
SLIDE 94

Phonology Formal Language Theory Formal Learning Theories Phonological Learners

Bigram models (Strictly 2-Local Distributions)

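[Figure: automaton with states start, a·, b·, c·; each state has outgoing transitions on a, b, and c. For example, the transition on b out of state c· carries probability Pr(b|c).]

Figure: The structure of a bigram model. The 16 parameters of this model are given by associating probabilities to each transition and to “ending” at each state.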

40 / 62

slide-95
SLIDE 95

Phonology Formal Language Theory Formal Learning Theories Phonological Learners

Regular Languages and Distributions

[Figure: state diagrams of three automata, M1, M2, and M3.]

Figure: Σ = {a, b, c}. Each FSA is deterministic and accepts Σ∗. Each DFA represents a family of distributions. A particular distribution is given by assigning probabilities to the transitions.

What do the states distinguish?

41 / 62

slide-96
SLIDE 96

Phonology Formal Language Theory Formal Learning Theories Phonological Learners

Strictly 2-Piecewise Distributions: The Problem

[Figure: the eight-state automaton whose states record which symbols have been seen so far (none, a, b, c, ab, ac, bc, abc).]

Equation 1 (Piecewise Assumption). For w = a1a2 . . . an,

\[ \Pr(w) = \Pr(a_1 \mid \#) \times \Pr(a_2 \mid a_1 <) \times \cdots \times \Pr(a_n \mid a_1, \ldots, a_{n-1} <) \times \Pr(\# \mid a_1, \ldots, a_n <) \]

  • What is Pr(a | S <)? There are 2^|Σ| distinct sets S, which suggests there are too many(!) independent parameters in the model.
  • Fails to capture intuition regarding Stoyonowonowas: Pr(s | S,t,o,y,w,n,a <) is not independent of Pr(s | S <).

42 / 62

slide-97
SLIDE 97

Phonology Formal Language Theory Formal Learning Theories Phonological Learners

Factors of Strictly 2-Piecewise Distributions

[Figure: three factor automata, one per symbol of Σ = {a, b, c}, with state labels #, a<, b<, c<, ¬a<, ¬b<, ¬c<.]

43 / 62

slide-101
SLIDE 101

Phonology Formal Language Theory Formal Learning Theories Phonological Learners

Factors of Strictly 2-Piecewise Distributions

[Figure: the product (Π) of the three factor automata (state labels #, a<, b<, c<, ¬a<, ¬b<, ¬c<) equals the eight-state automaton whose states record which symbols have been seen so far.]

43 / 62

slide-102
SLIDE 102

Phonology Formal Language Theory Formal Learning Theories Phonological Learners

Strictly 2-Piecewise Distributions: Probabilities

How are the probabilities determined?

44 / 62

slide-105
SLIDE 105

Phonology Formal Language Theory Formal Learning Theories Phonological Learners

Strictly 2-Piecewise Distributions: Probabilities

[Figure: the factor automata with transition probabilities a:p1, b:p2, c:p3 (factor for a<), a:p4, b:p5, c:p6 (factor for b<), and a:p7, b:p8, c:p9 (factor for c<); their product (Π) is the eight-state automaton.]

\[ \Pr(c \mid a, b <) \overset{\text{def}}{=} \frac{p_3 \cdot p_6}{Z} \]

45 / 62

slide-106
SLIDE 106

Phonology Formal Language Theory Formal Learning Theories Phonological Learners

Strictly 2-Piecewise Distributions: Theorem

[Figure: the three factor automata, with state labels #, a<, b<, c<, ¬a<, ¬b<, ¬c<.]

Equation 2 (normalized co-emission product)

\[ \Pr(a \mid S <) \overset{\text{def}}{=} \frac{\prod_{s \in S} \Pr(a \mid s <)}{Z}, \qquad Z = \sum_{a' \in \Sigma \cup \{\#\}} \prod_{s \in S} \Pr(a' \mid s <) \]

Theorem (Heinz and Rogers)

Equations (1) and (2) guarantee a well-formed probability distribution over all logically possible words. The distribution has (|Σ| + 1)^2 parameters (but distinguishes 2^|Σ| states).
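Equation 2 can be sketched directly. The parameter table `cond` and its values are hypothetical, chosen only so that the arithmetic is easy to follow; using the "#" row when nothing has been seen yet is an assumption of this sketch that mirrors the first factor of Equation 1.

```python
from fractions import Fraction
from math import prod

def pr_next(a, seen, cond):
    """Pr(a | seen <) as a normalized co-emission product.

    cond[s][x] stands for Pr(x | s <); events x range over the alphabet plus '#'.
    Assumption of this sketch: an empty history conditions on '#'.
    """
    context = seen or {"#"}
    def score(x):
        return prod(cond[s][x] for s in context)
    z = sum(score(x) for x in cond["#"])
    return score(a) / z

F = Fraction
uniform = {"a": F(1, 4), "b": F(1, 4), "c": F(1, 4), "#": F(1, 4)}
# Hypothetical parameter values, for illustration only.
cond = {
    "#": uniform,
    "a": {"a": F(1, 2), "b": F(1, 4), "c": F(1, 8), "#": F(1, 8)},
    "b": {"a": F(1, 8), "b": F(1, 8), "c": F(1, 2), "#": F(1, 4)},
    "c": uniform,
}

print(pr_next("c", {"a", "b"}, cond))                       # 1/3
print(sum(pr_next(x, {"a", "b"}, cond) for x in "abc#"))    # 1
```

Here the products for a, b, c, and # are 1/16, 1/32, 1/16, and 1/32, so Z = 3/16 and Pr(c | a, b <) = (1/8 · 1/2) / (3/16) = 1/3.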

46 / 62

slide-107
SLIDE 107

Phonology Formal Language Theory Formal Learning Theories Phonological Learners

ML Estimation of Factorable Distributions

[Figure: the three factor automata, with state labels #, a<, b<, c<, ¬a<, ¬b<, ¬c<.]

M = M1 × M2 × . . . Mn

Estimate the factors, not their product!

Theorem (Heinz and Rogers)

The maximum likelihood estimate of a data sample drawn from a Strictly k-Piecewise distribution is obtained by finding the MLE estimates of the sample with respect to the PDFAs which factor the distribution.
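A rough sketch of estimating the factors by counting is given below. It rests on assumptions that go beyond the slide: only the "seen" state of each factor contributes counts, the "#" context is used when nothing has been seen yet, and "#" itself never occurs inside a word. The details in Heinz and Rogers may differ.

```python
from collections import Counter, defaultdict
from fractions import Fraction

def estimate_factors(sample):
    """Estimate Pr(event | s <) for each symbol s by counting and normalizing.

    For every position in every word, the next event (a symbol, or '#' for
    the end of the word) is counted once for each distinct symbol already
    seen; with an empty history it is counted under the context '#'.
    """
    counts = defaultdict(Counter)
    for word in sample:
        seen = set()
        for event in list(word) + ["#"]:
            for s in (seen or {"#"}):
                counts[s][event] += 1
            if event != "#":
                seen.add(event)
    return {s: {e: Fraction(n, sum(c.values())) for e, n in c.items()}
            for s, c in counts.items()}

params = estimate_factors(["aab", "ab"])
print(params["a"]["b"])  # 2/5
print(params["b"]["#"])  # 1
```

Each factor is estimated independently of the others, so the cost grows with the number of factors rather than with the 2^|Σ| states of their product.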

47 / 62

slide-108
SLIDE 108

Phonology Formal Language Theory Formal Learning Theories Phonological Learners

Chumash Corpus

  • 4800 words drawn from Applegate 2007, generously provided in electronic form by Applegate (p.c.).

35 Consonants (places: labial, coronal, a.palatal, velar, uvular, glottal)

stop          p pP ph    t tP th    k kP kh    q qP qh    P
affricates    ts tsP tsh    tS tSP tSh
fricatives    s sP sh    S SP Sh    x xP    h
nasal         m    n nP
lateral       l lP
approx.       w    y

6 Vowels: i ɨ u e o a

(Applegate 1972, 2007)

48 / 62

slide-109
SLIDE 109

Phonology Formal Language Theory Formal Learning Theories Phonological Learners

Chumash: Results of SP2 ML estimation

P(x | y <)                    x
              s         ts        S         tS
y    s        0.0325    0.0051    0.0013    0.0002
     ts       0.0212    0.0114    0.0008    0.
     S        0.0011    0.        0.067     0.0359
     tS       0.0006    0.        0.0458    0.0314

(Collapsing laryngeal distinctions)

It follows that, according to the model, Pr(StoyonowonowaS) ≫ Pr(stoyonowonowaS).

49 / 62

slide-110
SLIDE 110

Phonology Formal Language Theory Formal Learning Theories Phonological Learners

Local Summary

  • 1. Like the regions in the Chomsky hierarchy, the Strictly Local and Strictly Piecewise classes have multiple, independent, converging characterizations from formal language theory, automata theory, and logic.
  • 2. The possible grammars and languages (distributions?) form a lattice structure (Kasprzik and Kötzing, to appear).
  • 3. The two classes are incomparable: neither contains the other.
  • 4. Consequently, Strictly Local learners cannot learn Strictly Piecewise patterns and vice versa.
  • 5. Strictly Piecewise learners cannot learn:
      • blocking patterns, e.g. *s. . . S unless [z] intervenes.
      • harmony patterns which apply only to the first and last sounds.

50 / 62

slide-111
SLIDE 111

Phonology Formal Language Theory Formal Learning Theories Phonological Learners

Competing theories within phonology

  • 1. The main alternative is the tier-based model. (Goldsmith 1976, Clements 1985, Sagey 1986, Mester 1988, Hayes and Wilson 2008, Goldsmith and Xanthos 2009, Goldsmith and Riggle to appear)

tier-based SL (n-gram) models | SP models
Predicts unattested blocking effects in consonantal harmony | Predicts absence of blocking in consonantal harmony
Captures blocking effects in vowel harmony | Unable to capture blocking effects in vowel harmony
Only able to describe patterns with transparent vowels if they are “off” the tier | Able to describe patterns with transparent vowels
Requires independent theory of tiers | Does not require independent theory of tiers
Requires independent theory of similarity | Requires independent theory of similarity

51 / 62

slide-112
SLIDE 112

Phonology Formal Language Theory Formal Learning Theories Phonological Learners

Learning unattested patterns
First and Last sound agreement

Words that start with [s] cannot end with [S].

✓              ×
sabika         sotoS
stotaSikop     sibaS
pabafri        sitiS
. . .          . . .

52 / 62

slide-114
SLIDE 114

Phonology Formal Language Theory Formal Learning Theories Phonological Learners

Learning unattested patterns
First and last sound agreement

Words that start with [s] cannot end with [S]. The function FL makes distinctions on the basis of the first and last sounds in words.

FL(sabika) = {sa}
FL(stotaSikop) = {sp}
FL(pabafri) = {pi}

  • 1. The class of such languages is identifiable in the limit from positive data.
  • 2. The class of languages and grammars form a lattice structure.
  • 3. The class of such distributions is efficiently estimable from positive data.
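The FL function and the corresponding learner can be sketched in the same style as the SL and SP learners. The function names are illustrative; treating the empty word as having no first-last pair is an assumption of this sketch.

```python
def fl(word):
    """Return the set containing the first-last sound pair of word (empty for '')."""
    return {word[0] + word[-1]} if word else set()

def learn_fl(text):
    """The grammar is the union of the first-last pairs observed in the text."""
    grammar = set()
    for word in text:
        grammar |= fl(word)
    return grammar

grammar = learn_fl(["sabika", "stotaSikop", "pabafri"])
print(sorted(grammar))         # ['pi', 'sa', 'sp']
print(fl("sotoS") <= grammar)  # False: the pair sS has not been observed
```

Given positive data that never pairs initial [s] with final [S], this learner keeps rejecting words like sotoS, which is why the pattern is learnable in principle even though it is unattested.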

53 / 62

slide-115
SLIDE 115

Phonology Formal Language Theory Formal Learning Theories Phonological Learners

Do phonologies make First-Last distinctions?

  • 1. To my knowledge, no such phonotactic has ever been proposed, nor is any morpho-phonological alternation conditioned by such a phonotactic.
  • 2. Can people learn such patterns if robustly present in the data?

54 / 62

slide-117
SLIDE 117

Phonology Formal Language Theory Formal Learning Theories Phonological Learners

Domain-specific vs. domain-general?

[Figure: the Chomsky hierarchy (Finite, Regular, Context-Free, Mildly Context-Sensitive, Context-Sensitive, Recursively Enumerable), with the SP and SL classes marked.]

55 / 62

slide-118
SLIDE 118

Phonology Formal Language Theory Formal Learning Theories Phonological Learners

Conclusion

  • 1. Linguistic patterns are not arbitrary.
  • 2. Only structured classes of patterns can be learned.
  • 3. Distinct, feasible learning models for distinct phonological patterns exist.
  • 4. These help explain the character of the typology.
  • 5. A single, feasible learning model for these distinct phonological patterns will likely have to attribute the character of the typology to something else.

  • 6. Artificial language learning experiments can help.

56 / 62

slide-119
SLIDE 119

Phonology Formal Language Theory Formal Learning Theories Phonological Learners

[Figure: the Chomsky hierarchy (Finite, Regular, Context-Free, Mildly Context-Sensitive, Context-Sensitive, Recursively Enumerable) annotated with example patterns: Yoruba copying (Kobele 2006), Swiss German (Shieber 1985), English nested embedding (Chomsky 1957), English consonant clusters (Clements and Keyser 1983), Kwakiutl stress (Bach 1975), Chumash sibilant harmony (Applegate 1972).]

Thank you

57 / 62

slide-120
SLIDE 120

Phonology Formal Language Theory Formal Learning Theories Phonological Learners

Finnish: Corpus

  • 44,040 words from Goldsmith and Riggle (to appear)

19 Consonants (places: lab., lab.dental, cor., pal., velar, uvular, glottal)

stop          p b    t d    c    k g    q
fricatives    f v    s    x    h
nasal         m    n
lateral       l
rhotic        r
approx.       w    j

8 Vowels

−back    +back
i y      u
e oe     o
ae       a

Back vowels and front vowels don’t mix (except for [i,e], which are transparent).

58 / 62

slide-121
SLIDE 121

Phonology Formal Language Theory Formal Learning Theories Phonological Learners

Finnish: Results of SP2 estimation

P(x | b <)                                 x
           u       o       a       y       oe      ae      i       e
b    u     0.056   0.040   0.118   0.006   0.002   0.007   0.084   0.072
     o     0.046   0.033   0.120   0.005   0.002   0.007   0.110   0.067
     a     0.045   0.031   0.130   0.005   0.002   0.007   0.095   0.060
     y     0.015   0.016   0.038   0.044   0.026   0.066   0.091   0.072
     oe    0.023   0.027   0.058   0.030   0.014   0.053   0.095   0.067
     ae    0.014   0.014   0.034   0.036   0.015   0.086   0.091   0.073
     i     0.030   0.031   0.097   0.011   0.006   0.0240  0.088   0.080
     e     0.031   0.026   0.077   0.014   0.005   0.031   0.089   0.071

59 / 62

slide-122
SLIDE 122

Phonology Formal Language Theory Formal Learning Theories Phonological Learners

     F    G
a    +    −
b    +    +
c    −    +

Table: An example of a feature system with Σ = {a, b, c} and two features F and G.

60 / 62

slide-123
SLIDE 123

Phonology Formal Language Theory Formal Learning Theories Phonological Learners

Feature-based generalizations

[Figure: state diagram with labels #, −F, +F.]

Figure: MF represents a SL2 distribution with respect to feature F.

[Figure: state diagram with labels #, −G, +G.]

Figure: MG represents a SL2 distribution with respect to feature G.

61 / 62

slide-124
SLIDE 124

Phonology Formal Language Theory Formal Learning Theories Phonological Learners

Feature-based generalization

[Figure: state diagram with labels #, (+F,+G), (+F,−G), (−F,+G).]

Figure: The structure of the feature product of MF and MG.

62 / 62