

SLIDE 1

Grammatical inference and subregular phonology

Adam Jardine · Rutgers University · December 10, 2019 · Tel Aviv University

SLIDE 2

Review

SLIDE 3

Outline of course

  • Day 1: Learning, languages, and grammars
  • Day 2: Learning strictly local grammars
  • Day 3: Automata and input strictly local functions
  • Day 4: Learning functions and stochastic patterns, other open questions

SLIDE 4

Review of day 1

  • Phonological patterns are governed by restrictive computational universals
  • Grammatical inference connects these universals to solutions to the learning problem:

Problem: Given a positive sample of a language, return a grammar that describes that language exactly

SLIDE 5

Review of day 1

  • Strictly local languages are patterns computed solely by k-factors in a string:

    w = ⋊ a b b a b ⋉
    fac_2(w) = {⋊a, ab, bb, ba, b⋉}
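
A minimal Python sketch of k-factor extraction (the function name and the single-boundary padding convention are my assumptions, not the slides’):

    def factors(w, k, left="⋊", right="⋉"):
        """Return the set of k-factors (length-k substrings) of w,
        with one boundary symbol added at each edge."""
        s = left + w + right
        return {s[i:i + k] for i in range(len(s) - k + 1)}

    print(factors("abbab", 2))   # {'⋊a', 'ab', 'bb', 'ba', 'b⋉'} (set order may vary)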

SLIDE 6

Today

  • A provably correct method for learning SLk languages
  • The paradigm of identification in the limit from positive data (Gold, 1967; de la Higuera, 2010)
  • Why learners target classes (not specific languages, or all possible languages)

SLIDE 7

Learning paradigm

SLIDE 8

Learning paradigm

[Diagram: an Oracle with a model of language MO sends information to a Learner, which can make requests and outputs a model of language ML (from Heinz et al., 2016)]

Problem: Given a positive sample of a language, return a grammar that describes that language exactly

  • This is (exact) identification in the limit from positive data (ILPD; Gold, 1967)

SLIDE 9

Identification in the limit from positive data (ILPD)

[Diagram: the Oracle’s model of language is a grammar G⋆ with L⋆ = L(G⋆); the Learner receives a text of L⋆ and outputs a grammar G]

  • A text of L⋆ is some sample of positive examples of L⋆

SLIDE 10

Identification in the limit from positive data (ILPD)

A presentation of L⋆ is a sequence p of examples drawn from L⋆ in which every string of L⋆ eventually appears

  t     p(t)
  0     abab
  1     ababab
  2     ab
  3     λ
  4     ab
  ...   ...

(this is the ‘in the limit’ part)

SLIDE 11

Identification in the limit from positive data (ILPD)

A learner A takes a finite sequence p[i] (the first i + 1 examples of p) and outputs a grammar Gi:

  t     p(t)
  0     abab
  1     ababab
  2     ab
  3     λ
  4     ab
  ...   ...
  n     ababab
  ...   ...

  p[i] → A → Gi

SLIDE 12

Identification in the limit from positive data (ILPD)

Let’s take the learner AFin:

  AFin(p[n]) = {w | w = p(i) for some i ≤ n}
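
A minimal sketch of this rote learner in Python (the function name is mine; it assumes the presentation prefix is given as a list of strings):

    def a_fin(prefix):
        """A_Fin: the hypothesis grammar is just the set of strings seen so far."""
        return set(prefix)

    # Run it on ever-longer prefixes of the presentation in the table below:
    presentation = ["bab", "ab", "bab", "aaa", "ab"]
    for t in range(len(presentation)):
        print(t, a_fin(presentation[:t + 1]))
    # 0 {'bab'}
    # 1 {'bab', 'ab'}   (set printing order may vary)
    # ...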

SLIDES 13-22

Identification in the limit from positive data (ILPD)

Let’s set L⋆ = {ab, bab, aaa} and run AFin on a presentation of it:

  t     p(t)   Gt
  0     bab    {bab}
  1     ab     {ab, bab}
  2     bab    {ab, bab}
  3     aaa    {ab, bab, aaa}
  4     ab     {ab, bab, aaa}
  ...   ...    ...
  308   bab    {ab, bab, aaa}

SLIDE 23

Identification in the limit from positive data (ILPD)

A converges at point n if Gm = Gn for any m > n

  t       p(t)        Gt
  0       abab        G0
  1       ababab      G1
  2       ab          G2
  ...     ...         ...
  n       ababab      Gn   ← convergence
  n + 1   abababab    Gn
  ...     ...         ...
  m       λ           Gn
  ...     ...         ...
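
As an illustration only (not from the slides), convergence on a finite prefix can be checked empirically by re-running the learner at every later point; Gold’s actual definition quantifies over the entire infinite presentation:

    def converged_by(learner, presentation, n):
        """True iff the learner outputs the same grammar at point n and at
        every later point of this finite sample (an empirical stand-in)."""
        g_n = learner(presentation[:n + 1])
        return all(learner(presentation[:m + 1]) == g_n
                   for m in range(n + 1, len(presentation)))

For instance, with a_fin from above, converged_by(a_fin, ["bab", "ab", "bab", "aaa", "ab"], 3) is True: the hypothesis stops changing once all three words of L⋆ have appeared.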

SLIDE 24

Identification in the limit from positive data (ILPD)

ILPD-learnability: A class C is ILPD-learnable if there is some algorithm AC such that for any stringset L ∈ C, given any positive presentation p of L, AC converges to a grammar G such that L(G) = L.

  • How is ILPD learning an idealization?
  • What are the advantages of using ILPD as a criterion for learning?

SLIDE 25

Learning strictly local languages

SLIDE 26

Learning SL languages

  • Given any k, the class SLk is ILPD-learnable.
  • Using AFin as an example, how might we learn an SLk language?

SLIDES 27-31

Learning SL languages

G⋆ = {CC, C⋉}

  t   datum     hypothesis (forbidden 2-factors remaining)
  0   V         {⋊C, CC, CV, C⋉, VC, VV}
  1   CVCV      {CC, C⋉, VV}
  2   CVVCVCV   {CC, C⋉}
  3   VCVCV     {CC, C⋉}
  ...

The hypothesis starts as the full set of possible 2-factors {⋊C, ⋊V, CC, CV, C⋉, VC, VV, V⋉} and shrinks as observed factors are struck out; it converges to G⋆ at t = 2.

SLIDES 32-35

Learning SL languages

  ASLk(p[i]) = fac_k(Σ*) − fac_k{p(0), p(1), ..., p(i)}

  • The characteristic sample is fac_k(L⋆): once the data seen contains every k-factor of L⋆, the learner has converged.
  • The time complexity is linear: the time it takes to calculate is directly proportional to the size of the data sample.
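
A sketch of this learner, reusing factors() from the earlier sketch (all names are mine; fac_k(Σ*) is generated directly from the alphabet, and factors that touch both boundaries at once, which arise only for very short strings, are omitted for simplicity):

    from itertools import product

    def all_k_factors(alphabet, k, left="⋊", right="⋉"):
        """Generate the possible k-factors over the alphabet, with or
        without a boundary symbol at one edge."""
        out = {"".join(f) for f in product(alphabet, repeat=k)}
        for f in product(alphabet, repeat=k - 1):
            out.add(left + "".join(f))
            out.add("".join(f) + right)
        return out

    def a_sl(sample, alphabet, k):
        """A_SLk: forbid exactly the k-factors never observed in the sample."""
        observed = set()
        for w in sample:
            observed |= factors(w, k)   # factors() as defined earlier
        return all_k_factors(alphabet, k) - observed

    # The CV example from slides 27-31 converges to G⋆:
    print(a_sl(["V", "CVCV", "CVVCVCV", "VCVCV"], "CV", 2))   # {'CC', 'C⋉'}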

SLIDE 36

Learning SL languages

Let’s learn Pintupi. Note that k = 3. What is the initial hypothesis? At what point do we converge?

  t   datum       hypothesis
  0   σ́
  1   σ́σ
  2   σ́σσ
  3   σ́σσ́σσ
  4   σ́σσ́σσσ
  5   σ́σσ́σσ́σσ
  ...

SLIDE 37

The limits of SL learning

SLIDE 38

The limits of SL learning

  • We must know k in advance

[Diagram: the classes Fin and SL within the space of all languages, with a language L′ identical to a target L for some finite sequence p[i]]

  • Gold (1967): any class C such that Fin ⊂ C is not learnable from positive examples

SLIDES 39-40

The limits of SL learning

  • Consider this pattern from Ineseño Chumash:

    S-api-tShol-it       ‘I have a stroke of good luck’
    s-api-tshol-us       ‘he has a stroke of good luck’
    S-api-tShol-uS-waS   ‘he had a stroke of good luck’
    ha-Sxintila-waS      ‘his former Indian name’
    s-is-tisi-jep-us     ‘they (two) show him’
    k-Su-Sojin           ‘I darken it’

  • What phonotactic constraints are active here? *S...s, *s...S: all sibilants in a word must agree, no matter how far apart they are.

SLIDES 41-42

The limits of SL learning

  • Let’s assume L⋆ = LC for Σ = {s, o, t, S} as given below

    LC = {so, ss, ..., sos, SoS, SoSoS, sosos, SototoS, sototos, ...}

  t   datum     hypothesis
  0   sos       {ss, so, sS, ..., Ss, St, SS}
  1   sotoss    {ss, so, sS, ..., Ss, St, SS}
  2   SoStoSS   {ss, so, sS, ..., Ss, St, SS}
  ...

  • The learner will never see sS or Ss, so in the limit G = {sS, Ss}.
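
Continuing the earlier sketch (the sample below echoes the table; it is illustrative, not the slides’ exact data):

    sample = ["sos", "sotoss", "SoStoSS", "sosos", "SototoS", "sototos"]
    g2 = a_sl(sample, "sotS", 2)
    print({"sS", "Ss"} <= g2)   # True: never observed, so still forbidden

On this small sample g2 also still forbids other unseen bigrams (st, tt, ...); on a fuller text of LC those would all eventually be observed, leaving exactly {sS, Ss} in the limit.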

SLIDE 43

The limits of SL learning

LC = {so, ss, ..., sos, SoS, SoSoS, sosos, SototoS, sototos, ...}

G =? {sS, Ss}

  sosos ∈ LC    (correctly accepted by G)
  soSs ∉ LC     (correctly rejected by G: it contains Ss)
  soSos ∉ LC    ✗ (but G accepts it: the disagreeing sibilants are not adjacent)

SLIDE 44

The limits of SL learning

LC = {so, ss, ..., sos, SoS, SoSoS, sosos, SototoS, sototos, ...}

Gk=3 =? {soS, ssS, stS, sSS, ..., Sos, Sss, Sts, SSs}

  sosos ∈ LC    (correctly accepted)
  soSs ∉ LC     (correctly rejected: it contains soS)
  soSos ∉ LC    (correctly rejected: it contains soS and Sos)
  Sotos ∉ LC    ✗ (but accepted: the disagreeing sibilants are now three symbols apart)
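
A sketch of the acceptance test for a negative SL grammar, confirming the failure cases above (accepts and g3 are my names; g2 comes from the previous sketch):

    def accepts(grammar, w, k):
        """A negative SL_k grammar accepts w iff w contains no forbidden k-factor."""
        return not (factors(w, k) & grammar)

    print(accepts(g2, "soSos", 2))   # True: wrongly accepted at k = 2
    g3 = {"soS", "Sos"}              # a fragment of the k = 3 grammar above
    print(accepts(g3, "soSos", 3))   # False: k = 3 catches distance-two disagreement
    print(accepts(g3, "Sotos", 3))   # True: distance-three disagreement slips through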

SLIDE 45

The limits of SL learning

LC = {so, ss, ..., sos, SoS, SoSoS, sosos, SototoS, sototos, ...}

  • There is no k such that ASLk learns a grammar for LC
  • This is because there is no SL grammar for LC!


SLIDE 46

The limits of SL learning

  • ASLk only learns SLk languages
  • This is the advantage of studying learning with formal grammatical inference:
    – we know what patterns it can learn,
    – what patterns it cannot learn,
    – and on exactly what data

SLIDE 47

Review

  • As a hypothesis of phonotactic learning, ASLk
    – makes restrictive predictions about what patterns can and cannot be learned
    – suggests phonological learning is modular (Heinz, 2010)
    – directly connects computational typological generalizations with a theory of learning

SLIDE 48

Review

Problem: Given a positive sample of a language, return a grammar that describes that language exactly

  • We have formalized this problem as identification in the limit from positive data
  • We have solved this problem for any SLk class
  • We’ll find another solution with automata, and extend that to learn processes