

SLIDE 1

Grammatical inference and subregular phonology

Adam Jardine · Rutgers University · December 10, 2019 · Tel Aviv University

SLIDE 2

Review

SLIDE 3

Outline of course

  • Day 1: Learning, languages, and grammars
  • Day 2: Learning strictly local grammars
  • Day 3: Automata and input strictly local functions
  • Day 4: Learning functions and stochastic patterns, other open questions

SLIDE 4

Review of day 1

  • Phonological patterns are governed by restrictive computational universals
  • Grammatical inference connects these universals to solutions to the learning problem:

Problem: Given a positive sample of a language, return a grammar that describes that language exactly

SLIDE 5

Review of day 1

  • Strictly local languages are patterns computed solely by k-factors in a string:

    w = ⋊ a b b a b ⋉
    fac_2(w) = {⋊a, ab, bb, ba, b⋉}
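
A minimal Python sketch of k-factor extraction (the function name and the single-boundary padding convention are my assumptions, not the slides’):

    def factors(w, k, left="⋊", right="⋉"):
        """Return the set of k-factors (length-k substrings) of w,
        with one boundary symbol added at each edge."""
        s = left + w + right
        return {s[i:i + k] for i in range(len(s) - k + 1)}

    print(factors("abbab", 2))   # {'⋊a', 'ab', 'bb', 'ba', 'b⋉'} (set order may vary)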

SLIDE 6

Today

  • A provably correct method for learning SLk languages
  • The paradigm of identification in the limit from positive data (Gold, 1967; de la Higuera, 2010)
  • Why learners target classes (not specific languages, or all possible languages)

SLIDE 7

Learning paradigm

SLIDE 8

Learning paradigm

[Diagram: an Oracle with a model of language MO sends information to a Learner, which can make requests and outputs a model of language ML (from Heinz et al., 2016)]

Problem: Given a positive sample of a language, return a grammar that describes that language exactly

  • This is (exact) identification in the limit from positive data (ILPD; Gold, 1967)

SLIDE 9

Identification in the limit from positive data (ILPD)

[Diagram: the Oracle’s model of language is a grammar G⋆ with L⋆ = L(G⋆); the Learner receives a text of L⋆ and outputs a grammar G]

  • A text of L⋆ is some sample of positive examples of L⋆

SLIDE 10

Identification in the limit from positive data (ILPD)

A presentation of L⋆ is a sequence p of examples drawn from L⋆ in which every string of L⋆ eventually appears

  t     p(t)
  0     abab
  1     ababab
  2     ab
  3     λ
  4     ab
  ...   ...

(this is the ‘in the limit’ part)

SLIDE 11

Identification in the limit from positive data (ILPD)

A learner A takes a finite sequence p[i] (the first i + 1 examples of p) and outputs a grammar Gi:

  t     p(t)
  0     abab
  1     ababab
  2     ab
  3     λ
  4     ab
  ...   ...
  n     ababab
  ...   ...

  p[i] → A → Gi

SLIDE 12

Identification in the limit from positive data (ILPD)

Let’s take the learner AFin:

  AFin(p[n]) = {w | w = p(i) for some i ≤ n}
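
A minimal sketch of this rote learner in Python (the function name is mine; it assumes the presentation prefix is given as a list of strings):

    def a_fin(prefix):
        """A_Fin: the hypothesis grammar is just the set of strings seen so far."""
        return set(prefix)

    # Run it on ever-longer prefixes of the presentation in the table below:
    presentation = ["bab", "ab", "bab", "aaa", "ab"]
    for t in range(len(presentation)):
        print(t, a_fin(presentation[:t + 1]))
    # 0 {'bab'}
    # 1 {'bab', 'ab'}   (set printing order may vary)
    # ...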

SLIDES 13-22

Identification in the limit from positive data (ILPD)

Let’s set L⋆ = {ab, bab, aaa} and run AFin on a presentation of it:

  t     p(t)   Gt
  0     bab    {bab}
  1     ab     {ab, bab}
  2     bab    {ab, bab}
  3     aaa    {ab, bab, aaa}
  4     ab     {ab, bab, aaa}
  ...   ...    ...
  308   bab    {ab, bab, aaa}

SLIDE 23

Identification in the limit from positive data (ILPD)

A converges at point n if Gm = Gn for any m > n

  t       p(t)        Gt
  0       abab        G0
  1       ababab      G1
  2       ab          G2
  ...     ...         ...
  n       ababab      Gn   ← convergence
  n + 1   abababab    Gn
  ...     ...         ...
  m       λ           Gn
  ...     ...         ...
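
As an illustration only (not from the slides), convergence on a finite prefix can be checked empirically by re-running the learner at every later point; Gold’s actual definition quantifies over the entire infinite presentation:

    def converged_by(learner, presentation, n):
        """True iff the learner outputs the same grammar at point n and at
        every later point of this finite sample (an empirical stand-in)."""
        g_n = learner(presentation[:n + 1])
        return all(learner(presentation[:m + 1]) == g_n
                   for m in range(n + 1, len(presentation)))

For instance, with a_fin from above, converged_by(a_fin, ["bab", "ab", "bab", "aaa", "ab"], 3) is True: the hypothesis stops changing once all three words of L⋆ have appeared.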

SLIDE 24

Identification in the limit from positive data (ILPD)

ILPD-learnability: A class C is ILPD-learnable if there is some algorithm AC such that for any stringset L ∈ C, given any positive presentation p of L, AC converges to a grammar G such that L(G) = L.

  • How is ILPD learning an idealization?
  • What are the advantages of using ILPD as a criterion for learning?

SLIDE 25

Learning strictly local languages

SLIDE 26

Learning SL languages

  • Given any k, the class SLk is ILPD-learnable.
  • Using AFin as an example, how might we learn an SLk language?

SLIDES 27-31

Learning SL languages

G⋆ = {CC, C⋉}

  t   datum     hypothesis (forbidden 2-factors remaining)
  0   V         {⋊C, CC, CV, C⋉, VC, VV}
  1   CVCV      {CC, C⋉, VV}
  2   CVVCVCV   {CC, C⋉}
  3   VCVCV     {CC, C⋉}
  ...

The hypothesis starts as the full set of possible 2-factors {⋊C, ⋊V, CC, CV, C⋉, VC, VV, V⋉} and shrinks as observed factors are struck out; it converges to G⋆ at t = 2.

SLIDES 32-35

Learning SL languages

  ASLk(p[i]) = fac_k(Σ*) − fac_k{p(0), p(1), ..., p(i)}

  • The characteristic sample is fac_k(L⋆): once the data seen contains every k-factor of L⋆, the learner has converged.
  • The time complexity is linear: the time it takes to calculate is directly proportional to the size of the data sample.
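
A sketch of this learner, reusing factors() from the earlier sketch (all names are mine; fac_k(Σ*) is generated directly from the alphabet, and factors that touch both boundaries at once, which arise only for very short strings, are omitted for simplicity):

    from itertools import product

    def all_k_factors(alphabet, k, left="⋊", right="⋉"):
        """Generate the possible k-factors over the alphabet, with or
        without a boundary symbol at one edge."""
        out = {"".join(f) for f in product(alphabet, repeat=k)}
        for f in product(alphabet, repeat=k - 1):
            out.add(left + "".join(f))
            out.add("".join(f) + right)
        return out

    def a_sl(sample, alphabet, k):
        """A_SLk: forbid exactly the k-factors never observed in the sample."""
        observed = set()
        for w in sample:
            observed |= factors(w, k)   # factors() as defined earlier
        return all_k_factors(alphabet, k) - observed

    # The CV example from slides 27-31 converges to G⋆:
    print(a_sl(["V", "CVCV", "CVVCVCV", "VCVCV"], "CV", 2))   # {'CC', 'C⋉'}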

SLIDE 36

Learning SL languages

Let’s learn Pintupi. Note that k = 3. What is the initial hypothesis? At what point do we converge?

  t   datum       hypothesis
  0   σ́
  1   σ́σ
  2   σ́σσ
  3   σ́σσ́σσ
  4   σ́σσ́σσσ
  5   σ́σσ́σσ́σσ
  ...

SLIDE 37

The limits of SL learning

SLIDE 38

The limits of SL learning

  • We must know k in advance

[Diagram: the classes Fin and SL within the space of all languages, with a language L′ identical to a target L for some finite sequence p[i]]

  • Gold (1967): any class C such that Fin ⊂ C is not learnable from positive examples

SLIDES 39-40

The limits of SL learning

  • Consider this pattern from Ineseño Chumash:

    S-api-tShol-it       ‘I have a stroke of good luck’
    s-api-tshol-us       ‘he has a stroke of good luck’
    S-api-tShol-uS-waS   ‘he had a stroke of good luck’
    ha-Sxintila-waS      ‘his former Indian name’
    s-is-tisi-jep-us     ‘they (two) show him’
    k-Su-Sojin           ‘I darken it’

  • What phonotactic constraints are active here? *S...s, *s...S: all sibilants in a word must agree, no matter how far apart they are.

SLIDES 41-42

The limits of SL learning

  • Let’s assume L⋆ = LC for Σ = {s, o, t, S} as given below

    LC = {so, ss, ..., sos, SoS, SoSoS, sosos, SototoS, sototos, ...}

  t   datum     hypothesis
  0   sos       {ss, so, sS, ..., Ss, St, SS}
  1   sotoss    {ss, so, sS, ..., Ss, St, SS}
  2   SoStoSS   {ss, so, sS, ..., Ss, St, SS}
  ...

  • The learner will never see sS or Ss, so in the limit G = {sS, Ss}.
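
Continuing the earlier sketch (the sample below echoes the table; it is illustrative, not the slides’ exact data):

    sample = ["sos", "sotoss", "SoStoSS", "sosos", "SototoS", "sototos"]
    g2 = a_sl(sample, "sotS", 2)
    print({"sS", "Ss"} <= g2)   # True: never observed, so still forbidden

On this small sample g2 also still forbids other unseen bigrams (st, tt, ...); on a fuller text of LC those would all eventually be observed, leaving exactly {sS, Ss} in the limit.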

SLIDE 43

The limits of SL learning

LC = {so, ss, ..., sos, SoS, SoSoS, sosos, SototoS, sototos, ...}

G =? {sS, Ss}

  sosos ∈ LC    (correctly accepted by G)
  soSs ∉ LC     (correctly rejected by G: it contains Ss)
  soSos ∉ LC    ✗ (but G accepts it: the disagreeing sibilants are not adjacent)

SLIDE 44

The limits of SL learning

LC = {so, ss, ..., sos, SoS, SoSoS, sosos, SototoS, sototos, ...}

Gk=3 =? {soS, ssS, stS, sSS, ..., Sos, Sss, Sts, SSs}

  sosos ∈ LC    (correctly accepted)
  soSs ∉ LC     (correctly rejected: it contains soS)
  soSos ∉ LC    (correctly rejected: it contains soS and Sos)
  Sotos ∉ LC    ✗ (but accepted: the disagreeing sibilants are now three symbols apart)
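
A sketch of the acceptance test for a negative SL grammar, confirming the failure cases above (accepts and g3 are my names; g2 comes from the previous sketch):

    def accepts(grammar, w, k):
        """A negative SL_k grammar accepts w iff w contains no forbidden k-factor."""
        return not (factors(w, k) & grammar)

    print(accepts(g2, "soSos", 2))   # True: wrongly accepted at k = 2
    g3 = {"soS", "Sos"}              # a fragment of the k = 3 grammar above
    print(accepts(g3, "soSos", 3))   # False: k = 3 catches distance-two disagreement
    print(accepts(g3, "Sotos", 3))   # True: distance-three disagreement slips through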

SLIDE 45

The limits of SL learning

LC = {so, ss, ..., sos, SoS, SoSoS, sosos, SototoS, sototos, ...}

  • There is no k such that ASLk learns a grammar for LC
  • This is because there is no SL grammar for LC!


SLIDE 46

The limits of SL learning

  • ASLk only learns SLk languages
  • This is the advantage of studying learning with formal grammatical inference:
    – we know what patterns it can learn,
    – what patterns it cannot learn,
    – and on exactly what data

SLIDE 47

Review

  • As a hypothesis of phonotactic learning, ASLk
    – makes restrictive predictions about what patterns can and cannot be learned
    – suggests phonological learning is modular (Heinz, 2010)
    – directly connects computational typological generalizations with a theory of learning

SLIDE 48

Review

Problem: Given a positive sample of a language, return a grammar that describes that language exactly

  • We have formalized this problem as identification in the limit from positive data
  • We have solved this problem for any SLk class
  • We’ll find another solution with automata, and extend that to learn processes