Grammatical inference and subregular phonology
Adam Jardine Rutgers University December 10, 2019 · Tel Aviv University
Outline of course

- Day 1: Learning, languages, and grammars
- Day 2: Learning strictly local grammars
- Day 3: ...
Review of day 1
- computational universals
- solutions to the learning problem:

  Problem: Given a positive sample of a language, return a grammar that describes that language exactly
- k-factors in a string: for w = ⋊abbab⋉,

  fac2(w) = {⋊a, ab, bb, ba, b⋉}
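The factor function can be sketched directly in Python; `fac` is a hypothetical helper name, and the boundary symbols ⋊ and ⋉ are treated as ordinary characters:

```python
def fac(k, w, left="⋊", right="⋉"):
    """Return fac_k(w): the set of length-k substrings of w,
    with word boundaries added at the edges."""
    s = left + w + right
    return {s[i:i + k] for i in range(len(s) - k + 1)}

# The 2-factors of 'abbab' are ⋊a, ab, bb, ba, b⋉ (the repeated
# factor ab is counted once, since fac returns a set).
factors = fac(2, "abbab")
```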
Today

Learning as identification in the limit from positive data (Gold, 1967; de la Higuera, 2010)
Learning paradigm

[Diagram, from Heinz et al. (2016): an Oracle with a model of language M_O exchanges information and requests with a Learner, which outputs its own model of language M_L.]

Problem: Given a positive sample of a language, return a grammar that describes that language exactly (Gold, 1967)
Identification in the limit from positive data (ILPD)
[Diagram: the Oracle holds a target grammar G⋆, with L⋆ = L(G⋆), and presents a text; the Learner outputs a grammar G.]
A presentation of L⋆ is a sequence p of examples drawn from L⋆:

t    p(t)
0    abab
1    ababab
2    ab
3    λ
4    ab
...  ...

A presentation goes on forever (this is the ‘in the limit’ part)
A learner A takes a finite sequence and outputs a grammar:

t    p(t)
0    abab
1    ababab
2    ab
3    λ
4    ab
...  ...
n    ababab
...  ...

A(p[i]) = Gi
Let’s take the learner AFin:

AFin(p[n]) = {w | w = p(i) for some i ≤ n}

Let’s set L⋆ = {ab, bab, aaa}:

t    p(t)   Gt
0    bab    {bab}
1    ab     {ab, bab}
2    bab    {ab, bab}
3    aaa    {ab, bab, aaa}
4    ab     {ab, bab, aaa}
...
308  bab    {ab, bab, aaa}
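This run of AFin can be simulated directly; `a_fin` is a hypothetical name, and the example order follows the table:

```python
def a_fin(prefix):
    """The learner A_Fin: its grammar is just the set of strings
    seen so far, so it describes exactly the finite sample."""
    return set(prefix)

# A presentation of L* = {ab, bab, aaa}.
presentation = ["bab", "ab", "bab", "aaa", "ab"]
hypotheses = [a_fin(presentation[:t + 1]) for t in range(len(presentation))]
# From t = 3 on, the hypothesis is {ab, bab, aaa} and never changes:
# A_Fin has converged on L*.
```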
A converges at point n if Gm = Gn for all m > n:

t      p(t)        Gt
0      abab        G0
1      ababab      G1
2      ab          G2
...    ...         ...
n      ababab      Gn
n+1    abababab    Gn
...    ...         ...
m      λ           Gn
...    ...         ...

From point n on, the hypothesis never changes: this is convergence
ILPD-learnability: A class C is ILPD-learnable if there is some algorithm AC such that for any stringset L ∈ C, given any positive presentation p of L, AC converges on a grammar G with L(G) = L

Which classes are ILPD-learnable?
Learning SL languages

How can we learn an SL language? Target grammar of banned 2-factors: G⋆ = {CC, C⋉}

The learner starts by banning every possible 2-factor, {⋊C, ⋊V, CC, CV, C⋉, VC, VV, V⋉}, and removes each factor it observes in the data (only the factors still banned are shown at each step):

t    datum      hypothesis
0    V          {⋊C, CC, CV, C⋉, VC, VV}
1    CVCV       {CC, C⋉, VV}
2    CVVCVCV    {CC, C⋉}
3    VCVCV      {CC, C⋉}
...
In general, the SL learner is:

ASLk(p[i]) = fack(Σ∗) − fack({p(0), p(1), ..., p(i)})

The work it does is directly proportional to the size of the data sample.
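A sketch of the SL learner in Python, under the assumption that Σ and the data are given explicitly (all function names are hypothetical):

```python
from itertools import product

L_B, R_B = "⋊", "⋉"  # word boundaries

def fac(k, w):
    """k-factors of w, with boundaries added."""
    s = L_B + w + R_B
    return {s[i:i + k] for i in range(len(s) - k + 1)}

def all_k_factors(k, sigma):
    """fac_k(Σ*): every k-factor that can occur in some boundary-marked
    string over Σ (boundaries only at the edges; the lone factor of the
    empty string is ignored)."""
    out = set()
    for combo in product(list(sigma) + [L_B, R_B], repeat=k):
        f = "".join(combo)
        if L_B in f[1:] or R_B in f[:-1]:
            continue  # boundary in an illegal position
        if all(c in (L_B, R_B) for c in f):
            continue  # no actual segments
        out.add(f)
    return out

def a_sl(k, sigma, data):
    """The SL learner: ban every k-factor not attested in the sample.
    One pass over the data, so work is proportional to sample size."""
    seen = set().union(*(fac(k, w) for w in data))
    return all_k_factors(k, sigma) - seen

# The CV example: the learner converges on G* = {CC, C⋉}.
grammar = a_sl(2, "CV", ["V", "CVCV", "CVVCVCV", "VCVCV"])
```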
Let’s learn Pintupi. Note that k = 3. What is the initial hypothesis? At what point do we converge?

t    datum      hypothesis
0    σ́
1    σ́σ
2    σ́σσ
3    σ́σσ́σ
4    σ́σσ́σσ
5    σ́σσ́σσ́σ
...
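As a sketch of this exercise (writing σ́ as 'A' and σ as 'a', an assumed ASCII encoding), we can track exactly when previously unseen 3-factors appear in the data, which is exactly when the banned-factor hypothesis changes:

```python
def fac(k, w):
    """k-factors of w with word boundaries ⋊, ⋉."""
    s = "⋊" + w + "⋉"
    return {s[i:i + k] for i in range(len(s) - k + 1)}

# Pintupi stress data: 'A' = stressed syllable, 'a' = unstressed.
data = ["A", "Aa", "Aaa", "AaAa", "AaAaa", "AaAaAa"]

# The hypothesis changes at exactly the points where a datum
# contributes a 3-factor not seen before.
seen, changes = set(), []
for t, w in enumerate(data):
    if not fac(3, w) <= seen:
        changes.append(t)
    seen |= fac(3, w)
# On this sample the hypothesis last changes at t = 3 (datum AaAa):
# from there on, the learner has converged.
```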
The limits of SL learning

[Diagram: the finite languages (Fin) and the strictly local languages (SL) within the class of all languages L.]

For any finite sequence p[i] there is some L′ ≠ L that is identical to L on p[i]; without restricting the hypothesis space, a learner cannot identify L from positive examples alone.
S-api-tShol-it       ‘I have a stroke of good luck’
s-api-tshol-us       ‘he has a stroke of good luck’
S-api-tShol-uS-waS   ‘he had a stroke of good luck’
ha-Sxintila-waS      ‘his former Indian name’
s-is-tisi-jep-us     ‘they (two) show him’
k-Su-Sojin           ‘I darken it’

Generalization: *S...s, *s...S
LC = {so, ss, ..., sos, SoS, SoSoS, sosos, SototoS, sototos, ...}

t    datum      hypothesis
0    sos        {ss, so, sS, ..., Ss, St, SS}
1    sotoss     {ss, so, sS, ..., Ss, St, SS}
2    SoStoSS    {ss, so, sS, ..., Ss, St, SS}
...
Can a 2-local grammar capture LC? Try G = {sS, Ss}:

sosos ∈ LC    ✓ accepted
soSs ∉ LC     ✓ correctly banned (contains Ss)
soSos ∉ LC    ✗ wrongly accepted: it contains no banned 2-factor
With k = 3: G = {soS, ssS, stS, sSS, ..., Sos, Sss, Sts, SSs}

sosos ∈ LC    ✓ accepted
soSs ∉ LC     ✓ correctly banned
soSos ∉ LC    ✓ correctly banned (contains soS)
Sotos ∉ LC    ✗ wrongly accepted: the disagreeing sibilants are too far apart for a 3-factor window
For any k, LC contains strings in which the sibilant dependency spans more than k segments, so LC is not strictly k-local for any k.
This is a feature of grammatical inference: for a given learner, we know
- what patterns it can learn,
- what patterns it cannot learn,
- on exactly what data
Review

This approach:
- makes restrictive predictions about what patterns can and cannot be learned
- suggests phonological learning is modular (Heinz, 2010)
- directly connects computational typological generalizations with a theory of learning
- Problem: Given a positive sample of a language, return a grammar that describes that language exactly
- We studied learning as identification in the limit from positive data
- Next: how to learn processes