

slide-1
SLIDE 1

Investigating the potential of ancestral state reconstruction algorithms in historical linguistics

Gerhard Jäger & Johann-Mattis List

Tübingen University & CRLAO / Team AIRE, Paris

Capturing Phylogenetic Algorithms for Linguistics, Leiden

October 28, 2015

Jäger & List (Tübingen/Paris) Ancestral state reconstruction Leiden 1 / 42

slide-2
SLIDE 2

Introduction

What is Ancestral State Reconstruction?

While tree-building methods seek to find branching diagrams that explain how a language family has evolved, ancestral state reconstruction (ASR) methods use these branching diagrams to explain what has evolved concretely. Ancestral state reconstruction is very common in evolutionary biology but only sporadically practiced in computational historical linguistics (Bouchard-Côté et al. 2013). In classical historical linguistics, on the other hand, linguistic reconstruction of proto-forms and proto-meanings is very common and one of the main goals of the classical comparative method (Fox 1995).


slide-3
SLIDE 3

Introduction

ASR of Lexical Replacement Patterns

If we look for the words corresponding to one meaning in a wordlist and know which of the words are cognate, we may ask which of the word forms was the most likely candidate to be used in the proto-language of all descendant languages. This question resembles the task of “semantic reconstruction”, but in contrast to classical semantic reconstruction, we only operate within one concept slot here, disregarding all words with a different meaning which may also be cognate with the words in our sample. As a result of this restriction, it is quite likely that we cannot recover the original form from our data. It is, however, very interesting to see to what degree we can propose a good candidate word form (cognate set) for the proto-language.


slide-4
SLIDE 4

Introduction

ASR of Lexical Replacement Patterns

[Figure: tree with the leaves Kopf, kop, head, tête, testa, cap, all meaning "head"]


slide-6
SLIDE 6

Introduction

ASR of Lexical Replacement Patterns

[Figure: the same tree, with unknown ancestral forms ("?") for the meaning "head"]


slide-7
SLIDE 7

Introduction

ASR of Lexical Replacement Patterns

[Figure: the same tree, with the ancestral forms *kop "head" and testa "head" filled in]


slide-8
SLIDE 8

Introduction

ASR of Lexical Replacement Patterns

[Figure: the same tree, with the ancestral forms *kop, *haubud-, testa, caput, and *kaput-, all meaning "head"]


slide-9
SLIDE 9

Introduction

This talk

reconstruction of cognate class at the root

[Figure: tree with leaf states A, A, B, B, C, C; root state unknown (?)]


slide-10
SLIDE 10

Introduction

This talk

reconstruction of cognate class at the root

[Figure: the same tree, with the root state reconstructed as B]


slide-11
SLIDE 11

Materials and Methods Materials

Data


slide-12
SLIDE 12

Materials and Methods Materials

Data

IELex
  • 153 Indo-European doculects, 207 concepts
  • entries for Proto-Indo-European (PIE) for 135 concepts → used as gold standard
  • arbitrarily split into training set and test set:
      training set: 67 concepts, 1127 cognate classes (83 occur in PIE)
      test set: 68 concepts, 957 cognate classes (79 occur in PIE)

ABVD
  • 743 Austronesian doculects → 100 were selected at random
  • 210 concepts; for 154 of them, entries for Proto-Austronesian (PAn)
  • split into training set and test set:
      training set: 81 concepts, 1695 cognate classes (88 occur in PAn)
      test set: 74 concepts, 1584 cognate classes (79 occur in PAn)


slide-13
SLIDE 13

Materials and Methods Methods

Prerequisites: Trees

Trees
  • trees were inferred from the full data set (training + test data) via Bayesian inference
  • IELex outgroup: Anatolian; ABVD outgroup: Malayo-Polynesian
  • random samples of 1000 trees from the posterior distributions
  • maximum clade credibility trees

[Figure: summary trees for IELex (scale bar 600.0) and ABVD (scale bar 0.06), with doculect names at the leaves]


slide-14
SLIDE 14

Materials and Methods Methods

Phylogenetic uncertainty

  • proper way to deal with it: work with the posterior sample rather than with a single tree
  • poor man's method: remove all short branches (shorter than some threshold) and do ASR with the resulting multifurcating tree

[Figure: multifurcating IELex tree resulting from the removal of short branches]
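The "poor man's method" can be sketched in Python as follows. This is a minimal sketch, assuming a hypothetical `Node` class whose `length` is the branch leading to that node, and assuming that only internal branches are collapsed (collapsing a terminal branch would delete a doculect); the threshold is left to the user. When a branch is removed, its length is added to the daughters' branches so that root-to-tip distances are preserved, which is one possible design choice among several.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node; `length` is the branch leading to this node."""
    name: str = ""
    length: float = 0.0
    children: list[Node] = field(default_factory=list)


def collapse_short_branches(node: Node, threshold: float) -> Node:
    """Return a new tree in which every internal branch shorter than
    `threshold` is contracted, yielding a multifurcating tree."""
    new_children: list[Node] = []
    for child in node.children:
        c = collapse_short_branches(child, threshold)
        if c.children and c.length < threshold:
            # Contract the short internal branch: its daughters attach to
            # `node` directly and inherit the removed branch length.
            for grandchild in c.children:
                new_children.append(
                    Node(grandchild.name, grandchild.length + c.length,
                         grandchild.children))
        else:
            new_children.append(c)
    return Node(node.name, node.length, new_children)


def to_newick(node: Node) -> str:
    """Topology-only Newick string, for inspection."""
    if not node.children:
        return node.name
    return "(" + ",".join(to_newick(c) for c in node.children) + ")" + node.name


tree = Node("root", 0.0, [
    Node("", 0.01, [Node("A", 1.0), Node("B", 1.0)]),  # short internal branch
    Node("", 0.5, [Node("C", 0.5), Node("D", 0.5)]),
])
print(to_newick(collapse_short_branches(tree, threshold=0.05)))  # (A,B,(C,D))root
```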


slide-15
SLIDE 15

Materials and Methods Methods

Coding

Multi-state

[Figure: tree with leaf states A, A, B, B, C, C and root state B]

Binarized

[Figure: the same data recoded as three binary characters: A vs. non-A, B vs. non-B, C vs. non-C]
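Binarization can be sketched as follows. This is a minimal illustration, assuming the data for one concept are given as a mapping from doculect names to the set of cognate classes attested there (doculects without data are simply left out); the function name `binarize` and the doculect labels are hypothetical. Because each cognate class becomes its own presence/absence character, a doculect with synonyms simply has a 1 in several characters.

```python
from __future__ import annotations


def binarize(observed: dict[str, set[str]]) -> dict[str, dict[str, int]]:
    """Turn one multi-state character into one binary character per
    cognate class: 1 = class present in the doculect, 0 = absent."""
    classes = sorted(set().union(*observed.values()))
    return {
        cls: {lang: int(cls in present) for lang, present in observed.items()}
        for cls in classes
    }


# Leaf states from the slide: A, A, B, B, C, C (doculect names are placeholders)
data = {"L1": {"A"}, "L2": {"A"}, "L3": {"B"},
        "L4": {"B"}, "L5": {"C"}, "L6": {"C"}}
for cls, column in binarize(data).items():
    print(cls, list(column.values()))
# A [1, 1, 0, 0, 0, 0]
# B [0, 0, 1, 1, 0, 0]
# C [0, 0, 0, 0, 1, 1]
```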


slide-16
SLIDE 16

Materials and Methods Methods

Polymorphisms (a.k.a. synonyms)

[Figure: the "head" tree with the additional synonyms Haupt and hoofd, so that two leaves carry two words each]

Problem for multistate coding. Possible representations:

  • epistemic: both observations have 50% (subjective) probability
  • lifted model: states in the technical sense are sets of cognate classes
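The two representations differ in what a tip with synonyms contributes to the reconstruction. A minimal sketch, assuming tips are encoded as probability vectors over states (as in likelihood-based ASR); the function names and the cap `max_size` on the size of lifted states are hypothetical, the cap being there only because the number of sets grows exponentially with the number of cognate classes.

```python
from __future__ import annotations

from itertools import combinations


def epistemic_tip(observed: set[str], states: list[str]) -> dict[str, float]:
    """Epistemic reading: the tip has exactly one state, and each observed
    synonym is equally probable (50% each for two synonyms)."""
    p = 1.0 / len(observed)
    return {s: (p if s in observed else 0.0) for s in states}


def lifted_states(states: list[str], max_size: int = 2) -> list[frozenset[str]]:
    """Lifted model: technical states are non-empty sets of cognate classes
    (here capped at `max_size` elements)."""
    return [frozenset(combo)
            for k in range(1, max_size + 1)
            for combo in combinations(states, k)]


def lifted_tip(observed: set[str],
               lifted: list[frozenset[str]]) -> dict[frozenset[str], float]:
    """In the lifted model a polymorphic tip is one fully known state."""
    target = frozenset(observed)
    return {S: (1.0 if S == target else 0.0) for S in lifted}


classes = ["A", "B", "C"]
print(epistemic_tip({"A", "B"}, classes))  # {'A': 0.5, 'B': 0.5, 'C': 0.0}
lifted = lifted_states(classes)
print(len(lifted))                          # 6
```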


slide-17
SLIDE 17

Materials and Methods Methods

Parsimony reconstruction

[Figure: parsimony reconstruction of the internal nodes of a tree with leaf states A, C, C, A, B, B]

Parsimony = 2


slide-18
SLIDE 18

Materials and Methods Methods

Parsimony reconstruction

[Figure: an alternative reconstruction on the same tree]

Parsimony = 3


slide-19
SLIDE 19

Materials and Methods Methods

Parsimony reconstruction

[Figure: another alternative reconstruction on the same tree]

Parsimony = 3


slide-20
SLIDE 20

Materials and Methods Methods

Weighted parsimony reconstruction

[Figure: weighted-parsimony reconstruction on the tree with leaf states A, C, C, A, B, B]

Weighted parsimony = 3

Weight matrix:
     A  B  C
  A  0  1  2
  B  1  0  2
  C  2  2  0


slide-21
SLIDE 21

Materials and Methods Methods

Weighted parsimony reconstruction

[Figure: an alternative weighted-parsimony reconstruction on the same tree]

Weighted parsimony = 4

Weight matrix:
     A  B  C
  A  0  1  2
  B  1  0  2
  C  2  2  0


slide-22
SLIDE 22

Materials and Methods Methods

Weighted parsimony reconstruction

[Figure: another alternative weighted-parsimony reconstruction on the same tree]

Weighted parsimony = 5

Weight matrix:
     A  B  C
  A  0  1  2
  B  1  0  2
  C  2  2  0


slide-23
SLIDE 23

Materials and Methods Methods

Dynamic Programming (Sankoff Algorithm)

$$wp(\text{mother}, s) = \sum_{d \in \text{daughters}} \min_{s' \in \text{states}} \big( w(s, s') + wp(d, s') \big)$$

[Figure: tree with leaf states A, C, C, A, B, B]
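The recurrence translates directly into a recursive function. A minimal sketch, assuming trees are given as nested tuples of leaf names, the leaf states are known, and the diagonal of the weight matrix is 0; the tree topology below is hypothetical and not the one shown on the slides, so its scores differ from those reported there. Leaves score 0 for their observed state and infinity for every other state.

```python
import math

# Weight matrix from the weighted-parsimony slides (diagonal assumed to be 0).
W = {
    "A": {"A": 0, "B": 1, "C": 2},
    "B": {"A": 1, "B": 0, "C": 2},
    "C": {"A": 2, "B": 2, "C": 0},
}


def sankoff(tree, leaf_state, states, w):
    """Return {s: wp(tree, s)}, the minimal weighted cost of the subtree
    given that its top node is in state s."""
    if isinstance(tree, str):  # a leaf: only its observed state is allowed
        return {s: (0.0 if s == leaf_state[tree] else math.inf) for s in states}
    scores = {s: 0.0 for s in states}
    for daughter in tree:
        child = sankoff(daughter, leaf_state, states, w)
        for s in states:
            # cheapest way to go from s at the mother to some s' at the daughter
            scores[s] += min(w[s][t] + child[t] for t in states)
    return scores


# Hypothetical tree with the leaf states A, C, C, A, B, B used on the slides
tree = (("L1", ("L2", "L3")), ("L4", ("L5", "L6")))
leaves = {"L1": "A", "L2": "C", "L3": "C", "L4": "A", "L5": "B", "L6": "B"}
root = sankoff(tree, leaves, ["A", "B", "C"], W)
print(root)  # {'A': 3.0, 'B': 4.0, 'C': 5.0}
```

The root state with the lowest score (here A) is the weighted-parsimony reconstruction for this hypothetical tree.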

slide-27
SLIDE 27

Materials and Methods Methods

Weighted Parsimony reconstruction

  • the state with the lowest parsimony score wins; in case of ties, the frequency at the leaves is the tie-breaker
  • binary characters: w(0 → 1) = 1; w(1 → 0) = 2
  • multi-state characters: all weights = 1; polymorphism only admitted at tips: w(a → {a, b}) = w(a → {b, c}) = 1
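The selection rule on this slide can be sketched as a small helper that takes the root scores (as produced by a Sankoff pass) and the observed leaf states. A minimal sketch; the function name is hypothetical, and the binary weight table reads the slide's binary weights as w(0 → 1) = 1 and w(1 → 0) = 2, with 0 for no change. If two states are also tied in frequency, this sketch simply keeps the first one it encounters.

```python
from collections import Counter

# Binary weights: BINARY_WEIGHTS[from_state][to_state]; 1 = cognate class present.
BINARY_WEIGHTS = {0: {0: 0, 1: 1}, 1: {0: 2, 1: 0}}


def choose_root_state(root_scores, leaf_states):
    """Pick the state with the lowest parsimony score; break ties by
    how often each state occurs at the leaves."""
    best = min(root_scores.values())
    tied = [s for s, score in root_scores.items() if score == best]
    frequency = Counter(leaf_states)
    return max(tied, key=lambda s: frequency[s])


# A and B tie with score 2, but B is more frequent at the leaves.
print(choose_root_state({"A": 2, "B": 2, "C": 3}, ["A", "B", "B", "C"]))  # B
```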


slide-28
SLIDE 28

Materials and Methods Methods

The MLN Method for ASR

The MLN method (List et al. 2014a) uses parsimony for ancestral state reconstruction. In contrast to classical parsimony, MLN tests different weighting schemes for gains and losses and selects the optimal scheme with the help of the vocabulary size criterion.

The vocabulary size criterion states that the number of synonyms per concept should be similar in the ancestral and the descendant languages.
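The selection step can be illustrated with a deliberately simplified criterion. This sketch assumes that, for each candidate weighting scheme, the number of synonyms per concept has already been counted at every ancestral node, and it simply picks the scheme whose average ancestral vocabulary size is closest to that of the attested languages; the actual MLN implementation may compare the two distributions differently (for instance with a statistical test), and the scheme labels and counts below are hypothetical.

```python
from statistics import mean


def vocabulary_size_gap(tip_counts, ancestral_counts):
    """Absolute difference between the mean number of synonyms per concept
    at the ancestral nodes and at the tips."""
    return abs(mean(ancestral_counts) - mean(tip_counts))


def select_weighting(tip_counts, ancestral_by_scheme):
    """Return the weighting scheme with the smallest vocabulary-size gap."""
    return min(ancestral_by_scheme,
               key=lambda scheme: vocabulary_size_gap(
                   tip_counts, ancestral_by_scheme[scheme]))


tips = [1, 1, 2, 1]                  # synonyms per concept in attested languages
candidates = {
    "scheme 1": [3, 2, 3],           # too many synonyms in ancestral nodes
    "scheme 2": [1, 1, 2],           # close to the tips
    "scheme 3": [1, 1, 1],           # too few synonyms in ancestral nodes
}
print(select_weighting(tips, candidates))  # scheme 2
```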


slide-29
SLIDE 29

Materials and Methods Methods

The MLN Method for ASR

Too many synonyms in ancestral nodes!

The vocabulary size criterion states that the number of synonyms per concept (here reflected by the size of the nodes in the tree) should be similar across ancestral and descendant languages. With the help of this criterion, an optimal weighting scheme for gain-loss rates is chosen for each individual dataset.


slide-30
SLIDE 30

Materials and Methods Methods

The MLN Method for ASR

Too few synonyms in ancestral nodes!

The vocabulary size criterion states that the number of synonyms per concept (here reflected by the size of the nodes in the tree) should be similar across ancestral and descendant languages. With the help of this criterion, an optimal weighting scheme for gain-loss rates is chosen for each individual dataset.


slide-31
SLIDE 31

Materials and Methods Methods

The MLN Method for ASR

Optimal amount of synonyms in ancestral nodes!

The vocabulary size criterion states that the number of synonyms per concept (here reflected by the size of the nodes in the tree) should be similar across ancestral and descendant languages. With the help of this criterion, an optimal weighting scheme for gain-loss rates is chosen for each individual dataset.


slide-32
SLIDE 32

Materials and Methods Methods

Reconstruction on a posterior sample

If a sample of trees is used, a state is reconstructed if it is reconstructed in more than a proportion θ of the trees in the sample. θ is estimated using the training set. Values:

database  method              θ
IELex     Sankoff/binary      0.690
IELex     Sankoff/multistate  0.056
IELex     MLN                 0.464
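The threshold rule for tree samples, and one way to estimate θ on the training data, can be sketched as follows. This is an illustration under assumptions: the root states reconstructed on each sampled tree are given as sets, θ is chosen from a hypothetical grid by maximizing the F-score against the gold standard (the slides do not say which objective was used), and all names and data are made up.

```python
from collections import Counter


def reconstruct_from_sample(per_tree_states, theta):
    """Keep every state reconstructed in more than a proportion theta of the trees."""
    counts = Counter(s for states in per_tree_states for s in states)
    n = len(per_tree_states)
    return {s for s, c in counts.items() if c / n > theta}


def f_score(predicted, gold):
    """Micro-averaged F-score over concepts (sets of cognate classes)."""
    tp = sum(len(p & g) for p, g in zip(predicted, gold))
    fp = sum(len(p - g) for p, g in zip(predicted, gold))
    fn = sum(len(g - p) for p, g in zip(predicted, gold))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def estimate_theta(training, gold, grid):
    """Choose the theta from `grid` that maximizes the training F-score."""
    return max(grid, key=lambda t: f_score(
        [reconstruct_from_sample(sample, t) for sample in training], gold))


# Two training concepts, four sampled trees each (hypothetical data)
training = [
    [{"A"}, {"A"}, {"B"}, {"A"}],
    [{"C"}, {"D"}, {"C"}, {"D"}],
]
gold = [{"A"}, {"C", "D"}]
print(estimate_theta(training, gold, grid=[0.1, 0.3, 0.6]))  # 0.3
```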


slide-33
SLIDE 33

Materials and Methods Methods

Likelihood-based reconstruction

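$$\log L(\text{tips below} \mid \text{mother} = s) = \sum_{d \in \text{daughters}} \log \sum_{s' \in \text{states}} P(s \to s' \mid \text{branch length}) \, L(\text{tips below } d \mid d = s')$$

[Figure: tree with leaf states A, C, C, A, B, B]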
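The recursion is Felsenstein's pruning algorithm. A minimal sketch, assuming the equal-rates Mk model (the constraint the slides mention for multi-state characters) with a single hypothetical rate parameter, and a hypothetical `Node` class storing the branch length to the mother and, at leaves, the observed state. Sums over daughter states are done with log-sum-exp to avoid numerical underflow on large trees; the slides describe optimizing the rates to maximize the likelihood, whereas this sketch keeps the rate fixed.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class Node:
    length: float = 0.0                      # branch length to the mother
    children: list[Node] = field(default_factory=list)
    state: str | None = None                 # observed state (leaves only)


def p_mk(same: bool, k: int, rate: float, t: float) -> float:
    """Transition probability under the equal-rates Mk model, where every
    off-diagonal entry of the rate matrix equals `rate`."""
    decay = math.exp(-k * rate * t)
    return 1 / k + (k - 1) / k * decay if same else (1 - decay) / k


def logsumexp(xs):
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))


def log_partial(node: Node, states: list[str], rate: float) -> dict[str, float]:
    """log L(tips below node | node = s) for every state s."""
    if not node.children:
        return {s: (0.0 if s == node.state else -math.inf) for s in states}
    k = len(states)
    out = {s: 0.0 for s in states}
    for d in node.children:
        child = log_partial(d, states, rate)
        for s in states:
            terms = []
            for s2 in states:
                p = p_mk(s == s2, k, rate, d.length)
                terms.append((math.log(p) if p > 0 else -math.inf) + child[s2])
            out[s] += logsumexp(terms)
    return out


# Hypothetical example: three leaves with states A, A, B
tree = Node(children=[
    Node(children=[Node(1.0, state="A"), Node(1.0, state="A")], length=0.5),
    Node(1.5, state="B"),
])
print(log_partial(tree, ["A", "B"], rate=0.5))
```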


slide-34
SLIDE 34

Materials and Methods Methods

Likelihood-based reconstruction

  • note: likelihoods (unlike parsimony scores) depend on branch lengths!
  • likelihoods at the root give the likelihood of a reconstruction, given all observed data (for that character)
  • the total likelihood is obtained by multiplying the root state likelihoods with the equilibrium probabilities given a rate matrix and summing over states
  • the rate matrix is optimized to maximize the likelihood
  • rates across characters are independently optimized
  • for multistate characters, all rates are constrained to be equal (otherwise BayesTraits crashes…)
  • using the equilibrium probabilities, you can derive expected state probabilities for root states
  • a state is likelihood-reconstructed if its expected probability > θ₂; again, the threshold θ₂ must be estimated from the training set
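Turning root likelihoods into expected state probabilities and applying θ₂ can be sketched as follows. This assumes the per-state log-likelihoods at the root, log L(data | root = s), have already been computed (for example by a pruning pass) and that the equilibrium probabilities are strictly positive; the function names and numbers are hypothetical. Each state's probability is proportional to its equilibrium probability times its root likelihood, and the normalizing constant is the total likelihood of the character.

```python
import math


def root_state_probabilities(root_log_lik, equilibrium):
    """P(root = s | data) = pi_s * L(data | root = s) / sum over all states."""
    log_joint = {s: math.log(equilibrium[s]) + ll for s, ll in root_log_lik.items()}
    m = max(log_joint.values())
    weights = {s: math.exp(v - m) for s, v in log_joint.items()}
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}


def likelihood_reconstruct(root_log_lik, equilibrium, theta2):
    """States whose expected root probability exceeds theta2."""
    probs = root_state_probabilities(root_log_lik, equilibrium)
    return {s for s, p in probs.items() if p > theta2}


root_ll = {"A": math.log(0.3), "B": math.log(0.1)}
pi = {"A": 0.5, "B": 0.5}
print(root_state_probabilities(root_ll, pi))      # A ≈ 0.75, B ≈ 0.25
print(likelihood_reconstruct(root_ll, pi, 0.5))   # {'A'}
```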


slide-35
SLIDE 35

Results General Results

Evaluation

[Figure: precision, recall, and F-score (y-axis 0.0 to 0.8), compared by database (ABVD, IELex), algorithm (ML, MLN, Sankoff), character type (binary-valued, multi-valued), tree type (bifurcating, multifurcating), and tree sample (posterior sample, summary tree)]


slide-36
SLIDE 36

Results General Results

Evaluation

IELex

algorithm  characters  furcating       tree sample       precision  recall  F-score
ML         binary      bifurcating     summary tree      0.817      0.734   0.773
ML         binary      bifurcating     posterior sample  0.795      0.734   0.763
ML         binary      multifurcating  summary tree      0.792      0.722   0.755
ML         binary      multifurcating  posterior sample  0.756      0.747   0.752
Sankoff    binary      multifurcating  summary tree      0.716      0.734   0.725
Sankoff    binary      bifurcating     summary tree      0.704      0.722   0.712
Sankoff    binary      multifurcating  posterior sample  0.720      0.684   0.701
Sankoff    binary      bifurcating     posterior sample  0.720      0.684   0.701
ML         multi       bifurcating     posterior sample  0.642      0.772   0.701
MLN        multi       bifurcating     posterior sample  0.743      0.658   0.698
MLN        binary      multifurcating  posterior sample  0.743      0.658   0.698
MLN        binary      bifurcating     posterior sample  0.743      0.658   0.698
Sankoff    multi       bifurcating     summary tree      0.671      0.722   0.695
Sankoff    multi       multifurcating  posterior sample  0.671      0.722   0.695
Sankoff    multi       bifurcating     posterior sample  0.671      0.722   0.695
ML         multi       multifurcating  posterior sample  0.629      0.772   0.693
MLN        multi       multifurcating  posterior sample  0.758      0.633   0.690
Sankoff    multi       multifurcating  summary tree      0.735      0.633   0.680
ML         multi       multifurcating  summary tree      0.735      0.633   0.680
ML         multi       bifurcating     summary tree      0.721      0.620   0.667
MLN        multi       multifurcating  summary tree      0.584      0.658   0.619
MLN        binary      multifurcating  summary tree      0.584      0.658   0.619
MLN        multi       bifurcating     summary tree      0.742      0.291   0.418
MLN        binary      bifurcating     summary tree      0.742      0.291   0.418

slide-37
SLIDE 37

Results General Results

Evaluation

ABVD

algorithm  characters  furcating       tree sample       precision  recall  F-score
ML         multi       bifurcating     posterior sample  0.738      0.747   0.742
ML         binary      bifurcating     posterior sample  0.682      0.759   0.719
ML         multi       bifurcating     summary tree      0.740      0.684   0.711
ML         binary      bifurcating     summary tree      0.757      0.681   0.711
Sankoff    multi       bifurcating     summary tree      0.691      0.709   0.700
Sankoff    binary      multifurcating  posterior sample  0.781      0.633   0.699
ML         binary      multifurcating  posterior sample  0.761      0.646   0.699
ML         multi       multifurcating  summary tree      0.726      0.671   0.697
Sankoff    binary      bifurcating     posterior sample  0.726      0.671   0.697
ML         binary      multifurcating  summary tree      0.732      0.658   0.693
Sankoff    multi       multifurcating  summary tree      0.679      0.696   0.688
MLN        multi       bifurcating     summary tree      0.655      0.722   0.687
MLN        binary      bifurcating     summary tree      0.655      0.722   0.687
Sankoff    binary      bifurcating     summary tree      0.629      0.557   0.591
Sankoff    multi       multifurcating  posterior sample  0.542      0.570   0.556
Sankoff    multi       bifurcating     posterior sample  0.542      0.570   0.556
MLN        multi       multifurcating  posterior sample  0.414      0.848   0.556
MLN        multi       bifurcating     posterior sample  0.414      0.848   0.556
MLN        binary      multifurcating  posterior sample  0.414      0.848   0.556
MLN        binary      bifurcating     posterior sample  0.414      0.848   0.556
ML         multi       multifurcating  posterior sample  0.421      0.709   0.528
Sankoff    binary      multifurcating  summary tree      0.469      0.570   0.514
MLN        multi       multifurcating  summary tree      0.667      0.405   0.504
MLN        binary      multifurcating  summary tree      0.667      0.405   0.504

slide-38
SLIDE 38

Results Specific Results

Summary on Indo-European ASR

Error type               GS     ASR    Number
Missing forms            A      Ø      7
Different forms          A      B      9
Additional forms in ASR  A      A, B   5
Missing root in ASR      A, B   A      4
Total                                  25


slide-39
SLIDE 39

Results Specific Results

Evaluating the Differences

We evaluate the differences qualitatively by checking

  • the reflection of the proposed root in the branches, especially with semantically shifted word forms which may not occur in the wordlist data, using standard sources like Meier-Brügger (2002), Wodtko et al. (2008), Rix et al. (2002), and Pokorny (1959) for Indo-European in general, and specific sources like Vaan (2008) for Latin, Derksen (2008) and Vasmer (1986/1987) for Slavic, and Kroonen (2013) for Germanic;
  • the likelihood of semantic shift of the given root, with the help of the Database of Cross-Linguistic Colexifications (CLICS, List et al. 2013 and 2014b, http://clics.lingpy.org);
  • whether the cognate sets in the data are really reflexes of the proposed PIE root.

Based on this check, we distinguish four grades of root quality: erroneous, problematic, possible, good.


slide-40
SLIDE 40

Results Specific Results

Indo-European ASR: Missing forms

Concept, form, meaning in reflexes: comment

SEE, *derḱ-, “to see”: Only reflected in Indo-Iranian; the cognates are also problematic.

SEE, *weid-, “to see” or “to know”: Safe root for Indo-European.

SING, *kan-, “to sing” or “rooster”: The root is proposed for PIE on the basis of Germanic reflexes meaning “rooster”, which is a highly unlikely semantic change.

SMELL, *h₃ed-, “to smell”: Potential root for PIE, but only reflected in Greek and Romance.

SMALL, *mei-, “small”: Wrong cognate judgments in the database, since neither Russian malenkij nor English small goes back to this root.

THINK, *teng-, “to think” or “to feel”: Root only reflected in Germanic languages, with sporadic reflexes in semantically shifted form in other branches. A better candidate for PIE would be *men- “the mind, to think”.

WASH, *leh₂w-, “to wash” or “to pour”: Wrong cognate assignment in the source, since the Romance and Albanian reflexes are not annotated.

WASH, *neigʷ-, “to wash” or “water monster”: Very unlikely cognate assignment, due to the extreme shift from “to wash” to “water monster” (cf. English nix) in the Germanic languages.

WET, *wed-, “water” or “wet”: A semantic change from “water” to “wet” is likely according to CLICS, but it is not clear why it should already have happened in PIE times.

erroneous problematic possible good


slide-41
SLIDE 41

Results Specific Results

Indo-European ASR: Missing forms

Concept, form, meaning in reflexes: comment

SEE, *derḱ-, “to see”: Only reflected in Indo-Iranian; the cognates are also problematic.

SEE, *weid-, “to see” or “to know”: Safe root for Indo-European.

SING, *kan-, “to sing” or “rooster”: The root is proposed for PIE on the basis of Germanic reflexes meaning “rooster”, which is a highly unlikely semantic change.

SMELL, *h₃ed-, “to smell”: Potential root for PIE, but only reflected in Greek and Romance.

SMALL, *mei-, “small”: Wrong cognate judgments in the database, since neither Russian malenkij nor English small goes back to this root.

THINK, *teng-, “to think” or “to feel”: Root only reflected in Germanic languages, with sporadic reflexes in semantically shifted form in other branches. A better candidate for PIE would be *men- “the mind, to think”.

WASH, *leh₂w-, “to wash” or “to pour”: Wrong cognate assignment in the source, since the Romance and Albanian reflexes are not annotated.

WASH, *neigʷ-, “to wash” or “water monster”: Very unlikely PIE root, due to the extreme shift from “to wash” to “water monster” (cf. English nix) in the Germanic languages.

WET, *wed-, “water” or “wet”: A semantic change from “water” to “wet” is likely according to CLICS, but it is not clear why it should already have happened in PIE times.

erroneous problematic possible good


slide-42
SLIDE 42

Results Specific Results

Indo-European ASR: Missing Forms in ASR

Concept, form in GS: comment

NOT, *meh₁: This form is reflected in Ancient Greek as a prohibitive negation and is also reconstructed as such. Whether it was the normal negation in PIE is less clear.

SLEEP, *drem: This form is mainly reflected in Latin and only sporadically in Indo-Aryan and Greek. It is much more likely that it meant something else in PIE and then shifted into this meaning.

VOMIT, *h₁rewg-: There is no need to reconstruct this form back to PIE, since it is only reflected in two Romance languages.

YEAR, *ieHr-: This form only has reflexes in Germanic languages. In general, the meaning “year” is difficult to reconstruct, due to the high potential for shift from “summer”, “winter”, “time”, etc., as shown in CLICS.

erroneous problematic possible good


slide-44
SLIDE 44

Results Specific Results

Indo-European ASR: Different Forms

Concept, GS form, ASR form: comment

RIVER, GS *h₂ekʷeh₂, ASR *h₂ep-: The GS form meant “water” in PIE. Although a shift from “water” to “river” is likely according to CLICS, this meaning is an innovation in Germanic. The ASR form is reflected across multiple branches and is a much better candidate.

RUB, GS *melh₁-, ASR *terh₁-: The GS form is not reflected in the standard literature (LIV and LIN); the ASR form is reflected with the meaning “to rub, to bore”.

SCRATCH, GS *gerbʰ-, ASR *kes-: The GS form is only reflected in a few Germanic languages, probably due to a wrong cognate assignment. Following Derksen (2008), the ASR form is a much better candidate for the PIE word for “scratch”.

SKIN, GS *pel, ASR *(s)kewH-: The GS form is a good PIE root, but not necessarily with the meaning “skin”, as the meanings of the reflexes differ greatly. The ASR form derives from a PIE verb meaning “to cover”, but the cognate set should not contain the Slavic words (Derksen 2008).

WALK, GS *ǵʰeh₁, ASR *h₁ei-: The GS form is only reflected in Germanic. The ASR form is a clear PIE root, but the meaning may also have been “to go”.

WATER, GS *h₂ekʷeh₂, ASR *wódr̥: The ASR form is a much better candidate for “water” in PIE, due to its high number of reflexes in all branches.

WHITE, GS *h₂elbʰós, ASR *h₂erǵó-: The GS form is only reflected in Romance in this meaning, and with the meaning “cloud” in Hittite. The ASR form is a much better candidate, with a much more plausible connection between reflexes meaning “shine” and “white”, as also confirmed by CLICS.

WORM, GS *wr̥mi-, ASR *kʷr̥mis: The ASR form is reflected in more branches of PIE, while the GS form is only reflected in Germanic and Romance.

erroneous problematic possible good


slide-46
SLIDE 46

Results Specific Results

Indo-European ASR: Additional Forms

Concept, form in ASR: comment

MOON, *lewk-s-nh₂: This form would go back to a PIE root meaning “to shine” and is often said to have independently shifted to the meaning “moon” in Romance, Slavic, and other branches. The shift from “shine” to “moon” is, however, not very likely (no evidence in CLICS), so it is also possible that the word already meant “moon” in PIE as an epithet (Vaan 2008).

SNOW, *ǵʰéi-mn̥-: The form has probably independently shifted from the original meaning “frost, cold”, which is a very likely shift according to CLICS.

SUCK, *suḱ-: The root is present in this meaning in many subbranches and is a good candidate for PIE in this meaning.

THIS, *so / *to: The root is a clear PIE demonstrative (Meier-Brügger 2010), but the reflexes in the daughter languages vary greatly, due to analogical levelling.

WITH, *sm̥: A very good candidate for this meaning, with reflexes in Greek, Indo-Iranian, and Slavic.

erroneous problematic possible good


slide-48
SLIDE 48

Results Specific Results

Evaluation against our manually created gold standard

precision: 0.986 (1 false positive)
recall: 0.895 (8 false negatives)
F-score: 0.938¹

¹ The IELex PIE entries have an F-score of 0.854.
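The three scores follow directly from the raw counts. The slide does not state the number of true positives; 68 is inferred here as the only count consistent with the rounded precision (0.986) and recall (0.895) given 1 false positive and 8 false negatives. The helper names below are hypothetical: a minimal sketch of the standard metrics, not the evaluation code used in the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scores:
    precision: float
    recall: float
    f_score: float


def scores_from_counts(tp: int, fp: int, fn: int) -> Scores:
    """Precision, recall and F-score (harmonic mean of P and R) from raw counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * precision * recall / (precision + recall)
    return Scores(precision, recall, f_score)


def counts_from_sets(gold: set[str], predicted: set[str]) -> tuple[int, int, int]:
    """Compare cognate sets assigned to the proto-language (labels like 'snow:D').

    False positive: reconstructed by the method but absent from the gold standard.
    False negative: in the gold standard but not reconstructed by the method.
    """
    tp = len(gold & predicted)
    fp = len(predicted - gold)
    fn = len(gold - predicted)
    return tp, fp, fn


# tp=68 is inferred, not stated on the slide (see lead-in).
s = scores_from_counts(tp=68, fp=1, fn=8)
print(f"precision={s.precision:.3f} recall={s.recall:.3f} F={s.f_score:.3f}")
# → precision=0.986 recall=0.895 F=0.938

# Toy labels for illustration only, not the real data.
print(counts_from_sets({"c1:A", "c2:B", "c3:C"}, {"c1:A", "c3:C", "c4:D"}))
# → (2, 1, 1)
```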

slide-49
SLIDE 49

Results Specific Results

False positive

[Figure: Indo-European tree with the sampled languages as leaves, for cognate set snow:D]

slide-50
SLIDE 50

Results Specific Results

False negatives

[Figure: Indo-European tree with the sampled languages as leaves, for cognate set river:O]

slide-51
SLIDE 51

Results Specific Results

False negatives

[Figure: Indo-European tree with the sampled languages as leaves, for cognate set smell:W]

slide-52
SLIDE 52

Results Specific Results

False negatives

[Figure: Indo-European tree with the sampled languages as leaves, for cognate set wet:I]

slide-53
SLIDE 53

Results Specific Results

False negatives

[Figure: Indo-European tree with the sampled languages as leaves, for cognate set skin:B]

slide-54
SLIDE 54

Results Specific Results

False negatives

[Figure: Indo-European tree with the sampled languages as leaves, for cognate set sleep:E]

slide-55
SLIDE 55

Results Specific Results

False negatives

[Figure: Indo-European tree with the sampled languages as leaves, for cognate set white:E]

slide-56
SLIDE 56

Results Specific Results

False negatives

[Figure: Indo-European tree with the sampled languages as leaves, for cognate set worm:B]

slide-57
SLIDE 57

Results Specific Results

Summary on Indo-European

As the qualitative evaluation shows, the proto-forms that our best ASR method reconstructs for PIE are mostly as good as, or even better than, the candidates in the gold standard.

Given the well-known uncertainties of semantic reconstruction in classical historical linguistics, ASR methods could thus provide real help: they offer objective scenarios of word evolution along a given tree under a specific evolutionary model.
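What a model-based scenario for a single cognate class can look like is sketched below. This is an illustration under stated assumptions, not the exact model or code behind the results above: a symmetric two-state (absent/present) Markov model with one hypothetical rate, Felsenstein's pruning algorithm, a uniform root prior, and a made-up four-language tree. `Node` and `root_presence_probability` are hypothetical names.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str = ""
    length: float = 0.0  # length of the branch leading to this node
    children: list["Node"] = field(default_factory=list)


def transition(same: bool, t: float, rate: float) -> float:
    """Symmetric two-state model: P(same state) or P(other state) after time t."""
    decay = math.exp(-2.0 * rate * t)
    return 0.5 + 0.5 * decay if same else 0.5 - 0.5 * decay


def partial_likelihoods(node: Node, states: dict[str, int], rate: float) -> list[float]:
    """Pruning step: entry s is P(data below node | node in state s), s in {0, 1}."""
    if not node.children:
        observed = states.get(node.name)
        if observed is None:  # no data for this language: both states possible
            return [1.0, 1.0]
        return [1.0 if observed == s else 0.0 for s in (0, 1)]
    result = [1.0, 1.0]
    for child in node.children:
        child_l = partial_likelihoods(child, states, rate)
        for s in (0, 1):
            result[s] *= sum(
                transition(s == c, child.length, rate) * child_l[c] for c in (0, 1)
            )
    return result


def root_presence_probability(root: Node, states: dict[str, int], rate: float = 1.0) -> float:
    """Posterior probability that the cognate class was present at the root.

    The uniform root prior equals the stationary distribution of the
    symmetric model, so it cancels out of the ratio.
    """
    absent, present = partial_likelihoods(root, states, rate)
    return present / (absent + present)


# Hypothetical tree ((A,B),(C,D)) and toy presence/absence data for one cognate class.
tree = Node(children=[
    Node(length=0.2, children=[Node("A", 0.1), Node("B", 0.1)]),
    Node(length=0.2, children=[Node("C", 0.1), Node("D", 0.1)]),
])
cognate = {"A": 1, "B": 1, "C": 1, "D": 0}
print(round(root_presence_probability(tree, cognate), 3))
```

Running this for every cognate class of a concept and picking the class with the highest root probability is one simple way to turn such scores into a proposed proto-form.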

slide-58
SLIDE 58

Discussion

Benefits of ASR (?)

If the language family is well known, ASR is of limited use for semantic reconstruction, since independent reconstructions by the comparative method are available; it is, however, quite useful for checking data quality and the reference tree topology in lexicostatistical datasets.

If the language family is less well known, ASR is definitely useful as a preliminary analysis for semantic reconstruction, since it gives a more objective assessment of the consequences of a given theory of lexical replacement and external language change (a tree topology).

slide-59
SLIDE 59

Discussion

Benefits of ASR (!)

ASR may help

1. to identify loci of homoplasy, thus giving a first hint of parallel semantic change and borrowing;

2. to quantify differential rates of lexical replacement for the concepts in a given wordlist;

3. to automatically identify sound change patterns and proto-form reconstructions.
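Points 1 and 2 can be illustrated with a toy computation. As a simple stand-in for ASR, the sketch uses Fitch parsimony (a swapped-in technique; the slides do not say which algorithm would be used) on a hypothetical binary tree. Changes beyond the minimum any tree would need point to homoplasy, and summing changes over a concept's cognate classes gives a crude proxy for its replacement rate. Tree, languages, concepts and data are all made up.

```python
from typing import Union

# A rooted binary tree: a leaf is a language name, an internal node a (left, right) pair.
Tree = Union[str, tuple]


def fitch(tree: Tree, states: dict[str, int]) -> tuple[set[int], int]:
    """Fitch's small-parsimony pass for one binary (present/absent) character.

    Returns the candidate state set at the subtree root and the minimum number
    of changes (gains or losses of the cognate class) inside the subtree.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch(left, states)
    right_set, right_changes = fitch(right, states)
    shared = left_set & right_set
    if shared:
        return shared, left_changes + right_changes
    return left_set | right_set, left_changes + right_changes + 1


def extra_steps(tree: Tree, states: dict[str, int]) -> int:
    """Changes beyond the minimum any tree would need; > 0 flags homoplasy."""
    _, changes = fitch(tree, states)
    minimum = len(set(states.values())) - 1  # 0 if constant, 1 if both states occur
    return changes - minimum


def changes_per_concept(tree: Tree, concepts: dict[str, dict[str, dict[str, int]]]) -> dict[str, int]:
    """Sum parsimony changes over all cognate-class characters of each concept."""
    return {
        concept: sum(fitch(tree, states)[1] for states in classes.values())
        for concept, classes in concepts.items()
    }


tree = (("A", "B"), ("C", "D"))
print(extra_steps(tree, {"A": 1, "B": 1, "C": 0, "D": 0}))  # 0: a single change suffices
print(extra_steps(tree, {"A": 1, "B": 0, "C": 1, "D": 0}))  # 1: an extra change (homoplasy)

toy = {
    "stable": {"x": {"A": 1, "B": 1, "C": 1, "D": 1}},
    "volatile": {
        "p": {"A": 1, "B": 0, "C": 0, "D": 0},
        "q": {"A": 0, "B": 1, "C": 0, "D": 1},
        "r": {"A": 0, "B": 0, "C": 1, "D": 0},
    },
}
print(changes_per_concept(tree, toy))  # {'stable': 0, 'volatile': 4}
```

Because one replacement tends to be counted as a loss of one class plus a gain of another, the per-concept totals are only meaningful for comparing concepts with each other.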

slide-60
SLIDE 60

Discussion

Caveats

Our current models are still very simplistic, insofar as they

  • operate independently for each meaning slot, and
  • handle only binary (yes/no) cognate relations between words.

Future research will show whether it is possible to model lexical change across meanings and to allow for more fine-grained relations between cognate classes.
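To make the two simplifications concrete, here is a sketch of how a wordlist might be turned into the per-concept binary characters such models work on. The row format (language, concept, cognate class), the function name `binarize` and the toy data are assumptions for illustration; treating a missing entry as missing data rather than as absence is one possible design choice, not necessarily the one made in the study.

```python
from collections import defaultdict


def binarize(wordlist: list[tuple[str, str, str]]) -> dict[str, dict[str, dict[str, int]]]:
    """Turn (language, concept, cognate_class) rows into one presence/absence
    matrix per concept: {concept: {cognate_class: {language: 0 or 1}}}.

    Each concept is handled on its own (no links across meaning slots), and each
    cognate class becomes a separate yes/no character. Languages with no entry
    for a concept are left out of that concept's matrix (missing data).
    """
    langs_per_concept: dict[str, set[str]] = defaultdict(set)
    classes_per_concept: dict[str, set[str]] = defaultdict(set)
    attested: set[tuple[str, str, str]] = set()
    for lang, concept, cog in wordlist:
        langs_per_concept[concept].add(lang)
        classes_per_concept[concept].add(cog)
        attested.add((lang, concept, cog))
    return {
        concept: {
            cog: {
                lang: int((lang, concept, cog) in attested)
                for lang in sorted(langs_per_concept[concept])
            }
            for cog in sorted(classes)
        }
        for concept, classes in classes_per_concept.items()
    }


# Toy wordlist (hypothetical languages and cognate labels).
rows = [
    ("Lang1", "snow", "a"), ("Lang2", "snow", "a"), ("Lang3", "snow", "b"),
    ("Lang1", "worm", "a"), ("Lang3", "worm", "a"),
]
matrices = binarize(rows)
print(matrices["snow"])  # {'a': {'Lang1': 1, 'Lang2': 1, 'Lang3': 0}, 'b': {'Lang1': 0, 'Lang2': 0, 'Lang3': 1}}
print(matrices["worm"])  # {'a': {'Lang1': 1, 'Lang3': 1}}
```

A language with two synonyms for one concept simply gets a 1 in two cognate-class characters of that concept.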

slide-61
SLIDE 61

References

  • A. Bouchard-Côté, D. Hall, T. L. Griffiths, and D. Klein. Automated reconstruction of ancient languages using probabilistic models of sound change. Proceedings of the National Academy of Sciences of the United States of America, 110(11):4224–4229, 2013.
  • R. Derksen. Etymological dictionary of the Slavic inherited lexicon. Brill, Leiden and Boston, 2008.
  • G. Kroonen. Etymological dictionary of Proto-Germanic. Number 11 in Leiden Indo-European Etymological Dictionary Series. Brill, Leiden and Boston, 2013.
  • J.-M. List, A. Terhalle, and M. Urban. Using network approaches to enhance the analysis of cross-linguistic polysemies. In Proceedings of the 10th International Conference on Computational Semantics – Short Papers, pages 347–353, Stroudsburg, 2013. Association for Computational Linguistics.
  • J.-M. List, T. Mayer, A. Terhalle, and M. Urban. CLICS: Database of Cross-Linguistic Colexifications. Online Resource, 2014a. URL http://clics.lingpy.org.
  • J.-M. List, S. Nelson-Sathi, H. Geisler, and W. Martin. Networks of lexical borrowing and lateral gene transfer in language and genome evolution. BioEssays, 36(2):141–150, 2014b.
  • M. Meier-Brügger. Indogermanische Sprachwissenschaft. de Gruyter, Berlin and New York, 8th edition, 2002.
  • J. Pokorny. Indogermanisches etymologisches Wörterbuch, volume 1. Francke, Bern, 1959.
  • M. de Vaan. Etymological dictionary of Latin and the other Italic languages. Number 7 in Leiden Indo-European Etymological Dictionary Series. Brill, Leiden and Boston, 2008.
  • M. Vasmer. Ėtimologičeskij slovar’ russkogo jazyka. Progress, Moscow, 1986/1987.
  • D. Wodtko, B. Irslinger, and C. Schneider. Nomina im Indogermanischen Lexikon. Winter, Heidelberg, 2008.
