

SLIDE 1

An Empirical Comparison of Unsupervised Constituency Parsing Methods

Jun Li, Yifan Cao, Jiong Cai, Yong Jiang, Kewei Tu

{lijun2, caoyf, caijiong, tukw}@shanghaitech.edu.cn yongjiang.jy@alibaba-inc.com

SLIDE 2-5

Background

  • Goal: To learn a constituency parser without parse tree annotations
  • Trends: This task has received much attention recently (as of 2019)
    ○ Increasing number of accepted papers: NAACL (2), ACL (5), EMNLP (3)
    ○ With high quality: ICLR 2019 best paper (Shen et al., 2019)
  • Problems: No unified experimental standard has been adopted
    ○ This makes results across papers incomparable
  • Our contributions:
    ○ Propose a standardized experimental setup
    ○ Conduct systematic experiments on:
      ■ PRPN (Shen et al., 2018)
      ■ URNNG (Kim et al., 2019b)
      ■ DIORA (Drozdov et al., 2019)
      ■ CCM (Klein and Manning, 2002)
      ■ CCL (Seginer, 2007)

SLIDE 6-7

Experimental setup

  • Language: Different languages have different syntactic properties
    ○ English is mostly right branching; Japanese is mostly left branching
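These branching tendencies correspond to two trivial baselines often reported for this task. Assuming binary trees over n words, their constituent span sets can be written down directly (a minimal sketch; the helper names are illustrative, not from the paper):

```python
def right_branching_spans(n):
    """Constituent spans (i, j) of a fully right-branching binary
    tree over n words, excluding trivial one-word spans.
    For a 3-word sentence: {(0, 3), (1, 3)}."""
    return {(i, n) for i in range(n - 1)}

def left_branching_spans(n):
    """Spans of the mirror-image, fully left-branching tree,
    the baseline that suits a mostly left-branching language."""
    return {(0, j) for j in range(2, n + 1)}
```

Comparing a parser's output against both baselines is a quick sanity check that it has learned more than one language's dominant branching direction.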

SLIDE 8-9

Experimental setup

  • Language: Use KTB and PTB for training and evaluation
  • Dataset pre-processing: Train on sentences of length <= 10/40; split into train/dev/test
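The length filtering above can be sketched as follows (a minimal sketch; the corpus format and function name are illustrative, not the paper's actual pipeline):

```python
def filter_by_length(corpus, max_len=10):
    """Keep tokenized sentences with at most max_len words; the two
    training settings in the deck use max_len = 10 and max_len = 40."""
    return [sent for sent in corpus if len(sent) <= max_len]

# Tiny illustrative corpus of tokenized sentences.
corpus = [["the", "cat", "sat"], ["a"] * 12, ["hello"]]
short = filter_by_length(corpus, max_len=10)  # drops the 12-word sentence
```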
  • Punctuation post-processing:
SLIDE 10-11

Experimental setup

  • Language: Use KTB and PTB for training and evaluation
  • Dataset pre-processing: Train on sentences of length <= 10/40; split into train/dev/test
  • Punctuation post-processing: Attach punctuation to the root or to the lowest common ancestor
  • Evaluation: Compare predicted constituent spans (i, j) against gold spans

[Figure: predicted span pairs matched against gold span pairs]
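The evaluation compares the span set of the predicted tree with that of the gold tree. Extracting those spans can be sketched as follows, assuming trees are given as nested Python lists (not the paper's actual data format):

```python
def tree_spans(tree, start=0):
    """Return (end, spans), where spans is the set of (i, j) word
    spans of all multi-word constituents in a nested-list tree.
    Leaves are word strings; internal nodes are lists of children."""
    if isinstance(tree, str):           # a single word
        return start + 1, set()
    pos, spans = start, set()
    for child in tree:
        pos, child_spans = tree_spans(child, pos)
        spans |= child_spans
    if pos - start > 1:                 # ignore trivial one-word spans
        spans.add((start, pos))
    return pos, spans

# ((the cat) (sat down)) has spans {(0, 2), (2, 4), (0, 4)}
_, spans = tree_spans([["the", "cat"], ["sat", "down"]])
```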

SLIDE 12

Experimental setup

  • Language: Use KTB and PTB for training and evaluation
  • Dataset pre-processing: Train on sentences of length <= 10/40; split into train/dev/test
  • Punctuation post-processing: Attach punctuation to the root or to the lowest common ancestor
  • Evaluation: Report Micro/Macro/Evalb F1
  • More details can be found in our paper
SLIDE 13-26

Experimental results (English)

[Result tables/figures for English shown across slides 13-26]

Experimental results (Japanese)

slide-28
SLIDE 28

Experimental results (Japanese)

slide-29
SLIDE 29

Experimental results (Japanese)

slide-30
SLIDE 30

Experimental results (Japanese)

SLIDE 31

Conclusion

  • We propose a standardized experimental setup for unsupervised constituency parsing
  • We empirically compare five methods and find that recent models do not show a clear advantage over decade-old models

SLIDE 32

Thank you!