

SLIDE 1

An Empirical Comparison of Unsupervised Constituency Parsing Methods

Jun Li, Yifan Cao, Jiong Cai, Yong Jiang, Kewei Tu

{lijun2, caoyf, caijiong, tukw}@shanghaitech.edu.cn yongjiang.jy@alibaba-inc.com

SLIDE 2-5

Background

  • Goal: To learn a constituency parser without parse tree annotations
  • Trends: This task has received much attention recently (as of 2019)
    ○ Increasing number of accepted papers: NAACL (2), ACL (5), EMNLP (3)
    ○ With high quality: ICLR 2019 best paper (Shen et al., 2019)
  • Problems: No unified experimental standard has been adopted
    ○ This makes results across papers incomparable
  • Our contributions:
    ○ Propose a standardized experimental setup
    ○ Conduct systematic experiments on:
      ■ PRPN (Shen et al., 2018)
      ■ URNNG (Kim et al., 2019b)
      ■ DIORA (Drozdov et al., 2019)
      ■ CCM (Klein and Manning, 2002)
      ■ CCL (Seginer, 2007)

SLIDE 6-7

Experimental setup

  • Language: Different languages have different syntactic properties
    ○ English is mostly right branching; Japanese is mostly left branching
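These branching tendencies correspond to two trivial baselines often reported for this task. Assuming binary trees over n words, their constituent span sets can be written down directly (a minimal sketch; the helper names are illustrative, not from the paper):

```python
def right_branching_spans(n):
    """Constituent spans (i, j) of a fully right-branching binary
    tree over n words, excluding trivial one-word spans.
    For a 3-word sentence: {(0, 3), (1, 3)}."""
    return {(i, n) for i in range(n - 1)}

def left_branching_spans(n):
    """Spans of the mirror-image, fully left-branching tree,
    the baseline that suits a mostly left-branching language."""
    return {(0, j) for j in range(2, n + 1)}
```

Comparing a parser's output against both baselines is a quick sanity check that it has learned more than one language's dominant branching direction.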

SLIDE 8-9

Experimental setup

  • Language: Use KTB and PTB for training and evaluation
  • Dataset pre-processing: Train on sentences of length <= 10/40; split into train/dev/test
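The length filtering above can be sketched as follows (a minimal sketch; the corpus format and function name are illustrative, not the paper's actual pipeline):

```python
def filter_by_length(corpus, max_len=10):
    """Keep tokenized sentences with at most max_len words; the two
    training settings in the deck use max_len = 10 and max_len = 40."""
    return [sent for sent in corpus if len(sent) <= max_len]

# Tiny illustrative corpus of tokenized sentences.
corpus = [["the", "cat", "sat"], ["a"] * 12, ["hello"]]
short = filter_by_length(corpus, max_len=10)  # drops the 12-word sentence
```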
  • Punctuation post-processing:
SLIDE 10-11

Experimental setup

  • Language: Use KTB and PTB for training and evaluation
  • Dataset pre-processing: Train on sentences of length <= 10/40; split into train/dev/test
  • Punctuation post-processing: Attach punctuation to the root or to the lowest common ancestor
  • Evaluation: Compare predicted constituent spans (i, j) against gold spans

[Figure: predicted span pairs matched against gold span pairs]
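The evaluation compares the span set of the predicted tree with that of the gold tree. Extracting those spans can be sketched as follows, assuming trees are given as nested Python lists (not the paper's actual data format):

```python
def tree_spans(tree, start=0):
    """Return (end, spans), where spans is the set of (i, j) word
    spans of all multi-word constituents in a nested-list tree.
    Leaves are word strings; internal nodes are lists of children."""
    if isinstance(tree, str):           # a single word
        return start + 1, set()
    pos, spans = start, set()
    for child in tree:
        pos, child_spans = tree_spans(child, pos)
        spans |= child_spans
    if pos - start > 1:                 # ignore trivial one-word spans
        spans.add((start, pos))
    return pos, spans

# ((the cat) (sat down)) has spans {(0, 2), (2, 4), (0, 4)}
_, spans = tree_spans([["the", "cat"], ["sat", "down"]])
```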

SLIDE 12

Experimental setup

  • Language: Use KTB and PTB for training and evaluation
  • Dataset pre-processing: Train on sentences of length <= 10/40; split into train/dev/test
  • Punctuation post-processing: Attach punctuation to the root or to the lowest common ancestor
  • Evaluation: Report Micro/Macro/Evalb F1
  • More details can be found in our paper
SLIDE 13-26

Experimental results (English)

[Result tables/figures for English shown across slides 13-26]

Experimental results (Japanese)

slide-28
SLIDE 28

Experimental results (Japanese)

slide-29
SLIDE 29

Experimental results (Japanese)

slide-30
SLIDE 30

Experimental results (Japanese)

SLIDE 31

Conclusion

  • We propose a standardized experimental setup for unsupervised constituency parsing
  • We empirically compare five methods and find that recent models do not show a clear advantage over decade-old models

SLIDE 32

Thank you!