Weak Semi-Markov CRFs for NP Chunking in Informal Text

Dec 06, 2022

SLIDE 1

Contributions Dataset Models Experiments Conclusion

Weak Semi-Markov CRFs for NP Chunking in Informal Text

Aldrian Obaja Muis and Wei Lu

Singapore University of Technology and Design

SLIDE 3

Paper Contributions

In this paper, we contribute:

1. Noun phrase-annotated SMS corpus¹
2. Weak semi-Markov CRF

¹ Tao Chen and Min-Yen Kan (2013). “Creating a live, public short message service corpus: the NUS SMS corpus”. In: Language Resources and Evaluation. Vol. 47. Springer Netherlands, pp. 299–335.

SLIDE 7

NP-annotated SMS Corpus

We used the Brat Rapid Annotation Tool (BRAT)² for annotation, recruiting undergraduate students to annotate the noun phrases.

Examples: (annotation screenshots, not reproduced in this transcript)

² http://brat.nlplab.org/

SLIDE 11

Annotation Statistics

64 annotators

26,500 SMS messages

76,490 noun phrases

359,009 tokens
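As a rough illustration of how dense the annotation is, the counts above imply the following per-message averages. This is only a sketch: it assumes the token and noun-phrase counts both refer to the same 26,500 messages, and the variable names are illustrative.

```python
# Corpus counts as reported on the slide.
num_messages = 26_500
num_noun_phrases = 76_490
num_tokens = 359_009

# Assumption: tokens and noun phrases are counted over the same set of messages.
tokens_per_message = num_tokens / num_messages
nps_per_message = num_noun_phrases / num_messages

print(f"tokens per SMS: {tokens_per_message:.2f}")
print(f"noun phrases per SMS: {nps_per_message:.2f}")
```

Under that assumption, a message has about 13.5 tokens and just under 3 noun phrases on average.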

SLIDE 21

Models Comparison

n: number of words in the sentence, |Y|: number of labels, L: maximum segment length

Fig. 1: Linear CRF, $O(n|Y|^2)$. Lattice over "said Dr Teh" with one node per word for each of the labels B, I, O.

Fig. 2: Semi-CRF, $O(nL|Y|^2)$. Lattice over "said Dr Teh" with one node per word for each of the labels N, O.

Fig. 3: Weak Semi-CRF, $O(n|Y|^2 + nL|Y|)$. Lattice over "said Dr Teh" with two nodes per word for each of the labels N, O.
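To get a feel for the three bounds, the sketch below simply evaluates the dominant terms for one set of values. The function names and the chosen values of n, |Y| and L are illustrative assumptions, and the results are term counts from the big-O expressions, not measured running times.

```python
def linear_crf_ops(n: int, num_labels: int) -> int:
    """Dominant term for a linear CRF: n * |Y|^2."""
    return n * num_labels ** 2


def semi_crf_ops(n: int, num_labels: int, max_len: int) -> int:
    """Dominant term for a semi-CRF: n * L * |Y|^2."""
    return n * max_len * num_labels ** 2


def weak_semi_crf_ops(n: int, num_labels: int, max_len: int) -> int:
    """Dominant terms for a weak semi-CRF: n * |Y|^2 + n * L * |Y|."""
    return n * num_labels ** 2 + n * max_len * num_labels


# Illustrative values only (not taken from the slides).
n, num_labels, max_len = 20, 2, 8

linear = linear_crf_ops(n, num_labels)
semi = semi_crf_ops(n, num_labels, max_len)
weak = weak_semi_crf_ops(n, num_labels, max_len)
print(linear, semi, weak)
```

With these values the counts are 80, 640 and 400. The gap between the semi-CRF and the weak semi-CRF widens as |Y| grows, because L multiplies |Y| rather than |Y|^2 in the weak variant.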

SLIDE 22

Empirical Verification

SLIDE 23

F1-Score

[Bar chart: F1-score (%) of Linear CRF, Semi-CRF and Weak Semi-CRF for three feature sets: Basic features, +affixes, All features. Bar labels in extraction order: 71.19, 72.49, 72.68, 74.37, 74.69, 74.58, 74.39, 74.60, 74.31.]
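The slides do not say how F1 is computed. A common choice for chunking is exact-match F1 over chunks, and the sketch below assumes that scheme with plain B/I/O tags. The helper names `bio_to_chunks` and `chunk_f1` are hypothetical and this is not the authors' evaluation script.

```python
from typing import List, Set, Tuple


def bio_to_chunks(tags: List[str]) -> Set[Tuple[int, int]]:
    """Return (start, end) spans, end exclusive, for the chunks in a B/I/O sequence."""
    chunks = set()
    start = None
    for i, tag in enumerate(tags):
        if tag == "B" or (tag == "I" and start is None):
            # A new chunk begins; close any chunk that is still open.
            if start is not None:
                chunks.add((start, i))
            start = i
        elif tag == "O":
            if start is not None:
                chunks.add((start, i))
            start = None
        # An "I" inside an open chunk just extends it.
    if start is not None:
        chunks.add((start, len(tags)))
    return chunks


def chunk_f1(gold_seqs: List[List[str]], pred_seqs: List[List[str]]) -> float:
    """Exact-match chunk F1: a predicted chunk counts only if both boundaries match."""
    tp = n_gold = n_pred = 0
    for gold, pred in zip(gold_seqs, pred_seqs):
        gold_chunks, pred_chunks = bio_to_chunks(gold), bio_to_chunks(pred)
        tp += len(gold_chunks & pred_chunks)
        n_gold += len(gold_chunks)
        n_pred += len(pred_chunks)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data: 3 gold chunks, 2 predicted chunks, 1 exact match.
gold = [["O", "B", "I"], ["B", "O", "B"]]
pred = [["O", "B", "I"], ["B", "I", "O"]]
f1 = chunk_f1(gold, pred)
print(f"{f1:.2f}")
```

In the toy data, precision is 1/2 and recall is 1/3, giving an F1 of 0.40.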

SLIDE 24

Training Speed

[Line plot: average time per iteration (s) against number of training instances (SMS), for Linear-CRF, Semi-CRF and Weak Semi-CRF.]

SLIDE 27

Conclusion

We have created a new NP-annotated dataset on informal text.

We can split the decisions of selecting segment length and segment type to improve the training time, while maintaining similar accuracy.
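One way to picture the split between the two decisions is a Viterbi pass over a lattice with a "begin" and an "end" node per position and label. Choosing the segment length connects a begin node to an end node of the same label, and choosing the segment type connects an end node to the next begin node. The sketch below is an illustration under that assumed factorisation, not the authors' implementation. The function `weak_semi_viterbi`, the scoring callbacks `seg_score` and `trans_score`, and the toy scores are all hypothetical.

```python
from typing import Callable, List, Tuple

Segment = Tuple[int, int, str]  # (start, end inclusive, label)


def weak_semi_viterbi(
    n: int,
    labels: List[str],
    max_len: int,
    seg_score: Callable[[int, int, str], float],
    trans_score: Callable[[str, str], float],
) -> Tuple[float, List[Segment]]:
    """Best segmentation of n >= 1 tokens when length and type are scored separately."""
    neg_inf = float("-inf")
    k = len(labels)
    begin = [[neg_inf] * k for _ in range(n)]  # best score reaching Begin(i, y)
    end = [[neg_inf] * k for _ in range(n)]    # best score reaching End(j, y)
    begin_back = [[-1] * k for _ in range(n)]  # label of the previous segment
    end_back = [[-1] * k for _ in range(n)]    # start position of the segment
    for y in range(k):
        begin[0][y] = 0.0
    for j in range(n):
        if j > 0:
            # Segment type: End(j-1, y1) -> Begin(j, y2), |Y|^2 edges per position.
            for y2 in range(k):
                for y1 in range(k):
                    s = end[j - 1][y1] + trans_score(labels[y1], labels[y2])
                    if s > begin[j][y2]:
                        begin[j][y2], begin_back[j][y2] = s, y1
        # Segment length: Begin(i, y) -> End(j, y), at most L * |Y| edges per position.
        for y in range(k):
            for i in range(max(0, j - max_len + 1), j + 1):
                s = begin[i][y] + seg_score(i, j, labels[y])
                if s > end[j][y]:
                    end[j][y], end_back[j][y] = s, i
    # Backtrack from the best end node at the last position.
    y = max(range(k), key=lambda idx: end[n - 1][idx])
    best = end[n - 1][y]
    segments: List[Segment] = []
    j = n - 1
    while True:
        i = end_back[j][y]
        segments.append((i, j, labels[y]))
        if i == 0:
            break
        y = begin_back[i][y]
        j = i - 1
    segments.reverse()
    return best, segments


# Toy scores for "said Dr Teh": reward "said" as O and "Dr Teh" as one NP.
toy = {(0, 0, "O"): 1.0, (1, 2, "NP"): 2.0}
best, segments = weak_semi_viterbi(
    n=3,
    labels=["NP", "O"],
    max_len=3,
    seg_score=lambda i, j, y: toy.get((i, j, y), -1.0),
    trans_score=lambda y1, y2: 0.0,
)
print(best, segments)
```

The two inner loops correspond to the two terms in the weak semi-CRF bound: the type loop does |Y|^2 work per position and the length loop does at most L|Y| work per position. In a full semi-CRF the segment score may depend on the previous label as well, which is where the L|Y|^2 factor comes from.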

SLIDE 28

Thank You

Code and data available at: http://statnlp.org/research/ie/

Aldrian Obaja Muis and Wei Lu

Singapore University of Technology and Design