Contributions Dataset Models Experiments Conclusion
Weak Semi-Markov CRFs for NP Chunking in Informal Text
Aldrian Obaja Muis and Wei Lu
Singapore University of Technology and Design
In this paper, we contributed:
1. Noun phrase-annotated SMS corpus¹
2. Weak semi-Markov CRF

¹ Tao Chen and Min-Yen Kan (2013). "Creating a live, public short message service corpus: the NUS SMS corpus". In: Language Resources and
We used the brat rapid annotation tool (BRAT)² for annotation, recruiting undergraduate students to annotate the noun phrases.
² http://brat.nlplab.org/
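As an illustration of what such annotations look like downstream, here is a minimal sketch that reads noun-phrase spans from brat's standoff format, where each text-bound annotation is a line of the form `ID<TAB>TYPE START END<TAB>TEXT`. The entity type name `NP`, the function name `read_np_spans`, and the sample message are assumptions made for this example; the slides do not say how the corpus files are laid out.

```python
from typing import List, NamedTuple


class Span(NamedTuple):
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive
    text: str


def read_np_spans(ann: str, np_type: str = "NP") -> List[Span]:
    """Collect text-bound annotations of one type from brat standoff text."""
    spans = []
    for line in ann.splitlines():
        # Text-bound annotations have IDs starting with "T"; relations,
        # attributes, and notes use other prefixes and are skipped here.
        if not line.startswith("T"):
            continue
        parts = line.split("\t")
        if len(parts) != 3:
            continue
        _id, type_and_offsets, text = parts
        fields = type_and_offsets.split(" ")
        # Skip other types; discontinuous spans ("0 5;16 23") yield more
        # than three fields and are skipped as well in this sketch.
        if fields[0] != np_type or len(fields) != 3:
            continue
        spans.append(Span(int(fields[1]), int(fields[2]), text))
    return sorted(spans)


# Hypothetical message and annotation file content (type name "NP" assumed).
sms = "Dr Teh said the meeting is at 5"
ann = "T1\tNP 0 6\tDr Teh\nT2\tNP 12 23\tthe meeting\n"
for span in read_np_spans(ann):
    # The offsets should reproduce the annotated text exactly.
    assert sms[span.start:span.end] == span.text
    print(span)
```

Checking that the offsets reproduce the annotated text is a cheap sanity test when annotations come from many annotators.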
[Table: corpus statistics by number of annotators, SMS messages, noun phrases, and tokens]
n : # words in the sentence, |Y| : # labels, L : max segment length
[Figure: label lattices over the example tokens "said Dr Teh". Linear CRF: nodes B, I, O for each token. Semi-CRF: nodes N, O for each token. Weak semi-CRF: two sets of nodes N, O for each token.]
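The notation above (n, |Y|, L) can be turned into rough edge counts for the three lattices. The sketch below is only an illustration under assumed lattice structures: the linear CRF connects every label to every label of the next word; the semi-CRF chooses the next label and the next segment's extent jointly; the weak semi-CRF uses one edge type for the segment length (same label) and another for the segment type (next word, any label). All function names are hypothetical, and the counts are not figures taken from the slides.

```python
def segments(n: int, max_len: int):
    """All (start, end) pairs, end inclusive, of length at most max_len."""
    return [(i, j) for i in range(n) for j in range(i, min(n, i + max_len))]


def linear_crf_edges(n: int, num_labels: int) -> int:
    # Every label at position t-1 connects to every label at position t.
    return (n - 1) * num_labels ** 2


def semi_crf_edges(n: int, num_labels: int, max_len: int) -> int:
    # A transition picks the previous label, the next label, and the next
    # segment's extent jointly; segments starting at 0 have no predecessor.
    return sum(num_labels ** 2 for (i, _j) in segments(n, max_len) if i > 0)


def weak_semi_crf_edges(n: int, num_labels: int, max_len: int) -> int:
    # Length decision: a begin node connects to end nodes of the same label.
    length_edges = len(segments(n, max_len)) * num_labels
    # Type decision: an end node connects to every begin node of the next word.
    type_edges = (n - 1) * num_labels ** 2
    return length_edges + type_edges


# Assumed sizes for illustration: 20 words, labels {N, O}, segments up to 8 words.
n, num_labels, max_len = 20, 2, 8
print("linear CRF:   ", linear_crf_edges(n, num_labels))
print("semi-CRF:     ", semi_crf_edges(n, num_labels, max_len))
print("weak semi-CRF:", weak_semi_crf_edges(n, num_labels, max_len))
```

Under these assumptions the counts grow roughly as n|Y|^2, nL|Y|^2, and nL|Y| + n|Y|^2 respectively, so the gap between the two semi-Markov variants widens as |Y| or L grows.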
[Figure: bar chart of F1-score (%) for Linear CRF, Semi-CRF, and Weak Semi-CRF with three feature sets (basic features, +affixes, all features). Values as extracted: 71.19, 72.49, 72.68, 74.37, 74.69, 74.58, 74.39, 74.60, 74.31.]
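For readers unfamiliar with the metric, here is a minimal sketch of chunk-level F1 under the common exact-match convention, where a predicted noun phrase counts as correct only if both of its boundaries match a gold noun phrase. That the evaluation on the slides follows exactly this convention is an assumption, and the function name and toy spans are hypothetical.

```python
from typing import Iterable, Tuple

Chunk = Tuple[int, int, int]  # (message index, start token, end token inclusive)


def chunk_f1(gold: Iterable[Chunk], predicted: Iterable[Chunk]) -> float:
    """Exact-match F1 over chunks; returns a value between 0 and 1."""
    gold_set, pred_set = set(gold), set(predicted)
    correct = len(gold_set & pred_set)
    if correct == 0:
        return 0.0
    precision = correct / len(pred_set)
    recall = correct / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# Toy data: three gold chunks, two predictions, one exact match.
gold = [(0, 0, 1), (0, 3, 4), (1, 0, 0)]
predicted = [(0, 0, 1), (0, 3, 3)]
print(f"F1 = {100 * chunk_f1(gold, predicted):.2f}%")
```

A prediction that overlaps a gold chunk but misses a boundary, like the second one here, earns no credit under this convention.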
[Figure: line plot against the number of training instances (SMS), from 5,000 to 20,000, for Linear-CRF, Semi-CRF, and Weak Semi-CRF.]
We have created a new NP-annotated dataset on informal text.
We can split the decisions of selecting segment length and segment type to improve the training time, while maintaining similar accuracy.
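The idea of deciding segment length and segment type separately can be sketched as a Viterbi-style decoder with two kinds of edges. This is a minimal sketch under assumptions, not the authors' implementation: `length_score` and `type_score` are hypothetical callables standing in for learned feature scores, training is omitted, and the toy scores at the end are made up.

```python
from typing import Callable, List, Sequence, Tuple

Segment = Tuple[int, int, str]  # (start, end inclusive, label)


def decode(n: int, labels: Sequence[str], max_len: int,
           length_score: Callable[[int, int, str], float],
           type_score: Callable[[str, str, int], float]) -> List[Segment]:
    """Best segmentation when length and type are scored by separate edges."""
    # begin[i][y]: best score of a path arriving at a segment of label y starting at i.
    # end[j][y]: best score of a path whose last segment has label y and ends at j.
    begin = [{y: 0.0 for y in labels} for _ in range(n)]
    end = [{y: float("-inf") for y in labels} for _ in range(n)]
    back_begin = [{y: None for y in labels} for _ in range(n)]  # previous label
    back_end = [{y: None for y in labels} for _ in range(n)]    # segment start
    for j in range(n):
        if j > 0:
            # Type decision: previous end node -> begin node of any label.
            for y in labels:
                prev = max(labels, key=lambda p: end[j - 1][p] + type_score(p, y, j))
                begin[j][y] = end[j - 1][prev] + type_score(prev, y, j)
                back_begin[j][y] = prev
        # Length decision: begin node -> end node of the same label.
        for y in labels:
            for i in range(max(0, j - max_len + 1), j + 1):
                score = begin[i][y] + length_score(i, j, y)
                if score > end[j][y]:
                    end[j][y], back_end[j][y] = score, i
    # Follow back-pointers from the best final end node.
    y = max(labels, key=lambda lab: end[n - 1][lab])
    j, result = n - 1, []
    while j >= 0:
        i = back_end[j][y]
        result.append((i, j, y))
        y, j = back_begin[i][y], i - 1
    return result[::-1]


# Toy scores (made up): reward "said" as O and "Dr Teh" as one N segment.
tokens = ["said", "Dr", "Teh"]
favoured = {(0, 0, "O"), (1, 2, "N")}
segs = decode(len(tokens), ["N", "O"], 2,
              lambda i, j, y: 1.0 if (i, j, y) in favoured else -1.0,
              lambda prev, y, i: 0.0)
print(segs)
```

Because the length edges stay within one label and the type edges span only adjacent words, the inner loops never iterate over label pairs and segment lengths at the same time.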
Code and data available at: http://statnlp.org/research/ie/