

Development of an ESP E-Learning Tool Using In-House Corpora

Yukie KOYAMA, Tomofumi NAKANO and Chikako MATSUURA

Nagoya Institute of Technology Gokiso-cho, Showa-ku, Nagoya 466-8555 Japan {koyama, nakano and chikako}@center.nitech.ac.jp

Abstract. This study introduces a methodology for developing an e-learning tool by using corpora and computer software for linguistic analysis. The corpora compiled for this study consist of journal articles from two engineering fields and of articles from general science magazines. Each corpus, consisting of approximately 500,000 words, was tagged and parsed. Analysis of these corpora reveals that the past participle has a high frequency among verb forms. Sentences including this form were therefore extracted, and items asking students to identify the main verb were created from them. Students answered a total of 37,333 items. The analysis of the answers shows that students tend to answer incorrectly for items with longer sentences and for items whose main verb is located toward the end of the sentence. An individual analysis of each student was also conducted. This kind of analysis can help language teachers to provide targeted practice for students, using authentic, discipline-specific textual data.

1 Introduction

In the field of English education, English for Specific Purposes (ESP) has been recognized as important for effective learning among learners whose language needs relate to a particular discipline [1][2], and a needs analysis is crucial for deciding on ESP teaching content and methodology [3]. In a previous study, various kinds of needs analyses were conducted, confirming that field-specific material is necessary for learners of English for Science and Technology (EST) [4]. As the basis of genre analysis, genre-specific texts have to be analyzed. Fortunately, with the development of information technology and high-spec PCs, compiling and analyzing a good-sized corpus has become possible for the individual researcher. In this context, a corpus-based approach to ESP is highly recommended among researchers in English education [5][6][7]. Moreover, since all the text data in a corpus is stored in digital form, it is easy to utilize it for e-learning. Developing an e-learning tool with items extracted from a corpus has many advantages, including the use of authentic (if somewhat decontextualized) materials that can be chosen from specific disciplines and genres. In addition, it is much easier to get sufficient feedback from learners if tasks are conducted on a website. This study therefore describes the methodology for developing a corpus-based e-learning tool for EST learners, together with its rationale. In this case, the targeted learners are undergraduate students of engineering in Japan.

2 Method

2.1 Compiling corpora

The first digital corpus was the Brown Corpus of American English, whose size is approximately one million words [8]. At present, however, the two representative corpora are far larger than those of the early era: the Bank of English consisted of 450 million words (as of January 2002), and the British National Corpus consists of approximately 100 million words. Compared to these general corpora, the corpora compiled for this study are much smaller. However, for a specific purpose, a small corpus can also provide sufficient linguistic data [6].

The data was taken from CD-ROMs, online journals and internet homepages. As the first stage of this study, two corpora were compiled. The first is a corpus of research articles in academic journals (J-corpus), separated into two subcorpora for the mechanical and electrical engineering fields. Articles were selected from 11 journals in these fields. The second corpus is made up of articles in a general science and technology magazine (M-corpus). The source for this was articles in Scientific American (1997 to 2001). Information on the corpora is given in Table 1.

Table 1. Size and Kinds of Corpora

Source of corpus       Engineering Journals (J-Corpus)     General Scientific (M-Corpus)
                       Electrical        Mechanical
                       Engineering       Engineering
number of sentences    25,295            23,798            25,735
number of words        560,014           567,206           597,208
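As a small worked example, the figures in Table 1 can be turned into an average sentence length per corpus, which is useful context for the sentence-length effects reported later. The sketch below only divides the published word counts by the published sentence counts; the dictionary layout and the function name are illustrative assumptions, not part of the original study.

```python
# Figures copied from Table 1; the data layout is an illustrative assumption.
corpora = {
    "Electrical Engineering": {"sentences": 25_295, "words": 560_014},
    "Mechanical Engineering": {"sentences": 23_798, "words": 567_206},
    "General Science": {"sentences": 25_735, "words": 597_208},
}


def mean_sentence_length(stats):
    """Return the average number of words per sentence for one corpus."""
    return stats["words"] / stats["sentences"]


avg = {name: mean_sentence_length(stats) for name, stats in corpora.items()}

for name, value in avg.items():
    print(f"{name}: {value:.1f} words per sentence")
```

Under these counts, all three corpora average between 22 and 24 words per sentence.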

2.2 Linguistic analysis of the corpora by using software

In order to analyze the corpora and find their linguistic characteristics, a tagger and a parser were used. A tagger marks each word with its part of speech, such as adjective, verb or noun. Brill's tagging software was used in this study [9]. A parser conducts syntactic analysis of a sentence, which makes it possible to identify a phrase in a sentence as the subject, the object or the main verb. The parser used here is Apple Pie Parser, which is based on the Penn Treebank [10].

Tagging revealed the following grammatical characteristics in terms of part of speech: past participle patterns appear most frequently in the J-corpus, in both Electrical and Mechanical Engineering articles. In the M-corpus, however, the infinitive form has the highest frequency of all (Table 2). This finding supports the empirical judgment of both engineering and English teachers that students often have difficulty distinguishing between the past participle and the simple past when they read journal articles, especially when the past participle is used as the postmodifier of a noun.

Table 2. Frequency List of Part-of-Speech (percentage of the whole corpus)

Corpus                   Infinitive  Simple  Present      Past        Present           Present
                                     past    progressive  participle  (1st/2nd person   (3rd person
                                                                      singular)         singular)
Electrical Engineering   2.39        1.27    2.66         4.37        1.63              3.11
Mechanical Engineering   2.15        0.99    2.69         4.25        1.58              3.23
General Science          3.79        1.85    2.82         2.64        2.26              2.50
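The percentages in Table 2 can be understood as the share of all tokens in a corpus that carry a given verb tag. The following is a minimal sketch of that calculation, assuming Penn Treebank-style tags (VB, VBD, VBG, VBN, VBP, VBZ) and a corpus already available as (word, tag) pairs. The mapping of tags to the column labels of Table 2, the function name, and the hand-tagged sample are illustrative assumptions; the original tagging pipeline and its token-counting conventions are not described in detail.

```python
from collections import Counter

# Assumed mapping from Penn Treebank-style verb tags to the Table 2 columns.
VERB_TAGS = {
    "VB": "infinitive",
    "VBD": "simple past",
    "VBG": "present progressive",
    "VBN": "past participle",
    "VBP": "present (1st/2nd person singular)",
    "VBZ": "present (3rd person singular)",
}


def verb_form_percentages(tagged_tokens):
    """Return each verb form's share of all tokens, in percent."""
    total = len(tagged_tokens)
    counts = Counter(tag for _, tag in tagged_tokens if tag in VERB_TAGS)
    return {VERB_TAGS[tag]: 100.0 * counts[tag] / total for tag in VERB_TAGS}


# Hand-tagged toy sentence (hypothetical data, not taken from the corpora).
sample = [
    ("The", "DT"), ("method", "NN"), ("proposed", "VBN"), ("here", "RB"),
    ("reduced", "VBD"), ("the", "DT"), ("error", "NN"), (".", "."),
]

percentages = verb_form_percentages(sample)
for form, pct in percentages.items():
    print(f"{form}: {pct:.2f}%")
```

In the toy sentence, "proposed" is a past participle used as a postmodifier and "reduced" is the main verb in the simple past, which is exactly the contrast the learning tool targets.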

2.3 Designing an appropriate e-learning tool

Based on the linguistic characteristics found in the above section, it was decided that the web-learning tool introduced in this study should be made with sentences that include a past participle form as a postmodifier and whose main verb is in the simple past. The task is to choose the main verb of the sentence. The method of item making is as follows.

1. Digitally separating one sentence from another in order to make the next process possible.
2. Analyzing the J-corpus and M-corpus with two kinds of software: syntactic analysis with Apple Pie Parser and part-of-speech tagging with Brill's Tagger.
3. Based on the above analyses, extracting sentences with past participle forms.
4. In order to make items comparatively difficult, extracting sentences in the past tense from those identified in stage 3, and excluding sentences with passive forms and past perfect forms so that items are of an appropriate difficulty level for students.
5. Under the above conditions, sentences for item making were extracted and chosen randomly by a program.

Each student was asked to answer at least one hundred items in total. Participants were first- and second-year undergraduate students from several engineering departments, and they were given extra points in an English course if they completed the task.
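The selection steps above can be sketched roughly as a filter over tagged sentences followed by random sampling. This is only a heuristic illustration: the original study relied on Apple Pie Parser output to identify the main verb, whereas the sketch below looks only at Penn Treebank-style tags and at the word preceding each past participle. The word lists, function names, and sample sentences are all hypothetical.

```python
import random

# Assumed auxiliary word lists used to spot passive and perfect constructions.
BE_FORMS = {"be", "am", "is", "are", "was", "were", "been", "being"}
HAVE_FORMS = {"have", "has", "had", "having"}


def is_candidate(tagged):
    """Heuristic: keep sentences with a simple-past verb (VBD) and a past
    participle (VBN) that does not follow a form of 'be' or 'have'."""
    has_simple_past = any(tag == "VBD" for _, tag in tagged)
    has_free_participle = False
    for i, (_, tag) in enumerate(tagged):
        if tag != "VBN":
            continue
        j = i - 1
        while j >= 0 and tagged[j][1] == "RB":  # skip intervening adverbs
            j -= 1
        previous = tagged[j][0].lower() if j >= 0 else ""
        if previous in BE_FORMS or previous in HAVE_FORMS:
            return False  # passive or perfect form: excluded as in step 4
        has_free_participle = True
    return has_simple_past and has_free_participle


def select_items(sentences, k, seed=None):
    """Randomly choose up to k candidate sentences (step 5)."""
    candidates = [s for s in sentences if is_candidate(s)]
    rng = random.Random(seed)
    return rng.sample(candidates, min(k, len(candidates)))


# Hand-tagged toy sentences (hypothetical data).
s1 = [("The", "DT"), ("values", "NNS"), ("obtained", "VBN"), ("from", "IN"),
      ("the", "DT"), ("test", "NN"), ("agreed", "VBD"), ("with", "IN"),
      ("the", "DT"), ("model", "NN"), (".", ".")]
s2 = [("The", "DT"), ("sample", "NN"), ("was", "VBD"), ("heated", "VBN"),
      ("slowly", "RB"), (".", ".")]
s3 = [("The", "DT"), ("pump", "NN"), ("failed", "VBD"), (".", ".")]

items = select_items([s1, s2, s3], k=5, seed=0)
print(len(items))
```

Only the first toy sentence passes the filter: the second is a passive, and the third contains no past participle.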

The following are some examples of the items with answers in << >>, while the actual CGI page is shown in Figure 1.

Choose the main verb of the sentence. If there are two of them, choose the first one.

1. A team of astronomers led by John K. Webb of the University of New South Wales has found the first hint that the laws of physics were slightly different billions of years ago. << has >>
2. Both the introduced analytical and numerical approaches give program users information about the approximation involved in the integration method. << give >>
3. The gas density is negligible compared to liquid density. << is >>
4. Researchers have exploited this equipment in large-scale field studies, designed to gauge just where and how people are exposed to potentially dangerous chemicals. << have >>

Fig. 1. Task Page


3 Results

3.1 General tendencies in item difficulty

The cumulative number of items answered by students was 37,333, the number of students was 218, and the number of different items was 5,868. Figure 2 plots the error rate (vertical axis) against the length of the sentence in words (foreground horizontal axis) and the location of the main verb in the sentence (0: beginning, 1: end). The figure shows a peak around a sentence length of 25 words. The graph also shows that the error rate is lower when the main verb is near the beginning of the sentence and gradually rises as the verb moves toward the end. This result suggests that, for the students who participated in the study, an item becomes more difficult the longer the sentence is and the nearer to the end the main verb is located.


Fig. 2. Error Rate, Sentence Length in Words, and Location of Main Verb in Sentence.
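A surface like the one in Figure 2 can be approximated by grouping answers into bins of sentence length and relative verb position and computing the share of incorrect answers in each bin. The sketch below is one possible way to do this. The normalization of verb position to the range 0 to 1, the bin widths, the function names, and the sample answer records are all assumptions, since the paper does not specify how the figure was computed.

```python
from collections import defaultdict


def relative_position(verb_index, sentence_length):
    """Map a zero-based verb index to 0.0 (beginning) .. 1.0 (end)."""
    if sentence_length <= 1:
        return 0.0
    return verb_index / (sentence_length - 1)


def error_rate_grid(answers, length_bin=5, position_bins=5):
    """Return {(length_bin_start, position_bin): error rate}.

    Each answer is (sentence_length, verb_index, is_correct).
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for length, verb_index, is_correct in answers:
        pos = relative_position(verb_index, length)
        p_bin = min(int(pos * position_bins), position_bins - 1)
        l_bin = (length // length_bin) * length_bin
        totals[(l_bin, p_bin)] += 1
        if not is_correct:
            errors[(l_bin, p_bin)] += 1
    return {key: errors[key] / totals[key] for key in totals}


# Hypothetical answer records, not data from the study.
answers = [
    (10, 2, True),
    (10, 2, False),
    (27, 22, False),
    (27, 23, False),
    (27, 3, True),
]

grid = error_rate_grid(answers)
for key in sorted(grid):
    print(key, grid[key])
```

Each key pairs the lower edge of a length bin with a position bin index, so (25, 4) covers sentences of 25 to 29 words whose main verb lies in the last fifth of the sentence.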

3.2 Analysis of answers by an individual student using C4.5

In the previous section the general tendencies of the entire data set were analyzed. In this section an individual student is taken at random as an example, and that student's answers are analyzed with a machine learning tool called C4.5 [11]. Sample results are shown below.

Rule 19:
    words > 24
    and <= 0
    that > 0
    -> class incorrect [75.8%]

Rule 12:
    by > 0
    at > 0
    -> class incorrect [70.7%]

Rule 15:
    words > 25
    which > 0
    -> class incorrect [70.7%]

Rule 14:
    words <= 25
    that <= 0
    -> class correct [79.5%]

Rule 19 shows that, for this learner, the likelihood of an incorrect answer is 75.8% when a sentence has more than 24 words, does not include the word "and", and does include "that". On the other hand, Rule 14 shows that when a sentence has 25 words or fewer and no "that" is used in it, the likelihood of a correct answer is as high as 79.5%. This analysis, like the one in the previous section, thus indicates that the difficulty level rises as sentences become longer and more complex. Another merit of this analysis is that it provides diagnostic information for an individual learner as feedback, which can motivate the learner to study English.
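To make the reading of these rules concrete, Rule 19 and Rule 14 can be written as simple predicates over word-count features of a sentence. The sketch below is an illustration only: the feature extraction (lowercased whitespace tokens with surrounding punctuation stripped), the function names, and the decision to check just these two rules are assumptions, and the code does not reproduce how C4.5 itself orders or applies its rule set.

```python
def features(sentence):
    """Count words and occurrences of 'and' and 'that' in a sentence."""
    tokens = [t.strip(".,;:()\"'") for t in sentence.lower().split()]
    tokens = [t for t in tokens if t]
    return {
        "words": len(tokens),
        "and": tokens.count("and"),
        "that": tokens.count("that"),
    }


def rule_19(f):
    """words > 24, no 'and', at least one 'that' -> incorrect [75.8%]."""
    return f["words"] > 24 and f["and"] <= 0 and f["that"] > 0


def rule_14(f):
    """words <= 25, no 'that' -> correct [79.5%]."""
    return f["words"] <= 25 and f["that"] <= 0


def predict(sentence):
    """Return (class, reported rate) for the first matching rule, else None."""
    f = features(sentence)
    if rule_19(f):
        return ("incorrect", 0.758)
    if rule_14(f):
        return ("correct", 0.795)
    return None


# Example item 3 from Section 2.3, plus a synthetic 25-word sentence.
short = "The gas density is negligible compared to liquid density."
long_synthetic = " ".join(["word"] * 24 + ["that"])

print(predict(short))
print(predict(long_synthetic))
```

A teacher could use predicates of this kind to flag, for a given learner, which upcoming items are likely to cause trouble.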

4 Conclusion and further implications

This study briefly introduces the methodology for the development of a web-learning tool based on corpora for science and technology. In addition, student answers to the items generated by the system were analyzed in two ways. The first is in terms of error rate in relation to sentence length and location of the main verb, while the second is an analysis of an individual learner's answers. Through these analyses the following points have been clarified.

1. Among all verb forms, the past participle is used most frequently in journal papers in mechanical and electrical engineering. Understanding the past participle form and its use as a postmodifier is therefore important for reading these papers accurately.
2. Items including the past participle form were developed on a web page and used by approximately 200 students. The entire data set consists of more than 37,000 answers. The data analysis indicates that error rates rise as a sentence becomes longer and as the main verb moves toward the end of the sentence.
3. Individual data analysis showed that, for long sentences of more than 25 words that include conjunctions or relative pronouns, the likelihood of an incorrect answer can exceed 70%. This kind of information can also be meaningful feedback for an individual learner.

Although these are new findings in this ongoing study, there are many other tasks for us to complete in order to develop a more appropriate web-learning tool. First of all, it is necessary to add text data from other engineering fields, such as civil engineering, and from other general science and technology magazines in order to make the corpora more thorough in terms of both quality and quantity. Another important element of this study is how to set the conditions for selecting item sentences, because items were automatically generated under these conditions. In order to develop a more appropriate learning tool for learners, it is also necessary to find structural characteristics of EST texts. Therefore, more detailed structural analysis has to be done. Furthermore, in terms of the variety of learning tools, a vocabulary-learning tool and vocabulary tests can be developed on the basis of vocabulary analysis, such as word frequency and collocation. In addition to these points, a survey of students as users of the web-learning tool should be conducted in the next study, so that we will be able to understand their opinions, reactions, and utilization of the tool. In the future we hope to use the web-learning tool to measure the effectiveness of direct instruction in target areas such as past participle usage in scientific writing.

In this computerized era, it is our urgent task to develop an effective and meaningful web-learning tool for the enhancement of students' English ability. This study has provided practical information towards that goal.

References

1. Hutchinson, T. and Waters, A.: English for Specific Purposes: A learning-centered approach, Cambridge University Press, Cambridge (1987)
2. Swales, J.: Episodes in ESP, Prentice Hall, NJ (1988)
3. Miyama, A., Kanzaki, Y., Noguchi, J., Sasajima, S., Terauchi, H.: Theory and Practice of ESP, Sanshusha, Tokyo (2000)
4. Koyama, Y.: An analysis of English Education in Engineering Universities and Establishment of an Effective Curriculum, Final Report, National Foundation of Sciences Grant Project 10610465. With Y. Koyama, N. Hayakawa, T. Yoshikawa and Y. Shimizu, Niigata (2001)
5. Biber, D., Conrad, S., Reppen, R.: Corpus Linguistics, Cambridge University Press, Cambridge (1998)
6. Hunston, S.: Corpora in Applied Linguistics, Cambridge University Press, Cambridge (2002)
7. Kennedy, G.: An Introduction to Corpus Linguistics, Addison Wesley Longman Limited, Essex (1998)
8. Francis, N. and Kucera, H.: Manual of Information to Accompany 'A Standard Sample of Present-Day Edited American English, for Use with Digital Computers' (revised 1979), Department of Linguistics, Providence, Brown University (1964)
9. Brill, E.: Brill's Tagger, http://www.cs.jhu.edu/ brill/ (1992)
10. Sekine, S.: Apple Pie Parser version 5.9. www.cs.nyu.edu/cs/projects/proteus/app/ (2000)
11. Quinlan, R.: C4.5: Programs for Machine Learning, Morgan Kaufmann, San Fran-