SLIDE 1

Automatic Text Analysis Tools Facilitating Text Selection in Reading Assessment

Aylin Ünaldı FOAI 11

SLIDE 2
  • Thank you FOAI for bringing together the assessment professionals for more than 10 years
  • Mehtap İnce, Esin Çağlayan, Berna Akpınar Aslan
  • Medipol University
  • Language Assessment Issues in Turkey

A sister email/discussion list: lait@list.boun.edu.tr

SLIDE 3

Common Issues

  • Contextual validity in reading assessment
  • Concept of textual complexity
  • How automatic tools are used in determining textual complexity
  • How the information from automatic tools can guide us

SLIDE 4

TEST SPECIFICATIONS

– cognitive operations that tasks are expected to trigger

  • e.g. reading across sentences to elicit implied meaning

– task characteristics

  • e.g. short-answer

– response method

  • written response: two-three words at most

– input characteristics/textual features

SLIDE 5

Context Validity (Weir, 2005)

What are the contextual features we need to take into consideration when we judge the suitability of texts for assessment purposes? How do we set the levels of complexity in texts targeted at different proficiency levels?

SLIDE 6

Complexity Taxonomy (Bulté & Housen, 2012)

«… complexity refers to a property or quality of a phenomenon or entity in terms of (1) the number and the nature of the discrete components that the entity consists of, and (2) the number and the nature of the relationships between the constituent components.»

SLIDE 7

‘Something wicked comes this way’

SLIDE 8

Inventory of linguistic complexity measures (Bulté & Housen, 2012)

A. GRAMMATICAL COMPLEXITY

a. Syntactic

i. Overall
  • Mean length of T-unit
  • Mean length of c-unit
  • Mean length of turn
  • Mean length of AS-unit
  • Mean length of utterance
  • S-nodes / T-unit
  • S-nodes / AS-unit

ii. Sentential – Coordination
  • Coordinated clauses / clauses

iii. Sentential – Subordination
  • Clauses / AS-unit
  • Clauses / c-unit
  • Clauses / T-unit
  • Dependent clauses / clause
  • Number of subordinate clauses
  • Subordinate clauses / clauses
  • Subordinate clauses / dependent clauses
  • Subordinate clauses / T-unit
  • Relative clauses / T-unit
  • Verb phrases / T-unit

iv. Subsentential (Clausal + Phrasal)
  • Mean length of clause
  • S-nodes / clause

v. Clausal
  • Syntactic arguments / clause

vi. Phrasal
  • Dependents / (noun, verb) phrase

vii. Other (± syntactic sophistication)
  • Frequency of passive forms
  • Frequency of infinitival phrases
  • Frequency of conjoined clauses
  • Frequency of Wh-clauses
  • Frequency of imperatives
  • Frequency of auxiliaries
  • Frequency of comparatives
  • Frequency of conditionals

b. Morphological

i. Inflectional
  • Frequency of tensed forms
  • Frequency of modals
  • Number of different verb forms
  • Variety of past tense forms

ii. Derivational
  • Measure of affixation

B. LEXICAL COMPLEXITY

a. Diversity
  • Number of word types
  • TTR
  • Mean segmental TTR
  • Guiraud Index
  • (Word types)² / words
  • D

b. Density
  • Lexical words / Function words
  • Lexical words / Total words

c. Sophistication
  • Less frequent words / Total words
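Several of the lexical diversity measures in this inventory are simple functions of the number of word types and word tokens. The sketch below is only an illustration, not the procedure of any particular tool: the tokenizer (lowercased runs of letters and apostrophes), the function name and the sample sentence are all assumptions made for the example.

```python
import math
import re


def lexical_diversity(text: str) -> dict:
    """Compute a few type/token based diversity measures for a text."""
    # Assumed tokenizer: lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        raise ValueError("text contains no word tokens")
    n_tokens = len(tokens)
    n_types = len(set(tokens))
    return {
        "tokens": n_tokens,
        "types": n_types,
        # TTR: word types divided by word tokens
        "ttr": n_types / n_tokens,
        # Guiraud Index: word types divided by the square root of tokens
        "guiraud": n_types / math.sqrt(n_tokens),
        # (Word types)^2 / words
        "types_squared_per_word": n_types ** 2 / n_tokens,
    }


# Made-up sample sentence, for illustration only.
sample = "the cat sat on the mat and the dog sat on the rug"
result = lexical_diversity(sample)
print(result)
```

Raw TTR falls as a text gets longer, which is one reason length-corrected measures such as the Guiraud Index and D appear alongside it in the inventory.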

SLIDE 9

Syntactic complexity measures (Lu, 2010)

  • Length of production
    1. Mean length of clause (MLC)
    2. Mean length of sentence (MLS)
    3. Mean length of T-unit (MLT)
  • Sentence complexity
    4. Mean number of clauses per sentence (C/S)
  • Subordination
    5. Mean number of clauses per T-unit (C/T)
    6. Mean number of complex T-units per T-unit (CT/T)
    7. Mean number of dependent clauses per clause (DC/C)
    8. Mean number of dependent clauses per T-unit (DC/T)
  • Coordination
    9. Mean number of coordinate phrases per clause (CP/C)
    10. Mean number of coordinate phrases per T-unit (CP/T)
    11. Mean number of T-units per sentence (T/S)
  • Particular grammatical structures
    12. Mean number of complex nominals per clause (CN/C)
    13. Mean number of complex nominals per T-unit (CN/T)
    14. Mean number of verb phrases per T-unit (VP/T)
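All fourteen measures are ratios of unit counts, so once the units in a text have been counted (by hand or by a parser), the indices follow by division. The sketch below shows eight of them. The class name, field names and the counts are hypothetical and do not come from the two texts discussed in these slides.

```python
from dataclasses import dataclass


@dataclass
class UnitCounts:
    """Counts of production units in one text (hypothetical container)."""
    words: int
    sentences: int
    t_units: int
    clauses: int
    dependent_clauses: int


def syntactic_ratios(c: UnitCounts) -> dict:
    """Derive length and ratio measures from raw unit counts."""
    return {
        "MLS": c.words / c.sentences,            # mean length of sentence
        "MLT": c.words / c.t_units,              # mean length of T-unit
        "MLC": c.words / c.clauses,              # mean length of clause
        "C/S": c.clauses / c.sentences,          # clauses per sentence
        "C/T": c.clauses / c.t_units,            # clauses per T-unit
        "DC/C": c.dependent_clauses / c.clauses,  # dependent clauses per clause
        "DC/T": c.dependent_clauses / c.t_units,  # dependent clauses per T-unit
        "T/S": c.t_units / c.sentences,          # T-units per sentence
    }


# Made-up counts, for illustration only.
counts = UnitCounts(words=200, sentences=10, t_units=12,
                    clauses=25, dependent_clauses=10)
ratios = syntactic_ratios(counts)
print(ratios)
```

The counting step is where the real difficulty lies. The division is the easy part.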

SLIDE 10

Cohmetrix indices of the two texts: McNamara, Graesser, McCarthy and Cai (2004) http://tool.cohmetrix.com/

Number Label Label V2.x Text Text2 Full description
1 DESPC READNP 1 1 Paragraph count, number of paragraphs
2 DESSC READNS 11 8 Sentence count, number of sentences
3 DESWC READNW 191 206 Word count, number of words
4 DESPL READAPL 11 8 Paragraph length, number of sentences in a paragraph, mean
5 DESPLd Paragraph length, number of sentences in a paragraph, standard deviation
6 DESSL READASL 17.364 25.75 Sentence length, number of words, mean
7 DESSLd n/a 9.470 8.246 Sentence length, number of words, standard deviation
8 DESWLsy READASW 1.571 1.718 Word length, number of syllables, mean
9 DESWLsyd 0.810 1.040 Word length, number of syllables, standard deviation
10 DESWLlt 4.780 5.505 Word length, number of letters, mean
11 DESWLltd 2.560 2.921 Word length, number of letters, standard deviation
12 PCNARz n/a -1.154 -1.042 Text Easability PC Narrativity, z score
13 PCNARp n/a 12.510 14.920 Text Easability PC Narrativity, percentile
14 PCSYNz n/a 0.779 -0.334 Text Easability PC Syntactic simplicity, z score
15 PCSYNp n/a 77.940 37.070 Text Easability PC Syntactic simplicity, percentile
16 PCCNCz n/a -0.649 1.697 Text Easability PC Word concreteness, z score
17 PCCNCp n/a 26.110 95.450 Text Easability PC Word concreteness, percentile
18 PCREFz n/a -1.872 -0.416 Text Easability PC Referential cohesion, z score
19 PCREFp n/a 3.070 34.090 Text Easability PC Referential cohesion, percentile
20 PCDCz n/a -0.015 -0.036 Text Easability PC Deep cohesion, z score
21 PCDCp n/a 49.600 48.800 Text Easability PC Deep cohesion, percentile
22 PCVERBz n/a -0.279 -2.506 Text Easability PC Verb cohesion, z score
23 PCVERBp 38.970 0.620 Text Easability PC Verb cohesion, percentile
24 PCCONNz -3.123 -4.501 Text Easability PC Connectivity, z score
25 PCCONNp 0.090 Text Easability PC Connectivity, percentile
26 PCTEMPz -2.841 0.952 Text Easability PC Temporality, z score
27 PCTEMPp 0.230 82.890 Text Easability PC Temporality, percentile

Annotations on the slide: descriptives; Text Easability
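Each Text Easability component is reported twice, as a z score and as a percentile, and the two are linked through the standard normal distribution: the percentile is the share of the distribution lying below the z score. The sketch below makes that link explicit. It takes the Narrativity z score of Text as -1.154, with the sign inferred from its percentile (12.510) being below 50, and the function name is made up for the example.

```python
import math


def z_to_percentile(z: float) -> float:
    """Percentile (0-100) of a z score under the standard normal distribution."""
    return 100.0 * 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Narrativity, Text: z taken as -1.154 (sign assumed, see lead-in); tabled percentile 12.510
narrativity_pct = z_to_percentile(-1.154)

# Syntactic simplicity, Text: z = 0.779; tabled percentile 77.940
syntactic_pct = z_to_percentile(0.779)

print(f"{narrativity_pct:.2f} {syntactic_pct:.2f}")
```

The computed values land within about half a percentile point of the tabled ones. The small gap would be consistent with the tool looking up a z score cut to two decimals, but that is a guess rather than documented behaviour.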

SLIDE 11

Cohmetrix indices of the two texts

28 CRFNO1 CRFBN1um 0.100 0.429 Noun overlap, adjacent sentences, binary, mean
29 CRFAO1 CRFBA1um 0.100 0.571 Argument overlap, adjacent sentences, binary, mean
30 CRFSO1 CRFBS1um 0.5 0.714 Stem overlap, adjacent sentences, binary, mean
31 CRFNOa CRFBNaum 0.164 0.321 Noun overlap, all sentences, binary, mean
32 CRFAOa CRFBAaum 0.255 0.464 Argument overlap, all sentences, binary, mean
33 CRFSOa CRFBSaum 0.309 0.070 Stem overlap, all sentences, binary, mean
34 CRFCWO1 CRFPC1um 0.050 0.079 Content word overlap, adjacent sentences, proportional, mean
35 CRFCWO1d 0.073 0.079 Content word overlap, adjacent sentences, proportional, standard deviation
36 CRFCWOa CRFPCaum 0.040 0.056 Content word overlap, all sentences, proportional, mean
37 CRFCWOad 0.072 0.066 Content word overlap, all sentences, proportional, standard deviation
38 LSASS1 LSAassa 0.140 0.428 LSA overlap, adjacent sentences, mean
39 LSASS1d LSAassd 0.156 0.134 LSA overlap, adjacent sentences, standard deviation
40 LSASSp LSApssa 0.115 0.406 LSA overlap, all sentences in paragraph, mean
41 LSASSpd LSApssd 0.107 0.126 LSA overlap, all sentences in paragraph, standard deviation
42 LSAPP1 LSAppa LSA overlap, adjacent paragraphs, mean
43 LSAPP1d LSAppd LSA overlap, adjacent paragraphs, standard deviation
44 LSAGN LSAGN 0.229 0.363 LSA given/new, sentences, mean
45 LSAGNd 0.125 0.167 LSA given/new, sentences, standard deviation
46 LDTTRc TYPTOKc 0.841 0.823 Lexical diversity, type-token ratio, content word lemma
47 LDTTRa n/a 0.635 0.644 Lexical diversity, type-token ratio, all word
48 LDMTLD LEXDIVTD 103.284 117.494 Lexical diversity, MTLD, all word
49 LDVOCD LEXDIVVD 100.030 101.993 Lexical diversity, VOCD, all word
50 CNCAll CONi 94.241 111.650 All connectives incidence
51 CNCCaus CONCAUSi 20.942 14.563 Causal connectives incidence
52 CNCLogic CONLOGi 36.649 33.981 Logical connectives incidence
53 CNCADC CONADVCONi 10.471 14.563 Adversative and contrastive connectives incidence
54 CNCTemp CONTEMPi 20.942 29.126 Temporal connectives incidence
55 CNCTempx CONTEMPEXi 10.471 9.709 Expanded temporal connectives incidence
56 CNCAdd CONADDi 62.827 77.670 Additive connectives incidence
57 CNCPos n/a Positive connectives incidence
58 CNCNeg n/a Negative connectives incidence
59 SMCAUSv CAUSV 31.414 24.272 Causal verb incidence
60 SMCAUSvp CAUSVP 36.649 324.272 Causal verbs and causal particles incidence
61 SMINTEp INTEi 20.942 19.417 Intentional verbs incidence
62 SMCAUSr CAUSC 0.143 Ratio of causal particles to causal verb
63 SMINTEr INTEC 0.800 0.600 Ratio of intentional particles to intentional verb
64 SMCAUSlsa CAUSLSA 0.124 0.044 LSA verb overlap
65 SMCAUSwn CAUSWN 0.410 0.466 WordNet verb overlap
66 SMTEMP TEMPta 0.550 0.929 Temporal cohesion, tense and aspect repetition, mean

Annotations on the slide:
  • word overlap across text
  • Latent Semantic Analysis: meaning overlaps across text
  • Lexical diversity
  • repetition of items that can indicate the type of text/argument: temporal elements (narration), causal elements (cause & effect)…
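The connective and verb rows are incidence scores, which Coh-Metrix reports as occurrences per 1,000 words so that texts of different lengths can be compared. A minimal sketch of the conversion follows. The word counts (191 and 206) come from the DESWC row, while the raw counts of 4 and 3 causal connectives are back-calculated from the tabled scores and are therefore an assumption.

```python
def incidence_per_1000(count: int, word_count: int) -> float:
    """Occurrences of a feature per 1,000 words of text."""
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    return 1000.0 * count / word_count


# Causal connectives (CNCCaus); raw counts are assumed, not taken from the slides.
text_causal = incidence_per_1000(4, 191)    # tabled value for Text: 20.942
text2_causal = incidence_per_1000(3, 206)   # tabled value for Text2: 14.563

print(round(text_causal, 3), round(text2_causal, 3))
```

With texts this short, a single extra connective moves the score by about five points, so small differences between two texts should not be over-interpreted.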

SLIDE 12

Cohmetrix indices of the two texts

67 SYNLE SYNLE 5 4.75 Left embeddedness, words before main verb, mean
68 SYNNP SYNNP 1 1 Number of modifiers per noun phrase, mean
69 SYNMEDpos 0.715 0.584 Minimal Edit Distance, part of speech
70 SYNMEDwrd 0.902 0.877 Minimal Edit Distance, all words
71 SYNMEDlem 0.878 0.860 Minimal Edit Distance, lemmas
72 SYNSTRUTa 0.120 0.076 Sentence syntax similarity, adjacent sentences, mean
73 SYNSTRUTt 0.117 0.106 Sentence syntax similarity, all combinations, across paragraphs, mean
74 DRNP 350.785 393.204 Noun phrase density, incidence
75 DRVP 225.131 155.340 Verb phrase density, incidence
76 DRAP 47.120 43.689 Adverbial phrase density, incidence
77 DRPP 115.183 150.485 Preposition phrase density, incidence
78 DRPVAL AGLSPSVi 15.707 9.709 Agentless passive voice density, incidence
79 DRNEG DENNEGi 4.854 Negation density, incidence
80 DRGERUND 26.178 24.272 Gerund density, incidence
81 DRINF INFi 20.942 9.709 Infinitive density, incidence
82 WRDNOUN 287.958 320.388 Noun incidence
83 WRDVERB 115.184 126.214 Verb incidence
84 WRDADJ 99.476 106.796 Adjective incidence
85 WRDADV 78.534 77.670 Adverb incidence
86 WRDPRO 26.178 19.417 Pronoun incidence
87 WRDPRP1s First person singular pronoun incidence
88 WRDPRP1p First person plural pronoun incidence
89 WRDPRP2 20.942 Second person pronoun incidence
90 WRDPRP3s Third person singular pronoun incidence
91 WRDPRP3p 19.417 Third person plural pronoun incidence

Annotations on the slide:
  • subject and NP complexity
  • syntactic similarity
  • phrase complexity
  • phrase incidence

SLIDE 13

Cohmetrix indices of the two texts

92 WRDFRQc 2.060 1.901 CELEX word frequency for content words, mean
93 WRDFRQa 2.858 2.803 CELEX Log frequency for all words, mean
94 WRDFRQmc 0.885 CELEX Log minimum frequency for content words, mean
95 WRDAOAc 384.833 326.188 Age of acquisition for content words, mean
96 WRDFAMc 564.783 542.324 Familiarity for content words, mean
97 WRDCNCc 359.018 430.732 Concreteness for content words, mean
98 WRDIMGc 393.367 450.784 Imagability for content words, mean
99 WRDMEAc 404.784 423.431 Meaningfulness, Colorado norms, content words, mean
100 WRDPOLc 3.616 3.272 Polysemy for content words, mean
101 WRDHYPn 7.385 5.711 Hypernymy for nouns, mean
102 WRDHYPv 1.827 1.948 Hypernymy for verbs, mean
103 WRDHYPnv 2.273 1.923 Hypernymy for nouns and verbs, mean
104 RDFRE 56.304 35.356 Flesch Reading Ease
105 RDFKGL 9.720 14.725 Flesch-Kincaid Grade level
106 RDL2 10.687 5.5 Coh-Metrix L2 Readability

Annotations on the slide:
  • word frequency
  • other word difficulty properties
  • readability
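The two Flesch indices in rows 104 and 105 depend only on mean sentence length in words (DESSL) and mean word length in syllables (DESWLsy). The sketch below applies the standard published formulas to the values tabled for the two texts. The function and variable names are chosen for the example only.

```python
def flesch_reading_ease(words_per_sentence: float, syllables_per_word: float) -> float:
    """Flesch Reading Ease: higher scores mean easier text."""
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


def flesch_kincaid_grade(words_per_sentence: float, syllables_per_word: float) -> float:
    """Flesch-Kincaid Grade Level: approximate US school grade."""
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


# Text: DESSL = 17.364, DESWLsy = 1.571
text_fre = flesch_reading_ease(17.364, 1.571)
text_fkgl = flesch_kincaid_grade(17.364, 1.571)

# Text2: DESSL = 25.75, DESWLsy = 1.718
text2_fre = flesch_reading_ease(25.75, 1.718)
text2_fkgl = flesch_kincaid_grade(25.75, 1.718)

print(f"Text:  FRE {text_fre:.3f}, FKGL {text_fkgl:.3f}")
print(f"Text2: FRE {text2_fre:.3f}, FKGL {text2_fkgl:.3f}")
```

The results reproduce the RDFRE and RDFKGL values tabled for both texts. This shows that these two readability scores carry no information beyond sentence length and syllable count.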

SLIDE 14

What are you talking about?

SLIDE 15

USEFUL INDICES

  • Some indices from Coh-metrix
  • Lexile
  • Vocabprofiler

  • to check similarities and differences between texts we use for one particular assessment purpose or across different purposes with consistency
  • to develop teaching materials
  • to determine anchor points along the curriculum (A2-B1 texts…)
  • to adapt texts from different sources for reading exams

SLIDE 16

COH-METRIX

SLIDE 17

COMPARISON OF TEXTS - LEXTUTOR

Literacy Assessment

  • K1: 76.31%
  • K1+K2: 81.54%
  • AWL: 10.46%
  • Type-token ratio: 0.54
  • Lexile level: 1440
  • Flesch Reading Ease: 29
  • Flesch-Kincaid Grade Level: 16
  • Coh-metrix Readability: 11.76
  • Narrativity: 38%
  • Syntactic Similarity: 14%
  • Word Concreteness: 14%
  • Referential Cohesion: 69%
  • Deep Cohesion: 83%

What is Society?

  • K1: 81.19%
  • K1+K2: 86.14%
  • AWL: 9.41%
  • Type-token ratio: 0.59
  • Lexile level: 1000
  • Flesch Reading Ease: 37.55
  • Flesch-Kincaid Grade Level: 12
  • Coh-metrix Readability: 14.47
  • Narrativity: 3%
  • Syntactic Similarity: 53%
  • Word Concreteness: 35%
  • Referential Cohesion: 32%
  • Deep Cohesion: 17%
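The K1, K2 and AWL figures are coverage percentages: the share of a text's running words found in the first 1,000 most frequent word families, the second 1,000, and the Academic Word List. The sketch below shows the idea with toy word sets that stand in for the real lists. The set contents, the function name and the sample sentence are invented for illustration, and the sketch matches exact word forms where the real lists work with word families.

```python
import re
from collections import Counter


def coverage_profile(text: str, bands: dict) -> dict:
    """Percentage of tokens falling in each frequency band, plus off-list."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        raise ValueError("text contains no word tokens")
    counts = Counter()
    for token in tokens:
        # Assign each token to the first band that contains it.
        for name, words in bands.items():
            if token in words:
                counts[name] += 1
                break
        else:
            counts["off-list"] += 1
    return {name: 100.0 * counts[name] / len(tokens)
            for name in [*bands, "off-list"]}


# Toy stand-ins for the real word lists (hypothetical contents).
toy_bands = {
    "K1": {"the", "of", "is", "a", "people", "in"},
    "K2": {"society"},
    "AWL": {"structure", "analyse"},
}
profile = coverage_profile("Society is a structure of people in the world", toy_bands)
print(profile)
```

Adding the K1 and K2 percentages gives the K1+K2 figure listed for each text.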

SLIDE 18

Indices

  • K1:
  • K1+K2:
  • AWL:
  • Type-token ratio:
  • Lexile level:
  • Flesch Reading Ease:
  • Flesch-Kincaid Grade Level:
  • Coh-metrix Readability:
  • Narrativity:
  • Syntactic Similarity:
  • Word Concreteness:
  • Referential Cohesion:
  • Deep Cohesion:

How do we set the standards?

SLIDE 19
  • We can form our own corpus based on:

– course materials: reading texts, course books
– target level materials (university course books, graded readers…)
– already set criteria (research, CEFR equivalence…)

SLIDE 20
  • McNamara et al. (2014). Automated Evaluation of Text and Discourse with Coh-Metrix. Cambridge: CUP.

SLIDE 21

CEFR Levels (?) From: https://linguapress.com/teachers/flesch-kincaid.htm

Flesch Reading Ease    CEFR Level
Below 50               C2 (Higher Education)
50-60                  C1
60-70                  B2
70-80                  B1
80-90                  A2
90-100                 A1
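The banding above can be expressed as a simple lookup. This sketch rests on two assumptions: the C2 band is read as scores below 50 (the bands above it start at 50), and a score that falls exactly on a boundary is assigned to the easier level. The slide itself marks the mapping with a question mark, so the output is a rough orientation and not an established equivalence.

```python
def cefr_from_flesch(score: float) -> str:
    """Map a Flesch Reading Ease score to the CEFR band listed on the slide."""
    # (lower bound, level), checked from the easiest band downwards.
    bands = [(90, "A1"), (80, "A2"), (70, "B1"), (60, "B2"), (50, "C1")]
    for lower, level in bands:
        if score >= lower:
            return level
    return "C2"  # assumed: everything below 50


# Flesch Reading Ease values reported for the two texts (RDFRE row).
print(cefr_from_flesch(56.304))  # falls in the 50-60 band
print(cefr_from_flesch(35.356))  # falls below 50
```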

SLIDE 22

Bogazici University English Proficiency Test Specifications

Search Reading

Purpose: Search reading
Response method:
Item characteristics:
Number of items:
Weighting:
Discourse mode (genre and rhetorical structure):
Nature of information:
Content knowledge:
Cultural knowledge:
Lexical and structural properties:
  • Text length: 2100-2200 words
  • Sentence length: 18-20 words
  • Word length: 5-5.2 characters
  • % Passive: 20-25%
  • Flesch Reading Ease: 40-42
  • FK Grade Level: 12
  • Cohmetrix Read.: 10-11
  • Lexile Level: 1150-1250
  • K1: 75-78%
  • K1+K2: 80-85%
  • AWL: 10%
  • Narrativity: <30% (but varies, e.g. historical texts)
  • Syntactic simplicity: 60-65%
  • Word concreteness: varies according to subject matter (e.g. high in historical texts)
  • Referential cohesion: 30-40%
  • Deep cohesion: 75-85%
Test administration procedure:
Total duration of the test:

Careful Reading

Purpose: Careful reading at global level
Response method:
Item characteristics:
Number of items:
Weighting:
Discourse mode (genre and rhetorical structure):
Nature of information:
Content knowledge:
Cultural knowledge:
Lexical and structural properties:
  • Text length: 1900-1930 words
  • Sentence length: 20-21 words
  • Word length: 5-5.2 characters
  • % Passive: 20-25%
  • Flesch Reading Ease: 40-42
  • FK Grade Level: 12-13
  • Cohmetrix Read.: 12-13
  • Lexile Level: 1200-1300
  • K1: 75-80%
  • K1+K2: 80-82%
  • AWL: 10-12%
  • Narrativity: <30% (can change)
  • Syntactic simplicity: 50-55%
  • Word concreteness: varies according to subject matter
  • Referential cohesion: 30-40%
  • Deep cohesion: 75-85%
Test administration procedure:
Total duration of the test:
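Once the specifications state numeric ranges, screening a candidate text amounts to checking each of its indices against the range. The sketch below encodes a few of the Search Reading ranges and runs the figures reported earlier for the text "Literacy Assessment" through them. The dictionary keys and function name are invented for the example, and pairing that text with this specification is purely illustrative, since the slides do not say which test section it was considered for.

```python
# A subset of the Search Reading ranges, as (low, high) pairs.
SEARCH_READING_SPEC = {
    "flesch_reading_ease": (40, 42),
    "lexile": (1150, 1250),
    "k1_percent": (75, 78),
    "referential_cohesion_percent": (30, 40),
    "deep_cohesion_percent": (75, 85),
}


def out_of_range(indices: dict, spec: dict) -> dict:
    """Return the indices that fall outside their specified range."""
    problems = {}
    for name, (low, high) in spec.items():
        value = indices.get(name)
        if value is None:
            continue  # index not measured for this text
        if not low <= value <= high:
            problems[name] = (value, low, high)
    return problems


# Figures reported for "Literacy Assessment" in the text comparison.
literacy_assessment = {
    "flesch_reading_ease": 29,
    "lexile": 1440,
    "k1_percent": 76.31,
    "referential_cohesion_percent": 69,
    "deep_cohesion_percent": 83,
}

flagged = out_of_range(literacy_assessment, SEARCH_READING_SPEC)
for name, (value, low, high) in flagged.items():
    print(f"{name}: {value} is outside {low}-{high}")
```

A flagged index is a prompt for closer inspection or adaptation of the text. It need not mean automatic rejection, since the specifications themselves note that some values vary with subject matter.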

SLIDE 23

THANK YOU