Comparability Evaluation Options for the Innovative Assessment and Accountability Demonstration Authority

Susan Lyons & Scott Marion, Center for Assessment
CCSSO's NCSA 2017, June 28, 2017
Project Goals
1. Articulate a framework for comparability for the Demonstration Authority under ESSA
2. Expand the comparability options in the draft regulations
3. Support states in planning innovative assessment pilots

Thank you to the William and Flora Hewlett Foundation for funding this work.
2 Lyons & Marion_Comparability Options for the Innovative Pilot_July 28, 2017
Innovative Assessment and Accountability
- Allows up to seven (7) states to pilot competency-based or other innovative assessment approaches for use in making accountability determinations
- Initial demonstration period of three (3) years, with a two (2)-year extension based on a satisfactory report from the director of the Institute of Education Sciences (IES), plus a potential two (2)-year waiver
- Rigorous assessment, participation, and reporting requirements, subject to a peer review process
- May be used with a subset of districts based on strict "guardrails," with a plan to move statewide by the end of the extension
Innovative Assessment and Accountability

What does "innovative" mean?

May Pilot in a Subset of Districts
- Approved states may pilot with a subset of districts before scaling the system statewide by the end of the Demonstration Authority.

Can Be Entirely Performance-Based
- Approved states may design an assessment or system of assessments that consists entirely of performance tasks, portfolios, or extended learning tasks.

Can Administer When Students Are Ready
- Approved states may assess students when they are ready to demonstrate mastery of standards and competencies, as applicable, so long as states can also report grade-level information.
Purpose of ESEA
"From the beginning, Title I of ESEA included assessment and accountability requirements as a safeguard to ensure that the federal money being allocated to programs to improve the achievement of the disadvantaged was being spent wisely." (DePascale, 2015)

- The purpose of ESEA accountability is to ensure that public tax dollars are resulting in improved educational programming and the intended student outcomes related to achievement and equity (Bailey & Mosher, 1968).
Why Should We Care About Comparability?
1. Fairness: Because states must use assessment results from the pilot districts in the state accountability system.
2. Equity in Opportunity to Learn: Make sure that the pilot districts are not getting a "hall pass"; all students are held to the same expectations.
Too Narrow a Focus on Comparability

A narrow focus on pilot to non-pilot comparability misses the bigger picture in two important ways:
- by failing to address additional, and potentially more important, comparability questions, and
- by potentially inhibiting innovation.
Building an Evidence-Base for Score Comparability
Evidence can include:
- Scoring calibration sessions, external audits on inter-rater reliability, audits on the generalizability of the local scores, and reviews of local assessment quality and alignment.
- Social moderation comparability audits on common and local tasks, standard setting, and validating pilot performance standards with samples of student work.
- Common achievement level descriptors and common assessments in select grades/subjects.

[Diagram: comparable annual determinations supported by pilot results (within-district results for District A and District B) alongside non-pilot results; the pilot to non-pilot comparison is labeled "The focus of the regulations."]
Threat to Real Innovation
Legitimate reasons for non-comparability:
1. To measure the state-defined learning targets more efficiently (e.g., reduced testing time);
2. To measure the learning targets more flexibly (e.g., when students are ready to demonstrate "mastery");
3. To measure the learning targets more deeply; or
4. To measure the targets more completely (e.g., listening, speaking, extended research, scientific investigations).
“Perfect agreement would be an indication of failure.” – Dr. Robert Brennan
Comparability by Design
- How does the design of the innovative assessment system yield evidence to support comparability claims?
- How will the state evaluate the degree of comparability achieved across differing assessment conditions?
- If comparability is not achieved, how will the state adjust the classification scale to account for systematic differences across assessment systems?
The focus of the regulations
What’s Our Inference?
- Many comparability studies focus on item- and score-level interchangeability.
- The innovative pilot requires comparability at the level of the annual determination.
  - In other words, would a student considered proficient in one district also be considered proficient in another district given the same level of work?
Expanding our notions of comparability
Adapted from Winter (2010)
Two Major Categories of Evidence
1. The alignment of the assessment systems to the content standards.
   - We strongly recommend that evidence of alignment for the two assessment systems come from alignment to the content standards rather than alignment to one another.
2. The consistency of achievement classifications across the two systems.
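As a concrete illustration of the second category, a state might compare the achievement levels the two systems assign to the same students, reporting the exact-agreement rate and a chance-corrected index such as Cohen's kappa. The sketch below is our own illustration, not a method prescribed by the regulations; the level labels and student data are invented.

```python
# Hypothetical sketch: consistency of achievement classifications between an
# innovative and a statewide assessment system. Labels and data are invented.
from collections import Counter

LEVELS = ["Below", "Approaching", "Proficient", "Advanced"]

def classification_consistency(innovative, statewide):
    """Return (exact-agreement rate, Cohen's kappa) for two lists of
    achievement-level classifications on the same students."""
    assert len(innovative) == len(statewide)
    n = len(innovative)
    p_o = sum(a == b for a, b in zip(innovative, statewide)) / n
    # Chance agreement from the marginal level distributions of each system
    ci, cs = Counter(innovative), Counter(statewide)
    p_e = sum(ci[level] * cs[level] for level in LEVELS) / (n * n)
    kappa = (p_o - p_e) / (1 - p_e) if p_e < 1 else 1.0
    return p_o, kappa

# Invented classifications for ten students under each system
innov = ["Proficient", "Advanced", "Below", "Proficient", "Approaching",
         "Proficient", "Advanced", "Below", "Approaching", "Proficient"]
state = ["Proficient", "Proficient", "Below", "Proficient", "Approaching",
         "Approaching", "Advanced", "Below", "Approaching", "Proficient"]

p_o, kappa = classification_consistency(innov, state)
print(f"exact agreement = {p_o:.2f}, kappa = {kappa:.2f}")
```

A state would apply a check like this to students who took both systems (the audit or sample designs above), then judge whether the observed agreement is high enough to support comparable annual determinations.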
Comparability Options in the Regulations
- Audit: Administering both the innovative and statewide assessments to all students in pilot schools at least once in any grade span.
- Sample: Administering full assessments from both the innovative and statewide assessment systems to a demographically representative sample of students at least once every grade span.
- Common Items: Including common items in both the statewide and innovative assessment systems.
- Other: This is where we come in; we needed to offer additional options!
16 Design Options for Evaluating Pilot to Non-Pilot Comparability in Rigor of Performance Standards

[Grid, built up across several slides: rows are the measures in common (Both Measures, Some Measures, Third Measure in Common, Other); columns are the students in common (All Students, Some Students, No Students in Common).]

- Both Measures: Concurrent (in the past): "Pre-equating"; Not concurrent: Statewide assessment once per grade span in lieu of the innovative assessment; Concurrent: Random assignment of assessment system to classrooms
- Some Measures: Concurrent: Embedded common items across both systems
- Third Measure in Common: Concurrent: Common independent assessment; Concurrent: Propensity score matching
- Other: Concurrent: Standard setting design
How Comparable is Comparable Enough?
A sequence of questions, where each "yes" leads to the next:
1. Do the differences exceed in magnitude those that are typically seen within assessment programs due to variations in administration conditions?
2. Do the differences pose a significant threat to the validity of the accountability system? Do they pose a significant threat to equity in opportunity to learn? Do the results potentially disadvantage specific subgroups or institutions?
3. Is the disadvantage consequential enough that it is not offset by potential gains in other important dimensions that might justify that loss (e.g., positive impact on teaching and learning)?
So, did ED listen to us?
Comment: Clarify that not every assessment within an innovative assessment system must meet the peer review guidelines, but that there must be sufficient validity evidence to support the annual determinations resulting from the assessment system for their intended uses.

Changes by ED: Clarification made!
So, did ED listen to us?
Comment: Clarify that comparability be established at the level of the summative annual determinations, not at the raw or scale score levels.

Changes by ED: Clarification made!
So, did ED listen to us?
Comment: In addition to evidence of consistency in performance classifications, states should be required to submit evidence of alignment to the content standards as part of their comparability argument.

Changes by ED: No changes; ED feels the regulations as written provide sufficient clarity that the innovative system must be aligned to the content standards.
So, did ED listen to us?
Comment: As the system scales statewide, comparability among pilot districts becomes much more relevant than comparability from pilot to non-pilot districts.

Changes by ED: Added a regulation to require that the innovative assessment system generate results that are comparable among pilot schools and LEAs.
So, did ED listen to us?
Comment: Provide a multitude of examples of comparability designs in non-regulatory guidance instead of the regulations, and allow a state to develop an evaluation methodology for establishing comparability that is consistent with the design and context of its innovative assessment.

Changes by ED: ED feels the regulations as written provide sufficient flexibility for states to pursue alternate methods of gathering comparability evidence, but they did clarify one of their listed methods and add an additional method.
So, did ED listen to us?
Comment: Once strong evidence of comparability is established across assessment systems, it does not need to be re-established annually unless either of the two systems changes.

Changes by ED: ED does not feel it is overly burdensome to demonstrate comparability annually as the system scales statewide.
Where are we now?
- ESSA says that the "Secretary may" release an application for the Demonstration Authority
- The regulations are still in place (they have not been rescinded)
- ED has indicated that it will not release an application until next year
- States do not appear to be clamoring to apply:
  - Concerns about scaling statewide
  - Concerns about technical requirements
  - Concerns about resources and capacity
External Experts
- Bob Brennan, U of Iowa
- Randy Bennett, ETS
- Henry Braun, B.C.
- Derek Briggs, U of CO
- Linda Cook, ETS (retired)
- Joan Herman, CRESST
- Stuart Kahl, Measured Progress
- Ric Luecht, U of NC
- Laurie Wise, HumRRO

Center for Assessment
- Scott Marion
- Susan Lyons
- Nathan Dadey
- Juan D'Brot
- Chris Domaleski
- Erika Hall
- Joseph Martineau