Comparability Evaluation Options for the Innovative Assessment and Accountability Demonstration Authority

Susan Lyons & Scott Marion, Center for Assessment
CCSSO's NCSA 2017, June 28, 2017
Project Goals
1. Articulate a framework for comparability for the Demonstration Authority under ESSA
2. Expand the comparability options in the draft regulations
3. Support states in planning innovative assessment pilots

Thank you to the William and Flora Hewlett Foundation for funding this work.
2 Lyons & Marion_Comparability Options for the Innovative Pilot_July 28, 2017
Innovative Assessment and Accountability
- Allows up to seven (7) states to pilot competency-based or other innovative assessment approaches for use in making accountability determinations
- Initial demonstration period of three (3) years, with a two (2)-year extension based on a satisfactory report from the director of the Institute of Education Sciences (IES), plus a potential two (2)-year waiver
- Rigorous assessment, participation, and reporting requirements, subject to a peer review process
- May be used with a subset of districts based on strict "guardrails," with a plan to move statewide by the end of the extension
Innovative Assessment and Accountability

What does "innovative" mean?

May Pilot in a Subset of Districts
- Approved states may pilot with a subset of districts before scaling the system statewide by the end of the Demonstration Authority.

Can Be Entirely Performance-Based
- Approved states may design an assessment or system of assessments that consists entirely of performance tasks, portfolios, or extended learning tasks.

Can Administer When Students Are Ready
- Approved states may assess students when they are ready to demonstrate mastery of standards and competencies, as applicable, so long as states can also report grade-level information.
Purpose of ESEA
"From the beginning, Title I of ESEA included assessment and accountability requirements as a safeguard to ensure that the federal money being allocated to programs to improve the achievement of the disadvantaged was being spent wisely." (DePascale, 2015)

- The purpose of ESEA accountability is to ensure that public tax dollars are resulting in improved educational programming and the intended student outcomes related to achievement and equity (Bailey & Mosher, 1968).
Why Should We Care About Comparability?
1. Fairness: Because states must use assessment results from the pilot districts in the state accountability system.
2. Equity in Opportunity to Learn: Make sure that the pilot districts are not getting a "hall pass"; all students are held to the same expectations.
Too Narrow a Focus on Comparability

A narrow focus on pilot to non-pilot comparability misses the bigger picture in two important ways:
- by failing to address additional, and potentially more important, comparability questions, and
- by potentially inhibiting innovation.
Building an Evidence-Base for Score Comparability
Evidence can include:
- Scoring calibration sessions, external audits on inter-rater reliability, audits on the generalizability of the local scores, and reviews of local assessment quality and alignment.
- Social moderation comparability audits on common and local tasks, standard setting, and validating pilot performance standards with samples of student work.
- Common achievement level descriptors and common assessments in select grades/subjects.

[Diagram: comparable annual determinations supported by pilot results (within-district results for District A and District B) alongside non-pilot results; the pilot to non-pilot comparison is labeled "The focus of the regulations."]
Threat to Real Innovation
Legitimate reasons for non-comparability:
1. To measure the state-defined learning targets more efficiently (e.g., reduced testing time);
2. To measure the learning targets more flexibly (e.g., when students are ready to demonstrate "mastery");
3. To measure the learning targets more deeply; or
4. To measure the targets more completely (e.g., listening, speaking, extended research, scientific investigations).
“Perfect agreement would be an indication of failure.” – Dr. Robert Brennan
Comparability by Design
- How does the design of the innovative assessment system yield evidence to support comparability claims?
- How will the state evaluate the degree of comparability achieved across differing assessment conditions?
- If comparability is not achieved, how will the state adjust the classification scale to account for systematic differences across assessment systems?
The focus of the regulations
What’s Our Inference?
- Many comparability studies focus on item- and score-level interchangeability.
- The innovative pilot requires comparability at the level of the annual determination.
  - In other words, would a student considered proficient in one district also be considered proficient in another district given the same level of work?
Expanding our notions of comparability
Adapted from Winter (2010)
Two Major Categories of Evidence
1. The alignment of the assessment systems to the content standards.
   - We strongly recommend that evidence of alignment for the two assessment systems come from alignment to the content standards rather than alignment to one another.
2. The consistency of achievement classifications across the two systems.
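As a concrete illustration of the second category, a state might compare the achievement levels the two systems assign to the same students, reporting the exact-agreement rate and a chance-corrected index such as Cohen's kappa. The sketch below is our own illustration, not a method prescribed by the regulations; the level labels and student data are invented.

```python
# Hypothetical sketch: consistency of achievement classifications between an
# innovative and a statewide assessment system. Labels and data are invented.
from collections import Counter

LEVELS = ["Below", "Approaching", "Proficient", "Advanced"]

def classification_consistency(innovative, statewide):
    """Return (exact-agreement rate, Cohen's kappa) for two lists of
    achievement-level classifications on the same students."""
    assert len(innovative) == len(statewide)
    n = len(innovative)
    p_o = sum(a == b for a, b in zip(innovative, statewide)) / n
    # Chance agreement from the marginal level distributions of each system
    ci, cs = Counter(innovative), Counter(statewide)
    p_e = sum(ci[level] * cs[level] for level in LEVELS) / (n * n)
    kappa = (p_o - p_e) / (1 - p_e) if p_e < 1 else 1.0
    return p_o, kappa

# Invented classifications for ten students under each system
innov = ["Proficient", "Advanced", "Below", "Proficient", "Approaching",
         "Proficient", "Advanced", "Below", "Approaching", "Proficient"]
state = ["Proficient", "Proficient", "Below", "Proficient", "Approaching",
         "Approaching", "Advanced", "Below", "Approaching", "Proficient"]

p_o, kappa = classification_consistency(innov, state)
print(f"exact agreement = {p_o:.2f}, kappa = {kappa:.2f}")
```

A state would apply a check like this to students who took both systems (the audit or sample designs above), then judge whether the observed agreement is high enough to support comparable annual determinations.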
Comparability Options in the Regulations
- Audit: Administering both the innovative and statewide assessments to all students in pilot schools at least once in any grade span.
- Sample: Administering full assessments from both the innovative and statewide assessment systems to a demographically representative sample of students at least once every grade span.
- Common Items: Including common items in both the statewide and innovative assessment systems.
- Other: This is where we come in; we needed to offer additional options!
16 Design Options for Evaluating Pilot to Non-Pilot Comparability in Rigor of Performance Standards

[Grid, built up across several slides: rows are the measures in common (Both Measures, Some Measures, Third Measure in Common, Other); columns are the students in common (All Students, Some Students, No Students in Common).]

- Both Measures: Concurrent (in the past): "Pre-equating"; Not concurrent: Statewide assessment once per grade span in lieu of the innovative assessment; Concurrent: Random assignment of assessment system to classrooms
- Some Measures: Concurrent: Embedded common items across both systems
- Third Measure in Common: Concurrent: Common independent assessment; Concurrent: Propensity score matching
- Other: Concurrent: Standard setting design
How Comparable is Comparable Enough?
A sequence of questions, where each "yes" leads to the next:
1. Do the differences exceed in magnitude those that are typically seen within assessment programs due to variations in administration conditions?
2. Do the differences pose a significant threat to the validity of the accountability system? Do they pose a significant threat to equity in opportunity to learn? Do the results potentially disadvantage specific subgroups or institutions?
3. Is the disadvantage consequential enough that it is not offset by potential gains in other important dimensions that might justify that loss (e.g., positive impact on teaching and learning)?
So, did ED listen to us?
Comment: Clarify that not every assessment within an innovative assessment system must meet the peer review guidelines, but that there must be sufficient validity evidence to support the annual determinations resulting from the assessment system for their intended uses.

Changes by ED: Clarification made!
So, did ED listen to us?
Comment: Clarify that comparability be established at the level of the summative annual determinations, not at the raw or scale score levels.

Changes by ED: Clarification made!
So, did ED listen to us?
Comment: In addition to evidence of consistency in performance classifications, states should be required to submit evidence of alignment to the content standards as part of their comparability argument.

Changes by ED: No changes; ED feels the regulations as written provide sufficient clarity that the innovative system must be aligned to the content standards.
So, did ED listen to us?
Comment: As the system scales statewide, comparability among pilot districts becomes much more relevant than comparability from pilot to non-pilot districts.

Changes by ED: Added a regulation to require that the innovative assessment system generate results that are comparable among pilot schools and LEAs.
So, did ED listen to us?
Comment: Provide a multitude of examples of comparability designs in non-regulatory guidance instead of the regulations, and allow a state to develop an evaluation methodology for establishing comparability that is consistent with the design and context of its innovative assessment.

Changes by ED: ED feels the regulations as written provide sufficient flexibility for states to pursue alternate methods of gathering comparability evidence, but they did clarify one of their listed methods and add an additional method.
So, did ED listen to us?
Comment: Once strong evidence of comparability is established across assessment systems, it does not need to be re-established annually unless either of the two systems changes.

Changes by ED: ED does not feel it is overly burdensome to demonstrate comparability annually as the system scales statewide.
Where are we now?
- ESSA says that the "Secretary may" release an application for the Demonstration Authority
- The regulations are still in place (they have not been rescinded)
- ED has indicated that it will not release an application until next year
- States do not appear to be clamoring to apply:
  - Concerns about scaling statewide
  - Concerns about technical requirements
  - Concerns about resources and capacity
External Experts
- Bob Brennan, U of Iowa
- Randy Bennett, ETS
- Henry Braun, B.C.
- Derek Briggs, U of CO
- Linda Cook, ETS (retired)
- Joan Herman, CRESST
- Stuart Kahl, Measured Progress
- Ric Luecht, U of NC
- Laurie Wise, HumRRO

Center for Assessment
- Scott Marion
- Susan Lyons
- Nathan Dadey
- Juan D'Brot
- Chris Domaleski
- Erika Hall
- Joseph Martineau