9 Validity Studies

The preceding chapters and the Dynamic Learning Maps® (DLM®) Alternate Assessment System 2015–2016 Technical Manual—Science (Dynamic Learning Maps Consortium, 2017) provide evidence in support of the overall validity argument for results produced by the DLM assessment. This chapter presents additional evidence collected during 2020–2021 for two of the five critical sources of evidence described in Standards for Educational and Psychological Testing (American Educational Research Association et al., 2014): evidence based on test content and response process. Additional evidence can be found in Chapter 9 of the 2015–2016 Technical Manual—Science (Dynamic Learning Maps Consortium, 2017) and the subsequent annual technical manual updates (Dynamic Learning Maps Consortium, 2017, 2018a, 2018b, 2019, 2020).

9.1 Evidence Based on Test Content

Evidence based on test content relates to the evidence “obtained from an analysis of the relationship between the content of the test and the construct it is intended to measure” (American Educational Research Association et al., 2014, p. 14). This section presents results from data collected during spring 2021 regarding student opportunity to learn the assessed content. For additional evidence based on test content, including the alignment of test content to content standards via the DLM maps (which underlie the assessment system), see Chapter 9 of the 2015–2016 Technical Manual—Science (Dynamic Learning Maps Consortium, 2017).

9.1.1 Opportunity to Learn

After administration of the spring 2021 operational assessments, teachers were invited to complete a survey about the assessment (see Chapter 4 of this manual for more information on recruitment and response rates). The survey included four blocks of items. The first, third, and fourth blocks were fixed forms assigned to all teachers. For the second block, teachers received one randomly assigned section.

The first block of the survey served several purposes; results for other survey items are reported later in this chapter and in Chapter 4 of this manual. One item provided information about the relationship between students’ learning opportunities before testing and the test content (i.e., testlets) they encountered on the assessment. The survey asked teachers to indicate the extent to which they judged test content to align with their instruction across all testlets. Table 9.1 reports the results. Approximately 51% of responses (n = 8,866) reported that most or all science testlets matched instruction. More specific measures of instructional alignment are planned to better understand the extent to which content measured by DLM assessments matches students’ academic instruction.

Table 9.1: Teacher Ratings of Portion of Testlets That Matched Instruction
	None		Some (< half)		Most (> half)		All		Not applicable
Subject	n	%	n	%	n	%	n	%	n	%
Science	1,646	9.4	5,331	30.6	5,789	33.2	3,077	17.7	1,576	9.0
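
The 51% figure cited above can be reproduced from the counts in Table 9.1. The following is a minimal sketch of that arithmetic; the variable names are illustrative and are not part of the DLM analysis code.

```python
# Counts copied from the science row of Table 9.1.
counts = {
    "None": 1646,
    "Some (< half)": 5331,
    "Most (> half)": 5789,
    "All": 3077,
    "Not applicable": 1576,
}

total = sum(counts.values())
most_or_all = counts["Most (> half)"] + counts["All"]
pct_most_or_all = 100 * most_or_all / total

print(f"{most_or_all:,} of {total:,} responses = {pct_most_or_all:.1f}%")
# prints "8,866 of 17,419 responses = 50.9%"
```

Note that the denominator includes the "Not applicable" responses, which is what yields the reported value of approximately 51%.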

The second block of the survey was spiraled so that each teacher received one randomly assigned section. In one of these sections, a subset of teachers was asked to indicate the approximate number of hours they spent instructing students on each of the DLM science core ideas and in the science and engineering practices. Teachers responded using a 5-point scale: 0–5 hours, 6–10 hours, 11–15 hours, 16–20 hours, or more than 20 hours. Table 9.2 and Table 9.3 indicate the amount of instructional time spent on DLM science core ideas and science and engineering practices, respectively. For all science core ideas and science and engineering practices, the most commonly selected response was 0–5 hours.

Table 9.2: Instructional Time Spent on Science Core Ideas
	Number of hours
	0–5		6–10		11–15		16–20		>20
Core Idea	n	%	n	%	n	%	n	%	n	%
Physical Science
Matter and its interactions	1,626	57.1	579	20.3	295	10.4	203	7.1	145	5.1
Motion and stability: Forces and interactions	1,648	58.3	576	20.4	286	10.1	193	6.8	126	4.5
Energy	1,573	56.0	592	21.1	297	10.6	219	7.8	129	4.6
Life Science
From molecules to organisms: Structures and processes	1,656	58.9	498	17.7	313	11.1	208	7.4	137	4.9
Ecosystems: Interactions, energy, and dynamics	1,268	44.8	641	22.6	400	14.1	309	10.9	213	7.5
Heredity: Inheritance and variation of traits	1,739	61.9	486	17.3	294	10.5	178	6.3	111	4.0
Biological evolution: Unity and diversity	1,658	59.3	523	18.7	303	10.8	194	6.9	118	4.2
Earth and Space Science
Earth’s place in the universe	1,393	49.4	601	21.3	396	14.0	265	9.4	167	5.9
Earth’s systems	1,366	48.6	634	22.5	384	13.7	250	8.9	179	6.4
Earth and human activity	1,269	44.8	628	22.2	440	15.5	287	10.1	207	7.3

Table 9.3: Instructional Time Spent on Science and Engineering Practices
	Number of hours
	0–5		6–10		11–15		16–20		>20
Science and engineering practice	n	%	n	%	n	%	n	%	n	%
Developing and using models	1,624	57.0	568	20.0	299	10.5	203	7.1	153	5.4
Planning and carrying out investigations	1,367	48.2	670	23.6	362	12.8	243	8.6	197	6.9
Analyzing and interpreting data	1,261	44.5	673	23.7	411	14.5	261	9.2	228	8.0
Using mathematics and computational thinking	1,306	46.3	601	21.3	407	14.4	232	8.2	275	9.7
Constructing explanations and designing solutions	1,531	54.1	602	21.3	357	12.6	197	7.0	143	5.1
Engaging in argument from evidence	1,725	60.8	526	18.5	291	10.3	173	6.1	122	4.3
Obtaining, evaluating, and communicating information	1,255	44.2	648	22.8	411	14.5	242	8.5	285	10.0

Results from the teacher survey were also correlated with the total number of linkage levels mastered, by grade band. For each student, the median instructional time was calculated from the teacher’s responses across core ideas. A direct relationship between the amount of instructional time and the total number of linkage levels mastered is not expected, because some students may spend a large amount of time on an area and demonstrate mastery only at the lowest linkage level for each Essential Element. Nevertheless, we generally expect that students who mastered more linkage levels would also have spent more time in instruction. More evidence is needed to evaluate this assumption.
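
One possible reading of the median step described above is sketched below. It assumes the five response options are coded 1 through 5 and that skipped items are ignored; both the coding and the function name are hypothetical, since the manual does not specify how the median was computed.

```python
from statistics import median

# Hypothetical ordinal coding of the five survey response options.
HOURS_CODE = {"0–5": 1, "6–10": 2, "11–15": 3, "16–20": 4, ">20": 5}


def median_instructional_time(responses):
    """Return the median ordinal code across one student's core idea responses.

    Skipped items (None) are ignored; returns None if nothing was answered.
    """
    codes = [HOURS_CODE[r] for r in responses if r is not None]
    if not codes:
        return None
    return median(codes)


# Illustrative responses for one student across five core ideas (not real data).
example = ["0–5", "6–10", "6–10", "11–15", ">20"]
print(median_instructional_time(example))  # prints 2
```

With an even number of answered items the median can fall between two categories (e.g., 1.5), which is acceptable for a rank-based correlation.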

Table 9.4 summarizes the Spearman rank-order correlations between instructional time and the total number of linkage levels mastered, by grade band and course. Correlations ranged from -0.21 to 0.25. Based on guidelines from Cohen (1988), the observed correlations were small. In addition, the correlation for Biology is based on data from only 20 students who both participated in the Biology assessment and had this block of the teacher survey completed. Thus, these results should be interpreted with caution.

Table 9.4: Correlation Between Instructional Time and Science Linkage Levels Mastered
Grade band	Correlation with instructional time
Elementary	0.20
Middle school	0.25
High school	0.07
Biology	-0.21
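
For readers unfamiliar with the statistic, the sketch below shows how a Spearman rank-order correlation can be computed (Pearson correlation applied to average ranks) and how the values in Table 9.4 compare with the commonly cited Cohen (1988) cutoffs of 0.10, 0.30, and 0.50. The function names are illustrative, and the "below small" label for values under 0.10 is an assumption of this sketch rather than a category used in the manual.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank-order correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def cohen_label(r):
    """Label |r| using the conventional 0.10 / 0.30 / 0.50 cutoffs."""
    size = abs(r)
    if size >= 0.50:
        return "large"
    if size >= 0.30:
        return "medium"
    if size >= 0.10:
        return "small"
    return "below small"


# Correlations copied from Table 9.4.
table_9_4 = {"Elementary": 0.20, "Middle school": 0.25, "High school": 0.07, "Biology": -0.21}
for band, r in table_9_4.items():
    print(f"{band}: {r:+.2f} ({cohen_label(r)})")
```

Because the correlation is computed on ranks, only the ordering of the instructional time categories matters, not the width of each hour range.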

The third block of the survey included questions about the student’s learning and assessment experiences during the 2020–2021 school year. During the COVID-19 pandemic, students may have been instructed in a variety of instructional settings, which could have affected their opportunity to learn. Teachers were asked to report the percentage of time students spent in each instructional setting. Table 9.5 displays the possible settings and responses. A majority of responses indicated that students spent more than 50% of the time in school. More than a third of responses indicated at least some time spent at home receiving direct instruction from the teacher remotely (either one-on-one or as a class), at home with a family member providing instruction, and with no formal instruction. Fewer responses indicated an instructional setting in the home with the teacher present or an instructional setting other than those presented in the survey question.

Table 9.5: Percentage of Instruction Time Spent in Each Instructional Setting
	None		1–25		26–50		51–75		76–100		Unknown
Instructional setting	n	%	n	%	n	%	n	%	n	%	n	%
In school	2,263	6.2	3,856	10.5	5,025	13.7	7,587	20.7	17,272	47.2	591	1.6
Direct instruction with teacher remotely, 1:1	13,974	40.8	12,061	35.2	3,644	10.6	1,892	5.5	1,443	4.2	1,245	3.6
Direct instruction with teacher remotely, group	12,300	35.3	12,107	34.7	4,460	12.8	2,549	7.3	2,308	6.6	1,161	3.3
Teacher present in the home	30,011	89.7	923	2.8	446	1.3	327	1.0	366	1.1	1,378	4.1
Family member providing instruction	20,401	60.2	7,102	21.0	1,601	4.7	880	2.6	897	2.6	2,987	8.8
Absent (no formal instruction)	21,548	65.0	7,473	22.5	893	2.7	497	1.5	369	1.1	2,365	7.1
Other	22,886	80.5	618	2.2	251	0.9	205	0.7	282	1.0	4,177	14.7

Teachers were also asked what instructional scheduling scenarios applied to their student that year. Table 9.6 reports the possible instructional scheduling scenarios and teacher responses. The majority of teachers reported no delayed start of the school year, no lengthened spring semester, no extended school year through summer, and that change(s) between remote and in-person learning occurred during the school year.

Table 9.6: Instructional Scheduling Scenarios Around Student Schedules
	Yes		No		Unknown
Scheduling scenario	n	%	n	%	n	%
Delayed start of the school year	10,536	28.6	25,337	68.7	1,020	2.8
Lengthened spring semester	1,673	4.6	33,415	92.0	1,236	3.4
Extended school year through summer	13,433	36.7	21,010	57.3	2,193	6.0
Change(s) between remote and in-person learning during the school year	27,019	71.6	10,022	26.6	696	1.8

9.2 Evidence Based on Response Processes

The study of test takers’ response processes provides evidence about the fit between the test construct and the nature of how students actually experience test content (American Educational Research Association et al., 2014). The validity studies presented in this section include teacher survey data collected in spring 2021 regarding students’ ability to respond to testlets and a description of the test administration observations. For additional evidence based on response processes, including studies on student and teacher behaviors during testlet administration and evidence of fidelity of administration, see Chapter 9 of the 2015–2016 Technical Manual—Science (Dynamic Learning Maps Consortium, 2017).

9.2.1 Test Administration Observations

To be consistent with previous years, the DLM Consortium made a test administration observation protocol available for state and local users to gather information about how educators in the consortium states deliver testlets to students with the most significant cognitive disabilities. This protocol gave observers, regardless of their role or experience with DLM assessments, a standardized way to describe how DLM testlets were administered. The test administration observation protocol captured data about student actions (e.g., navigation, responding), educator assistance, variations from standard administration, engagement, and barriers to engagement. The observation protocol was used only for descriptive purposes; it was not used to evaluate or coach educators or to monitor student performance. Most items on the protocol were a direct report of what was observed, such as how the test administrator prepared for the assessment and what the test administrator and student said and did. One section of the protocol asked observers to make judgments about the student’s engagement during the session.

During 2020–2021, there were 218 test administration observations collected in four states. Because test administration observation data are anonymous and the sample of students available for test administration observations may not have represented the full population of students taking DLM assessments due to students completing assessments in a variety of locations (Table 9.6), we do not report the findings from those observations here as part of the assessment validity evidence.

9.3 Evidence Based on Internal Structure

Analyses of an assessment’s internal structure indicate the degree to which “relationships among test items and test components conform to the construct on which the proposed test score interpretations are based” (American Educational Research Association et al., 2014, p. 16).

One source of evidence comes from the examination of whether particular items function differently for specific subgroups (e.g., male versus female). The analysis of differential item functioning (DIF) is conducted annually for DLM assessments based on the cumulative operational data for the assessment. For example, in 2019–2020, the DIF analyses were based on data from the 2015–2016 through 2018–2019 assessments. Due to the cancellation of assessments during spring 2020, additional data for DIF analyses were not collected in 2019–2020. Thus, updated DIF analyses are not provided in this update, as there are no additional data to contribute to the analysis. For a description of DIF results from 2019–2020, see Chapter 9 of the 2019–2020 Technical Manual Update—Science (Dynamic Learning Maps Consortium, 2020).

Additional evidence based on internal structure is provided across the linkage levels that form the basis of reporting. This evidence is described in detail in Chapter 5 of this manual.

9.4 Conclusion

This chapter presents additional studies as evidence for the overall validity argument for the DLM Alternate Assessment System. The studies are organized into categories where available (content and response process), as defined by the Standards for Educational and Psychological Testing (American Educational Research Association et al., 2014), the professional standards used to evaluate educational assessments.

The final chapter of this manual, Chapter 11, references evidence presented throughout the technical manual, including Chapter 9, and expands the discussion of the overall validity argument. Chapter 11 also provides areas for further inquiry and ongoing evaluation of the DLM Alternate Assessment System, building on the evidence presented in the 2015–2016 Technical Manual—Science (Dynamic Learning Maps Consortium, 2017) and the subsequent annual technical manual updates (Dynamic Learning Maps Consortium, 2017, 2018a, 2018b, 2019, 2020), in support of the assessment’s validity argument.