Testing-Teaching Mismatches
The companies that create and sell standardized achievement tests are all owned by large corporations. Like all for-profit businesses, these corporations attempt to produce revenue for their shareholders.
Recognizing the substantial pressure to sell standardized achievement tests, those who market such tests encounter a difficult dilemma that arises from the considerable curricular diversity in the United States. Because different states often choose somewhat different educational objectives (or, to be fashionable, different content standards), the need exists to build standardized achievement tests that are properly aligned with educators' meaningfully different curricular preferences. The problem is exacerbated in states where different counties or school districts can exercise more localized curricular decision making.
At a very general level, the goals that educators pursue in different settings are reasonably similar. For instance, you can be sure that all schools will give attention to language arts, mathematics, and so on. But that's at a general level. At the level where it really makes a difference to instruction—in the classroom—there are significant differences in the educational objectives being sought. And that presents a problem to those who must sell standardized achievement tests.
In view of the nation's substantial curricular diversity, test developers are obliged to create a series of one-size-fits-all assessments. But, as most of us know from attempting to wear one-size-fits-all garments, sometimes one size really can't fit all.
The designers of these tests do the best job they can in selecting test items that are likely to measure all of a content area's knowledge and skills that the nation's educators regard as important. But the test developers can't really pull it off. Thus, standardized achievement tests will always contain many items that are not aligned with what's emphasized instructionally in a particular setting.
To illustrate the seriousness of the mismatch that can occur between what's taught locally and what's tested through standardized achievement tests, educators ought to know about an important study at Michigan State University reported in 1983 by Freeman and his colleagues. These researchers selected five nationally standardized achievement tests in mathematics and studied their content for grades 4–6. Then, operating on the very reasonable assumption that what goes on instructionally in classrooms is often influenced by what's contained in the textbooks that children use, they also studied four widely used textbooks for grades 4–6.
Employing rigorous review procedures, the researchers identified the items in the standardized achievement tests that had not received meaningful instructional attention in the textbooks. They concluded that between 50 and 80 percent of what was measured on the tests was not suitably addressed in the textbooks. As the Michigan State researchers put it, "The proportion of topics presented on a standardized test that received more than cursory treatment in each textbook was never higher than 50 percent" (p. 509).
Well, if the content of standardized tests is not satisfactorily addressed in widely used textbooks, isn't it likely that in a particular educational setting, topics will be covered on the test that aren't addressed instructionally in that setting? Unfortunately, because most educators are not genuinely familiar with the ingredients of standardized achievement tests, they often assume that if a standardized achievement test asserts that it is assessing "children's reading comprehension capabilities," then it's likely that the test meshes with the way reading is being taught locally. More often than not, the assumed match between what's tested and what's taught is not warranted.
If you spend much time with the descriptive materials presented in the manuals accompanying standardized achievement tests, you'll find that the descriptors for what's tested are often fairly general. Those descriptors need to be general to make the tests acceptable to a nation of educators whose curricular preferences vary. But such general descriptions of what's tested often permit assumptions of teaching-testing alignments that are way off the mark. And such mismatches, recognized or not, will often lead to spurious conclusions about the effectiveness of education in a given setting if students' scores on standardized achievement tests are used as the indicator of educational effectiveness. And that's the first reason that standardized achievement tests should not be used to determine the effectiveness of a state, a district, a school, or a teacher. There's almost certain to be a significant mismatch between what's taught and what's tested.
A Psychometric Tendency to Eliminate Important Test Items
A second reason that standardized achievement tests should not be used to evaluate educational quality arises directly from the requirement that these tests permit meaningful comparisons among students from only a small collection of items.
A test item that does the best job in spreading out students' total-test scores is a test item that's answered correctly by about half the students. Items that are answered correctly by 40 to 60 percent of the students do a solid job in spreading out the total scores of test-takers.
Items that are answered correctly by very large numbers of students, in contrast, do not make a suitable contribution to spreading out students' test scores. A test item answered correctly by 90 percent of the test-takers is, from the perspective of a test's efficiency in providing comparative interpretations, being answered correctly by too many students.
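One common way to see why middle-difficulty items spread scores best, offered here as a sketch and not as something spelled out in the discussion above, is to look at the variance of a single right/wrong item. If a proportion p of students answers an item correctly, that item's score variance is p(1 - p), which peaks at p = 0.5 and shrinks toward zero as p approaches 0 or 1. The function name `item_variance` and the sample proportions below are illustrative assumptions. The spread of total scores also depends on how items correlate with one another, which this sketch ignores.

```python
def item_variance(p):
    """Variance of a right/wrong (0/1) item answered correctly by proportion p."""
    return p * (1.0 - p)

# Illustrative proportions only; these are not data from any actual test.
for p in (0.1, 0.4, 0.5, 0.6, 0.8, 0.9):
    print(f"proportion correct = {p:.1f}   item variance = {item_variance(p):.2f}")
```

Under these assumptions, an item answered correctly by 40 to 60 percent of students has a variance of at least 0.24, very close to the 0.25 maximum, while an item answered correctly by 90 percent has a variance of only 0.09, so it contributes far less to separating test-takers.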
Test items answered correctly by 80 percent or more of the test-takers, therefore, usually don't make it past the final cut when a standardized achievement test is first developed, and such items will most likely be jettisoned when the test is revised. As a result, the vast majority of the items on standardized achievement tests are "middle difficulty" items.
As a consequence of the quest for score variance in a standardized achievement test, items on which students perform well are often excluded. However, items on which students perform well often cover the content that, because of its importance, teachers stress. Thus, the better the job that teachers do in teaching important knowledge and/or skills, the less likely it is that there will be items on a standardized achievement test measuring such knowledge and/or skills. To evaluate teachers' instructional effectiveness by using assessment tools that deliberately avoid important content is fundamentally foolish.