
Evaluating the Research

Standardized Testing... Do We Have Other Choices?


Is there a cultural bias?

When discussing cultural bias, it is necessary to distinguish between bias inherent in the test and disparities of outcome resulting from variables beyond the test makers' control. These are issues of importance both to educators and to the public at large. The public needs to be confident that the test is designed to accommodate all students regardless of their race, creed, or color. As educators, we need to ensure that our students, regardless of their personal backgrounds, receive the best education possible.

On a cautionary note, we must be careful when discussing bias. To say that a test is biased against a group of people because that group does poorly on that test is circular reasoning at best. A graduate-level Microbiology test administered to freshman Art History majors would clearly result in the majority of the students failing the test. This does not mean that the test was biased against Art History majors; it means that the individuals were not properly prepared to take this test. (This was a deliberately absurd example to make the point.) On the other side of the issue, just because a group has historically failed a test does not mean that its members cannot pass the test if properly prepared to do so. Nor can we rule out that a test truly was designed to fail a specific population. This is why we must separate the issues of test bias and disparities in outcome.

In the Texas Student Assessment Program Technical Digest for the Academic Year 1998-1999, TEA discusses the steps taken to avoid bias during the development and review of potential test items.

 
Training:

HEM [Harcourt Educational Measurement] provides extensive training for each [test] item writer prior to item development. During these training seminars, HEM reviews, in detail, the content objectives and their measurement specifications. In addition, HEM discusses the scope of the testing program, security issues, adherence to measurement specifications, and avoidance of economic, regional, cultural, and ethnic bias (Technical Digest, page 7).

Contractor Review:

Experienced HEM staff members, as well as content experts in the grades and subject areas for which the items were developed, participate in the review of each set of newly developed items. This review, which occurs annually for each new or ongoing test, checks for the fairness of the items regarding their depiction of minority, gender, and other demographic groups (Technical Digest, page 8).

TEA Review:

Staff from TEA and contractor personnel meet to discuss and review all newly developed items prior to each educator committee review. Their task during this review is to scrutinize each item for content-to-specification match, item difficulty, plausibility of the distracters, and any potential ethnic, gender, economic, or cultural bias (Technical Digest, page 8).

In spite of these repeated references to how bias is avoided, some are not satisfied that this is in fact the case. Walt Haney, in his paper titled "The Myth of the Texas Miracle in Education," discusses the adverse impact of the TAAS test on minority students. In this discussion he mentions the:

…three standards [that] have been recognized for determining whether observed differences constitute discriminatory disparate impact:

  1. the 80 percent (or four-fifths) rule;
  2. tests of the statistical significance of observed differences;
  3. and evaluation of the practical significance of differences. (Haney)

Haney then goes on to show that the TAAS fails by all three of these standards. The percentage of minority students who passed the Exit Level TAAS from 1994 through 1998 was less than 80 percent of the percentage of white students who passed. The tests of statistical significance also show a significant disparity. To test the statistical significance of an observed difference, the actual difference in passing percentages is divided by the standard error of the difference. Haney’s paper presents the results of these calculations in a table.

Finally, in evaluating the practical significance of differences, Haney states, “A test that leads to failure of tens of thousands more minority than non-minority students, had they had equivalent passing rates, surely has practical adverse impact” (Haney).
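The first two standards are simple arithmetic, and a short sketch may make them concrete. The counts below are hypothetical and are not taken from Haney's table; the function names are ours, and the unpooled form of the standard error is an assumption, since the text above does not say which form Haney used.

```python
import math

def impact_ratio(rate_group, rate_reference):
    """Pass rate of one group as a fraction of the reference group's pass rate."""
    return rate_group / rate_reference

def z_statistic(passed_a, n_a, passed_b, n_b):
    """Difference in pass rates divided by the standard error of that difference."""
    p_a = passed_a / n_a
    p_b = passed_b / n_b
    # Unpooled standard error of the difference between two proportions (an assumption).
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return (p_a - p_b) / se

# Hypothetical counts, for illustration only.
white_passed, white_n = 750, 1000
minority_passed, minority_n = 500, 1000

ratio = impact_ratio(minority_passed / minority_n, white_passed / white_n)
z = z_statistic(white_passed, white_n, minority_passed, minority_n)

print(f"impact ratio: {ratio:.2f} (four-fifths rule flags values below 0.80)")
print(f"z statistic:  {z:.1f}")
```

With these made-up counts the impact ratio falls below 0.80, so the four-fifths rule would flag the disparity, and the difference is many standard errors away from zero.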

Haney also argues that the TAAS passing scores are arbitrary and discriminatory. He argues that the recommended passing score of 70% is arbitrary and based largely on historical precedent and, “…not based on any of the professionally recognized methods for setting passing standards on tests” (Haney). Additionally, this requirement creates a passing score on the TAAS exit test which maximizes the adverse impact on Black and Hispanic students. On the charge of arbitrariness, it is hard to argue. Why is 70% preferred over 50% or 80%? There is no good answer. One should note that some number must be chosen, since without a predetermined passing level the test would serve no purpose whatsoever, but there seems to be no evidence as to why this particular passing score was chosen. On the issue of maximized impact, Haney relies on graphs from the TAAS Field Test Results. He asked individuals to determine where the greatest difference was between the number of questions answered correctly by White students as compared to Black and Hispanic students.

 

However, reading values from graphs rather than from the actual numbers introduces a degree of uncertainty. This makes his question a subjective one when it could have been an objective one.
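To illustrate what an objective version of the question might look like, the sketch below computes the gap in pass rates at every possible passing score and reports where the gap is widest. The score counts are invented for illustration and are not the TAAS field test results; the function names are ours.

```python
def pass_rate(counts, cut):
    """Share of students with at least `cut` correct answers.

    counts[k] is the number of students who answered exactly k questions correctly.
    """
    return sum(counts[cut:]) / sum(counts)

def widest_gap_cut(counts_a, counts_b):
    """Passing score at which group A's pass rate exceeds group B's by the most."""
    cuts = range(1, len(counts_a))
    return max(cuts, key=lambda c: pass_rate(counts_a, c) - pass_rate(counts_b, c))

# Hypothetical five-question test: index = questions correct, value = number of students.
group_a = [1, 2, 4, 8, 15, 20]
group_b = [5, 8, 12, 12, 8, 5]

best_cut = widest_gap_cut(group_a, group_b)
print("passing score with the widest gap:", best_cut)
```

Given the real score distributions, the same calculation would show directly whether the adopted passing score sits at or near the point of greatest difference, without asking anyone to judge it from a graph.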

Haney then goes on to say that:

"If the intent in setting the passing scores based on the TAAS field test results in July 1990 had been discriminatory, i.e., to set the passing scores so that they would most clearly differentiate between White students and Black/Hispanic students, then the passing scores would have been set just about where the Board of Education did in fact set them" (Haney).

Haney does admit that he found no evidence of discriminatory intent; however, he still holds the Board of Education responsible for the disparity of outcome.

When looking at the disparities between White and Black/Hispanic students, one must take into account that poor, largely minority, and Limited English Proficient (LEP) students come to the TAAS test at a disadvantage. This is not to say there is a bias against this group, only that they are less prepared for the test. Thus many argue that the TAAS can adversely affect their education rather than improve it.

This view is supported in a paper published by the Texas Public Policy Foundation (TPPF). While the foundation describes itself as nonpartisan, its mission statement shows it to be largely libertarian to conservative. The authors state that, “in examining our school system it becomes clear that not only are we not trying to bring all minority children to the top levels of performance, we continue to reinforce and extend disadvantages. …Minorities perform far worse on standardized tests, [and] they drop out in far higher numbers” (TPPF).

TPPF also raises concerns about the practice, which is being slowly phased out, of students being exempted from the test by the principal of the school. For example, TPPF reported that in San Antonio’s Northeast ISD black children were four times more likely than Asian children to be exempted from the 1995 math test. TPPF does not attempt to explain why there is a disparity of outcome on the TAAS: in 1996, 45.1% of Hispanics and 39.3% of African-Americans passed the Exit Level TAAS, compared with 74.9% of Whites and 73.8% of Asians. They do, however, note that even with the introduction of the Spanish-language version of the fourth-grade test, a large number (about two-thirds of the students) were still unable to get a score of 70, though they were being tested in their own language. TPPF feels that this indicates the problem is not just one of language but a lack of knowledge, reflecting an ineffective curriculum in the schools.
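TPPF does not make this calculation itself, but the 1996 pass rates it reports can be checked against the four-fifths rule that Haney cites. The sketch below assumes the White pass rate as the reference group, which is our choice for illustration.

```python
# 1996 Exit Level TAAS pass rates (percent) as reported by TPPF.
pass_rates_1996 = {
    "Hispanic": 45.1,
    "African-American": 39.3,
    "White": 74.9,
    "Asian": 73.8,
}

# Assumption: the White pass rate serves as the reference for the comparison.
reference = pass_rates_1996["White"]

impact_ratios = {group: rate / reference for group, rate in pass_rates_1996.items()}
below_four_fifths = [group for group, ratio in impact_ratios.items() if ratio < 0.8]

for group, ratio in impact_ratios.items():
    print(f"{group}: {ratio:.2f}")
print("below the four-fifths threshold:", below_four_fifths)
```

On these figures the Hispanic and African-American pass rates come to roughly 60 and 52 percent of the White pass rate, both well under the 80 percent threshold, while the Asian pass rate is close to parity.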

TPPF presents a laundry list of problems with TAAS testing – not just those associated with bias or disparity of outcome. In contrast, a draft paper by Linda McNeil of Rice University and Angela Valenzuela of the University of Texas at Austin attacks the issue head-on, charging that:

the TAAS system of testing is reducing the quality and quantity of education offered to the children of Texas. Most damaging are the effects of the TAAS system of testing on poor and minority youth (McNeil).

This is happening because there is:

…a growing set of classroom practices in which test-prep activities are usurping a substantive curriculum. These practices are more widespread in those schools where administrator pay is tied to test scores and where test scores have been historically low. These are the schools that are typically attended by children who are poor and African American or Latino, many non-English-language dominant. These are the schools that have historically been under-resourced. In these schools, the pressure to raise test scores “by any means necessary” has frequently meant that a regular education has been supplanted by activities whose sole purpose is to raise test scores on this particular test [TAAS] (McNeil).

McNeil and Valenzuela assert that, “middle-class children in white, middle class schools are reading literature, learning a variety of forms of writing, and studying mathematics aimed at problem-solving and conceptual understanding…while poor and minority children are devoting class time to practice test materials whose purpose is to help children pass TAAS” (McNeil). Because of this, “The TAAS system of testing thus widens the gap between the public education provided for poor and minority children and that of children in traditionally higher-scoring (that is, Anglo and wealthier) schools” (McNeil).

In contrast to Haney, McNeil and Valenzuela state a need for studies that do not just look at the data of test scores and tested students. New attention needs to focus on closing the gap between those who are setting policy for schools and those who are actually teaching. Thus, they assert, there must be “independent research into the economics and political forces behind this system of testing and its promulgation” (McNeil). Moreover:

Research into why these organizational, economic, and political forces are reshaping teaching and learning, and thereby restratifying children’s opportunities to learn would be far more productive than more studies on test question “validity” or race-based trends in test scores (McNeil).

Alternatives to TEKS and TAAS

Some schools are moving toward engaging students in real-world tasks rather than just standardized tests. This type of assessment is often called authentic assessment. It can include oral presentations, performances, experiments, debates and inquiries, portfolios, oral examinations, or any type of assessment that makes the student accountable for the information and learning by participating in real-world experiences.

For instance, the Personal and Career Development course at the International High School in New York City includes programs to integrate academics with personal and career development, including developing the values and habits of work that are in line with those of the marketplace. Students often work in groups, allowing them to experience alternative strategies and work habits. They are expected to show critical thinking skills and are also expected to evaluate their own work, discuss their work, and, together with teachers and peers, determine a final grade (Darling-Hammond, p131). The premise is that learning and assessment should have an interactive relationship. Students should learn to work independently as well as collaboratively, and the collaborative work should be assessed by a) its product, b) the relationship between those involved, and c) the students' constructive critiques of one another.

Supervisors of students, like supervisors in the workplace, can then rate students on attendance, promptness, dependability, performance quality, ability to learn, and several other factors. This assessment, with quality of work ratings from poor to excellent, mimics an annual performance evaluation in the workplace. Grading and expectations are set up much like a rubric, where students are aware from the start what is expected of them and what quality of work is minimal, average, or above average.

At P.S. 261, the school makes use of a tool called the Primary Language Record (PLR) as part of its assessment of students. The PLR, according to one teacher, "presupposes that learning takes place within a social context and that the responsibility for growth doesn't lie only with the teacher but is shared with children and parents" (Darling-Hammond, p187). Teachers talk to the parents about children's reading habits, their entertainment (watching TV, reading magazines, reading books), and about the children in general. This type of assessment helps train the teacher to watch for development in the student rather than teaching "at" the student.

Art schools have historically used portfolios as a form of evaluation for students. Many other assessment situations are now also making use of this same tool. For instance, in England, one school's graduation requirement includes portfolios as well as oral and written examinations (Darling-Hammond, p11). This provides students with varied opportunities to display their best work. It also gives examiners a chance to probe the quality of students' thinking and gives students a chance to recognize their own progress.
