Machine Scoring of Student Writing – 2018

In accordance with the National Council of Teachers of English Position Statement on Machine Scoring (2013), OCTELA opposes the use of machine grading in state standardized testing situations and asks the Ohio Department of Education to refrain from the use of machine scoring on these high-stakes tests. OCTELA’s opposition is rooted in research findings, expressed in the NCTE position statement, that demonstrate the significant limitations and detrimental effects of machine scoring for students, schools, and the profession.

As established by the National Council of Teachers of English in its statement on machine scoring, writing is a highly complex ability developed over years of practice, across a wide range of tasks and contexts, and with copious, meaningful feedback. Students must have this kind of sustained experience to meet the demands of higher education, the needs of a 21st-century workforce, and the challenges of civic participation, and to lead full, meaningful lives.

Research on the assessment of student writing consistently shows that high-stakes writing tests alter the normal conditions of writing by denying students the opportunity to think, read, talk with others, address real audiences, develop ideas, and revise their emerging texts over time. Often, the results of such tests can affect the livelihoods of teachers, the fate of schools, or the educational opportunities for students. In such conditions, the narrowly conceived, artificial form of the tests begins to subvert attention to other purposes and varieties of writing development in the classroom. Eventually, the tests erode the foundations of excellence in writing instruction, resulting in students who are less prepared to meet the demands of their continued education and future occupations. Especially in the transition from high school to college, students are ill-served when their writing experience has been dictated by tests that ignore the ever-more complex and varied types and uses of writing found in higher education.

Again, as noted in the NCTE position statement, “These concerns—increasingly voiced by parents, teachers, school administrators, students, and members of the general public—are intensified by the use of machine-scoring systems to read and evaluate students’ writing. . . . The attraction is obvious: once programmed, machines might reduce the costs otherwise associated with the human labor of reading, interpreting, and evaluating the writing of our students. Yet when we consider what is lost because of machine scoring, the presumed savings turn into significant new costs — to students, to our educational institutions, and to society. . . .

● Computers are unable to recognize or judge those elements that we most associate with good writing (logic, clarity, accuracy, ideas relevant to a specific topic, innovative style, effective appeals to audience, different forms of organization, types of persuasion, quality of evidence, humor or irony, and effective uses of repetition, to name just a few). Using computers to “read” and evaluate students’ writing (1) denies students the chance to have anything but limited features recognized in their writing; and (2) compels teachers to ignore what is most important in writing instruction in order to teach what is least important.
● Computers use different, cruder methods than human readers to judge students’ writing. For example, some systems gauge the sophistication of vocabulary by measuring the average length of words and how often the words are used in a corpus of texts, or they gauge the development of ideas by counting the length and number of sentences per paragraph.
● Computers are programmed to score papers written to very specific prompts, reducing the incentive for teachers to develop innovative and creative occasions for writing, even for assessment.
● Computers get progressively worse at scoring as the length of the writing increases, compelling test makers to design shorter writing tasks that don’t represent the range and variety of writing assignments needed to prepare students for the more complex writing they will encounter in college.
● Computer scoring favors the most objective, “surface” features of writing (grammar, spelling, punctuation), but problems in these areas are often created by the testing conditions and are the most easily rectified in normal writing conditions when there is time to revise and edit. Privileging surface features disproportionately penalizes nonnative speakers of English who may be on a developmental path that machine scoring fails to recognize.
● Conclusions that computers can score as well as humans are the result of humans being trained to score like the computers (for example, being told not to make judgments on the accuracy of information).
● Computer scoring systems can be “gamed” because they are poor at working with human language, further weakening the validity of their assessments and separating students not on the basis of writing ability but on whether they know and can use machine-tricking strategies.
● Computer scoring discriminates against students who are less familiar with using technology to write or complete tests. Further, machine scoring disadvantages school districts that lack funds to provide technology tools for every student and skews technology acquisition toward devices needed to meet testing requirements.
● Computer scoring removes the purpose from written communication — to create human interactions through a complex, socially consequential system of meaning making — and sends a message to students that writing is not worth their time because reading it is not worth the time of the people teaching and assessing them.”
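
To make the “cruder methods” bullet above concrete, the short sketch below (a hypothetical illustration in Python; the function and metric names are invented and do not describe any vendor’s actual algorithm) computes the kinds of surface-level proxies such systems rely on: average word length, the rarity of words relative to a reference corpus, and sentence counts. None of these measures can register logic, accuracy, audience awareness, or quality of evidence.

    # Hypothetical illustration of crude surface metrics; not any vendor's actual algorithm.
    import re
    from collections import Counter

    def surface_metrics(essay, reference_corpus):
        """Compute simple proxy features: word length, word rarity, sentence counts."""
        words = re.findall(r"[A-Za-z']+", essay.lower())
        sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
        corpus_counts = Counter(re.findall(r"[A-Za-z']+", reference_corpus.lower()))

        avg_word_length = sum(len(w) for w in words) / max(len(words), 1)
        # "Sophistication" proxy: share of essay words that are rare in the reference corpus.
        rare_word_ratio = sum(1 for w in words if corpus_counts[w] < 2) / max(len(words), 1)
        avg_sentence_length = len(words) / max(len(sentences), 1)

        return {
            "avg_word_length": round(avg_word_length, 2),
            "rare_word_ratio": round(rare_word_ratio, 2),
            "avg_sentence_length": round(avg_sentence_length, 2),
            "sentence_count": len(sentences),
        }

A longer word or an extra sentence raises these numbers; a stronger argument does not.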

The Ohio Department of Education’s choice to machine-grade writing on standardized tests has been made without providing educators vital information: Can we see examples of how the grading program scores? What specifically is it being programmed to look for, and why? Which company/algorithm is being used, and why? How were teachers involved in the vetting of this system? What protections are in place to ensure its accuracy? What are the financial gains or savings of having a computer algorithm, rather than a human grader, score the essays? Is the human element worth sacrificing for the sake of those savings? How will these savings directly help students and teachers?

The Ohio Department of Education’s choice to machine-grade writing on standardized tests is based on the promise of more timely score reporting. In practice, machine scores are reported only one week earlier than human reader scores. Even then, schools receive scores after they have recessed for the summer, making the promise of faster reporting inconsequential compared with the loss of the accurate assessment of learning that can occur only when writing is scored by human readers.

As presented, the Ohio Department of Education’s choice to machine-grade writing on standardized tests fails to address the roles and responsibilities of testing companies. Neither the ODE nor the contracted testing companies have made available a transparent terms-of-service statement that explains how student data is captured, used, stored, or managed across students’ academic careers. Of primary concern is that the algorithmic scoring systems are licensed, patented products owned by companies whose primary enterprise is to profit from the collection of student data and to use that data to improve their own algorithms rather than to improve educational resources, conditions, or outcomes.

When states opt in to machine scoring, they hand over students’ personal educational data to companies without students’ consent and without a clear sense of whether and how this data is kept, how students’ performance and data will be used in the future, and how the long-term tracking and surveillance of students across their educational careers might harm them. When states, districts, or programs opt in, parents and students give up their rights to privacy of student information, data, and performance.

OCTELA strives to make the best use of end-of-course exams as tools for improving our students as writers. While a faster grading process is appreciated, we fear that what is gained in speed is simply not worth what is lost. We recommend high-quality assessment practices that align with NCTE’s Standards for the Assessment of Reading and Writing. Such practices:

● encourage students to become engaged in literacy learning, to reflect on their own reading and writing in productive ways, and to set their own literacy goals;
● yield high-quality, useful information to inform teachers about curriculum, instruction, and the assessment process itself;
● balance the need to assess summatively (make final judgments about the quality of student work) with the need to assess formatively (engage in ongoing, in-process judgments about what students know and can do, and what to teach next);
● recognize the complexity of literacy in today’s society and reflect that richness through holistic, authentic, and varied writing instruction; and
● at their core, involve professionals who are experienced in teaching writing, knowledgeable about students’ literacy development, and familiar with current research in literacy education.

A number of effective practices enact these research-based principles, including portfolio assessment; teacher assessment teams; balanced assessment plans that involve more localized (classroom- and district-based) assessments designed and administered by classroom teachers; and “audit” teams of teachers, teacher educators, and writing specialists who visit districts to review samples of student work and the curriculum that has yielded them. This investment would also support teacher professional development and enhance the quality of instruction in classrooms—something that machine-scored writing prompts cannot offer.

OCTELA believes that students deserve effective teachers. However, basing high-stakes decisions about teacher effectiveness on standardized assessment data from a single annual test that does not include a human scorer fails to do what is right for our students and will not provide an accurate report of how students are meeting and/or mastering state standards. Machines will miss nuances and stylistic choices in student writing that may be the direct result of a teacher’s instruction. Failing to account for these important aspects of writing benefits neither the student nor the teacher.

Writing, regardless of genre, is an art form. Student writing should be reviewed and evaluated by those who understand and appreciate this notion. The stakes are too high for all parties, students and teachers alike, to sacrifice the tried-and-true practice of human scoring for the ease and efficiency of machines: students are in jeopardy of being left behind based on test performance, teachers are at risk of losing their jobs, and districts risk losing funding.

Until the Ohio Department of Education is forthcoming about how machines could possibly be programmed to address the nuanced needs of diverse writers, OCTELA stands in opposition to their use in state standardized testing situations.