The dangers of teacher assessment

In most countries, externally marked tests have come to play an increasingly important role in education. In the UK nations, examinations determining pupils’ grades have long been centrally marked. Until recently, however, many GCSE and A-level courses have included some teacher-assessed elements – and many have lamented the government’s determination to eliminate these components.

Educationalists often deplore standardised testing because it reduces teacher autonomy in the classroom and because it supposedly offers an unreliable snapshot of pupil performance. Yet any form of teacher assessment of importance gives rise to the possibility of score manipulation. Indeed, the lack of reliability in such assessment is the key argument against it.

In a recent paper, 'The Causes and Consequences of Test Score Manipulation: Evidence from the New York Regents Examinations', Thomas S. Dee, Will Dobbie, Brian A. Jacob, and Jonah Rockoff study the causes and consequences of allowing teachers to mark the Regents Examinations in New York State. These exams are high-stakes tests measuring performance in accordance with the state’s secondary-school curricula. Up to and including 2010, the tests were marked locally by teachers in the pupils’ own school. Then, in 2011, schools were no longer allowed to re-grade exams with scores just below the proficiency cut-off – a practice that had previously been required – and in 2012, a reform centralised grading, which removed any ability to manipulate test scores.

The authors find clear evidence that teachers inflated scores in the period when it was possible to do so. The number of scores falling just above the relevant thresholds for specific grades jumps sharply compared with the number falling just below them – a pattern that would not be expected without manipulation.
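
The logic of this comparison can be illustrated with a rough sketch. The code below is not the authors' estimator; it is a crude count, assuming integer scores, of how many fall in a small band just above a cut-off versus just below it. The function name, the window width and the sample scores are all hypothetical.

```python
from collections import Counter


def excess_at_threshold(scores, threshold, window=3):
    """Count integer scores just below and just above a cut-off.

    Hypothetical helper for illustration only: it returns
    (below, above, above - below), where `below` covers the `window`
    score points under the threshold and `above` covers the `window`
    score points starting at the threshold.
    """
    counts = Counter(scores)
    below = sum(counts[s] for s in range(threshold - window, threshold))
    above = sum(counts[s] for s in range(threshold, threshold + window))
    return below, above, above - below


# Made-up scores with a pile-up at an assumed cut-off of 65.
sample_scores = [62, 63, 64, 65, 65, 65, 66, 66, 67]
below, above, excess = excess_at_threshold(sample_scores, threshold=65)
print(below, above, excess)
```

A large positive excess is only suggestive: a real analysis would need a model of what the distribution would have looked like without manipulation, since scores are not spread evenly to begin with.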

Indeed, the authors find that about 40 per cent of scores that were close to the thresholds were inflated. This manipulation was reduced by 80 per cent in 2011 and then eliminated entirely in 2012. Both re-scoring policies and local grading were therefore important factors behind grade inflation.

Importantly, pre-reform manipulation affected pupils in different schools differently. Also, minority pupils with low initial performance and worse behaviour benefited from manipulation overall, simply because their scores were more likely to fall around the thresholds. However, conditional on scoring near the thresholds, these pupils were actually less likely to have their scores inflated.

Furthermore, using the reforms in 2011 and 2012, the authors analyse the causal impact of manipulation on future outcomes, finding that pupils who have their scores inflated above the relevant thresholds are much more likely to graduate from high school. This also has egalitarian implications: without any manipulation, the black-white test score gap would have been 1.3 percentage points higher, while the share of pupils graduating from high school would have been 1.2 percentage points lower.

At the same time, however, pupils who have their scores inflated are less likely to meet the requirements for obtaining an advanced high school diploma and to pass more advanced examinations. This indicates that the impact of grade inflation depends on pupils’ prior attainment. Pupils who are close to dropping out benefit from score inflation, as they are not forced to retake a class they may fail. But pupils on the margin of receiving the advanced diploma may be hurt by inflation if it leads them to believe they can succeed while studying less than they actually need to.

Finally, the authors also provide evidence of mechanisms driving the score manipulation. Inflation is just as prevalent in subjects that were not included in the accountability system, indicating that pressure from government is not a key driver. Intriguingly, the authors also find that teacher performance pay has no impact on score manipulation, indicating that pecuniary motives among teachers are unlikely to be the main reason either. Since they do find that inflation occurs both for pupils on the pass/fail margin and for those on the margin of secondary benefits, such as eligibility for university credits, it is likely that altruism ultimately drives manipulation.

Overall, the paper highlights the dangers of teacher assessment. It indicates that any such assessment of importance is likely to be subject to arbitrary inflation, whether or not the scores are tied to accountability systems, and that most forms of teacher assessment are likely to be compromised by the lack of comparability. In times of opposition to the government’s reforms to reduce or eliminate teacher-assessed components in GCSEs and A-levels, the paper provides fresh evidence indicating that such opposition may be misplaced.[1]

[1] This does not mean that all forms of teacher assessment are necessarily bad. Rob Coe and I have previously suggested that such assessment could potentially be reconciled with demands for comparability. This would require the total distribution of marks to be fixed at the exam centre level, based on results in the external components of the qualification. This would ensure a fixed sum of teacher-assessed marks (e.g. 10 As, 20 Bs etc.), which means that teacher assessment would only redistribute marks among pupils.
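
As a minimal sketch of how such a fixed distribution might work in practice, the code below assumes a centre's grade quota has already been derived from the external components and hands out grades purely by the teacher's rank ordering. The function name, the tie-breaking rule and the example pupils are hypothetical and are not taken from the original proposal.

```python
def allocate_grades(teacher_marks, quota):
    """Assign grades by teacher rank order under a fixed centre quota.

    teacher_marks: dict mapping pupil name to the teacher's raw mark.
    quota: list of (grade, count) pairs, best grade first, assumed to be
           set in advance from the centre's external exam results.
    Ties are broken alphabetically, which is an arbitrary assumption.
    """
    ranked = sorted(teacher_marks, key=lambda p: (-teacher_marks[p], p))
    if sum(count for _, count in quota) != len(ranked):
        raise ValueError("quota must cover every pupil exactly once")

    grades = {}
    start = 0
    for grade, count in quota:
        for pupil in ranked[start:start + count]:
            grades[pupil] = grade
        start += count
    return grades


quota = [("A", 1), ("B", 2), ("C", 1)]
marks = {"Ann": 90, "Ben": 80, "Cy": 70, "Di": 60}
inflated = {pupil: mark + 10 for pupil, mark in marks.items()}

# Raising every mark by the same amount leaves the grades unchanged.
print(allocate_grades(marks, quota) == allocate_grades(inflated, quota))
```

Under this kind of scheme, marking generously across the board cannot raise the centre's overall results; the teacher's judgement only decides which pupils receive which of the available grades.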

This comment piece is also the Editor's Pick in the CMRE Monthly Research Digest_05_16. The piece reviews a paper by Thomas S. Dee, Will Dobbie, Brian A. Jacob, and Jonah Rockoff, 'The Causes and Consequences of Test Score Manipulation: Evidence from the New York Regents Examinations', published as NBER Working Paper No. 22165, a free copy of which may be downloaded here.


