Research blog
‘OECD PISA: Is it fit for purpose? (Do we even know what it’s for?)’
The Programme for International Student Assessment (PISA) is a triennial international survey, which aims to evaluate education systems worldwide by testing 15-year-old students’ problem-solving ability in reading, maths, and science. To date, students representing more than 70 economies have participated in the assessment.
Features of the assessment method and aspects of the OECD’s policy have led CMRE researchers to question whether the importance attached to the PISA findings is warranted and whether its influential narrative is entirely helpful to countries’ reform efforts.
Criticisms have been levelled against PISA that have highlighted various methodological issues. Questions remain about small sample sizes and non-random allocation to test groups, and the margins of error that may then be introduced when extrapolating to the country level. These features affect whether the survey is capable of yielding the level of confidence necessary to justify its claim to be a reliable ‘ranking’. Nevertheless, most concur that the basic data published in these reports provide useful starting points for further analysis.
Further difficulties emerge, however, when the OECD’s PISA analysts get to work on the data. There are significant challenges involved in moving from data analysis to developing appropriate policy responses – a process which involves taking account of different countries’ social, economic, educational and political histories, alongside analysis of current challenges and future priorities. To some extent the OECD acknowledges these issues, and yet the PISA report overlooks them. By looking at a series of correlations and attempting to identify best practices, the volume explores the inter-relationships between a range of variables and educational achievement, with a view to comparing different countries’ performance in the tests. It then goes on to make recommendations on education reform on this basis.
On the afternoon of 20th April, members of CMRE’s executives’ policy forum and a number of guests from the world of education gathered to discuss these issues and to ask whether we can believe what the PISA survey purports to be able to tell us about the quality of countries’ education systems.
The roundtable began with a welcome from CMRE’s Executive Director, James Croft, and a brief introduction from the Chair, Tim Oates, CBE, on behalf of the sponsor, Cambridge Assessment, with whom CMRE has been working in this area. An informal discussion followed, under the Chatham House Rule, which this document attempts to summarise.
Tim began by stating that Cambridge Assessment’s interest in the area was above all to see a refinement and nuancing of the transnational message, such that naïve policy borrowing would be discouraged. It is possible, he said, to find support for a range of policies in the correlations and attempted identifications of best practice, if that is what you are intent on doing. It is therefore important that survey users be given the right interpretative tools to understand the high-level data and analysis presented to them, and to know what they need to do in the way of further analysis.
Unfortunately, it appears that the OECD has little interest either in explaining its methodology and what it can yield, or in exploring the correlations it presents in context. On the contrary, Tim said, anecdotal testimony suggests the data is used more often to manufacture anxiety than to illuminate educational standards. This makes efforts such as Gabriel Heller Sahlgren’s, to understand Finland’s apparent success in its historical, economic, and cultural context, crucial to successful policymaking.
Tim then introduced Juliet Sizmur, Research Manager at the National Foundation for Educational Research (NFER)’s Centre for International Comparisons – the organisation responsible for the delivery of PISA’s 2015 survey in Scotland.
In the light of Tim’s remarks, Juliet posed the question ‘What then can we believe about international surveys?’ With reference to remarks made upon the publication of the last set of results in 2013, Juliet highlighted the dubious nature of some of the claims made by press and politicians alike, sometimes resting on misunderstandings as basic as, for example, how countries entering and withdrawing between one round of tests and the next affect the rankings, or when a difference in score is statistically significant or not. For all the media hysteria in 2013 around England’s performance, for example, it may not have been immediately obvious to the public that, in Maths, it was no different to the OECD average. Probably the greatest risk in the use of large-scale international datasets, she said, is the ease with which it is possible to draw overly simplistic – or erroneous – conclusions.
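The point about statistical significance is worth making concrete. A minimal sketch of the usual check, comparing two mean scores given their standard errors – the function name and all figures here are invented for illustration, not real PISA results:

```python
import math

def score_difference_significant(mean_a, se_a, mean_b, se_b, z=1.96):
    """Rough two-sample check (95% level) of whether two mean scores
    differ significantly, treating the samples as independent."""
    diff = mean_a - mean_b
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)  # standard error of the difference
    return abs(diff) > z * se_diff

# A 6-point gap with standard errors of about 3 points each is NOT
# statistically significant at the 95% level...
print(score_difference_significant(500, 3.0, 494, 3.1))  # prints False

# ...whereas a 15-point gap with the same standard errors is.
print(score_difference_significant(500, 3.0, 485, 3.1))  # prints True
```

This is exactly the kind of calculation that league-table coverage tends to skip: two countries separated by several rank places may have mean scores that are statistically indistinguishable.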
Following a detailed review of PISA assessment processes, Juliet argued that the problem was not a lack of rigour. There is a long development cycle, which draws on a wide range of international expertise, including experts in the fields of translation and sampling, in addition to specialised assessment organisations. The assessment frameworks are developed by subject expert groups and revised and updated at each round of the survey so that assessments develop with changes in education.
In answer to more basic concerns raised by Kreiner (2011) about whether the Rasch model is an appropriate basis for the PISA rankings, given differential item functioning (DIF) (i.e. questions having different degrees of difficulty in different countries, thus creating significant margins of error), Juliet referenced PISA’s official response and drew attention to the fact that PISA 2015 has moved away from a pure Rasch scaling model. It now uses a combination of Rasch and ‘two-parameter’ models, which aims to take more account of differential item functioning across countries and sub-groups. (See here for a fuller technical explanation.) She also highlighted that, at the field-test stage, it has always been the case that items which do not function well in terms of DIF are excluded from the main survey.
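For readers unfamiliar with these models, the difference is small in form but consequential: the Rasch (one-parameter) model fits every item with the same slope, whereas the two-parameter model gives each item its own discrimination parameter, allowing the fitted curve to vary by item and so absorb some DIF. A minimal sketch of the two item response functions – illustrative only, not the OECD’s actual scaling code:

```python
import math

def rasch_p(theta, b):
    """Rasch (one-parameter) model: probability of a correct answer
    depends only on student ability theta and item difficulty b;
    every item is assumed to discriminate equally."""
    return 1 / (1 + math.exp(-(theta - b)))

def two_pl_p(theta, b, a):
    """Two-parameter model: adds an item-specific discrimination a,
    so the steepness of the response curve can differ by item."""
    return 1 / (1 + math.exp(-a * (theta - b)))

# With discrimination a = 1, the two models coincide for the same item.
theta, b = 0.5, 0.0
assert abs(rasch_p(theta, b) - two_pl_p(theta, b, a=1.0)) < 1e-12

# A higher discrimination makes the curve steeper around the difficulty b,
# which the Rasch model cannot represent.
print(two_pl_p(0.5, 0.0, a=2.0) > rasch_p(0.5, 0.0))  # prints True
```

The hybrid approach lets Rasch-consistent items keep the simpler model while poorly fitting items are handled by the more flexible one.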
In addition, she drew attention to the fact that the OECD now publishes a range of ranks (although the press and public – and many policymakers – tend to disregard this). The OECD’s own analysis is now based more on relationships to the OECD average, score trends over time, and relationships between variables than on rankings. (See PISA 2012 Results: What Students Know and Can Do: Student Performance in Mathematics, Reading and Science, Volume I, p. 277 for a technical explanation.) The emphasis on trends over time, she said, may have been lost in England because of the PISA 2003 problems. Trends in England will be fully reported for the first time (with comparison to science in PISA 2006) in the forthcoming report on PISA 2015.
OECD PISA’s engagement with its critics in these respects is reassuring, she said, as the organisation has often been criticised for its lack of responsiveness. (For the record, official OECD responses to criticisms of its methodology may be found here.)
Nevertheless, Juliet concurred with Tim Oates that we do need to be much more careful about how we use the survey results for policy development. ‘The survey results show only correlations – the real reasons are very likely to be more complex and not transplant well into different soils.’ The OECD purports to be offering ‘comparative international assessments’ which ‘can extend and enrich the national picture by providing a larger context within which to interpret national performance’ – ‘formative assessment of education systems’ if you like – but there is a danger that the survey is becoming ‘a high stakes summative assessment of accountability arrangements’.
In the course of discussion, it was noted that the high stakes nature of the exercise has at times made it difficult to secure the proper participation of teaching practitioners and pupils in England, though the reverse was true in Hong Kong. There was agreement that further research is required to understand the culture/values of different country contexts and how these affect performance in the tests.
Questions were asked about the sampling and how representative it can claim to be, particularly in relation to discussion of the performance of particular profiles of pupil, such as those attending private schools, SEN pupils in special schools, and schooling that happens outside the state system in unrecognised schools.
It was recognised that there is a lack of attention to countries’ immigration policies, which may have a significant distorting effect on results, and rankings.
After further discussion of the sampling method, however, there was an emerging consensus that, while not without its challenges, it was sound overall, and probably as good as you are going to get with this type of exercise. The important thing is that these issues are acknowledged, documented and taken into account when analysing country results. It was felt, however, that much greater transparency was required around the scaling method if the confidence of the wider academic and assessment communities was to be maintained.
In relation to PISA’s stated purpose of assessing the extent to which students can apply their knowledge to real-life situations, and may therefore be said to have been effectively prepared for life and work as an adult, it was acknowledged that the evidence of such a link was limited and that the strength of the link was disputed.
The strength of the correlation with TIMSS (another international survey, focused on Maths and Science) was highlighted, indicating that the two must assess largely similar domains. Some time was spent discussing what additional value PISA brings, in particular to improving educational outcomes.
In response, a number of examples were offered which highlighted the survey’s usefulness as a signalling device and catalyst for change. The cases of Germany and Poland were offered as examples. It was also argued that PISA had been able to deliver capacity for change which had previously been lacking in many countries. The data collection has opened up understanding of regional and other disparities, notably in Spain and Brazil.
In the political context, said another participant, the ability to get granular with the data is extremely useful for challenging some of the inflated claims that countries make about their performance. It was agreed that there was ‘a lot of system value in preventing people getting carried away with their own narrative’.
Nevertheless, scepticism was expressed by another participant that there was very much of this kind of challenge going on. Rather than encouraging detailed analysis of the data, PISA’s overall approach encourages countries to make unjustified claims about the reforms they believe are responsible for improvements. For example, the claims made about Poland’s and Germany’s reform successes cannot be substantiated with proper research.
So, were the high-level correlations and the grand narrative really essential to the purpose of PISA? Rather than prescribing, an alternative approach might be to monitor the ways in which other people are analysing and describing its data. At the moment, the OECD hires experts to do the analysis. The problem may be that they are all ‘in-house’, employed by the OECD and briefed to look for correlations.
Returning to the point about whether the policy recommendations are adequately supported, another contributor highlighted that what’s missing is all of the policy information, the history of its development, etc. In PISA you have very detailed information on pupil performance, detailed information on student background, quite detailed information about the school set-up and some perceptions of teaching in the school, but not this crucial information about policy and context.
With the roundtable nearing its close, CMRE’s Executive Director, James Croft, asked if he might bring things to a point. Where do we go from here? What would we have the OECD focus on? Would it be 1) addressing underlying methodological issues, 2) the scaling model, or 3) how they make their policy recommendations, and on what basis?
Participants’ responses focused mainly on 3).
There was general agreement that the basis for the kind of policy recommendations the OECD wishes to make is not sound. This is not to say that the tests and the data the survey supplies are not useful – they are – but that more sophisticated analysis, including analysis of policy developments over time, is needed. Such analysis was seen as the only effective means of challenging the tendency to over-claim in the presentation of the survey’s findings.
Another participant felt that, since the information on pupil performance was the most valuable aspect of the exercise from an educator’s point of view, more detailed information about the school context and about themes and processes at the classroom level was surely also desirable.
The Chair then drew the discussion to a close. In conclusion, he said that he thought PISA needs to do far more to get traction globally and to support domestic interpretation, and that this was achievable with the organisation’s present resources. James thanked him for chairing, Cambridge Assessment for its sponsorship of the gathering, and participants for their comments and guidance.
There are clear messages that policymakers should take from this.
In terms of their value for education reform, the level of importance attached to OECD PISA’s ranking, analysis, and policy recommendations is unwarranted.
The overall policy narrative is not always helpful to countries’ reform efforts. Checking the validity of OECD PISA’s claims has become a time-consuming but necessary process.
At a basic level, inadequate evidence has been supplied to support the validity of the tests as a measure of the extent to which students have been effectively prepared for life and work as an adult.
The limits of estimation are not well understood. The nature of the exercise – in respect of the scaling of the sample results – is such that its claim to be a straightforwardly reliable country ranking should be treated with caution.
The basic data published in these reports are generally regarded as useful starting points for further analysis. We can feel confident of this because the PISA assessment processes are fundamentally sound.
Nevertheless, the official OECD analysis of the survey is seriously flawed. The survey results show only correlations and are insufficiently informed as regards country context and policy development over time. The analysis overreaches the conclusions that may be drawn from looking at correlations alone and encourages simplistic policy-borrowing.
If the OECD cannot gather more information about policy context and development over time and bring a wider evidence-informed perspective to bear on its analysis, it should leave the analysis to others.
Rather than prescribing, this alternative approach would entail the OECD monitoring the ways in which other people are analysing and describing its data.
Academics internationally should be encouraged to engage and debate the findings to improve the critical perspective of policymakers.
To improve public and professional understanding, governments should articulate how they understand and interpret international surveys and the importance they attach to them in policymaking, and why. The rationale for participation in a given survey should be stated ahead of the OECD’s publication of the results.