International Large-Scale Assessments in Education – Do They Contribute to Better Results?

By Roar Grøttvik, Political adviser, Union of Education Norway


The education systems in most OECD countries have now lived with the consequences of International Large-Scale Assessments (ILSAs) for more than a quarter of a century. Although these international assessments (TIMSS, PISA, PIAAC, PIRLS, etc.) have been debated extensively during this period, it is high time that we welcome a more general debate about ILSAs and their impact on education. There are many relevant issues to explore. The most important question is, of course, whether ILSAs have contributed to better outcomes in education. But there are many other relevant questions to be asked. Have the political aims that motivated governments to invest so many resources in ILSAs been fulfilled? Do ILSAs produce data of sufficient quality and relevance to help solve the real challenges in the education systems of participating countries? To what extent have ILSA results been exaggerated, misinterpreted or misused by governments? And what should be the responsibilities of the researchers involved when this happens? What have been the detrimental side effects of ILSAs on education and on students? To what extent have ILSAs contributed to the improvement of psychometrics as a field of research and to teachers’ understanding of psychometrics? And how has the relationship between governments and the research community been influenced by the emergence of ILSAs?

In a short article like this, it is only possible to scratch the surface of these questions, although every one of them deserves to be discussed at length. My intention here is simply to present my own view from a teacher union perspective. As a senior union official, I have worked on the questions above for more than 20 years, both nationally and internationally through the Trade Union Advisory Committee to the OECD. My hope is that this short text can serve as an introduction to a fruitful debate between the teaching profession and the ILSA research community.

I am sure that many different motivations and psychological mechanisms help explain why so many governments have decided to take part in ILSAs. There is little doubt that part of the explanation is linked to the emergence of a neo-liberal ideological belief in competition as the main driver of any improvement in society and in education. In the public sector, New Public Management is built on the same ideology, and comparable data on educational outputs thus become a necessary instrument. The fact that ILSAs are the responsibility of international agencies “owned” by governments also puts a certain pressure on member countries to participate, and participation is linked to national prestige. The history of the emergence of PISA is interesting in this respect, since it was the most influential member and donor, the United States, that advocated most strongly for the OECD to develop comparable tests in education. The development of the IEA is also interesting in this respect, since this organization was originally dominated and run by the international research community. It therefore seems that the first ILSAs conducted by the IEA were driven much more by research interests than by political interests. Later, governments in several countries took over the responsibility for financing the IEA, which gave them more direct and indirect influence on its work.

My intention is not to deny that governments have had honest and good intentions in investing in ILSAs to improve educational outcomes, but rather to point out that these intentions and motives have been linked to certain ideological beliefs and general political views. What is rather strange, however, is that after all these rounds there seems to be almost no political debate about whether ILSAs have contributed to the improvement of education and educational outcomes. The long-term trends in international average results show very little improvement, and in some cases the opposite, as with the Mathematics results in PISA. When we try to discuss with the Norwegian educational authorities whether all the effort and money spent on ILSAs are worth it, the regular answer is that every piece of information and knowledge can help underpin better educational policies. However, I often think that the amount of political “cherry picking” of research and ILSA results has increased and has given the expression “evidence-based policies” a rather dubious reputation.

There are many examples of governments exaggerating and misinterpreting ILSA results. The under-communication of statistical and psychometric uncertainties has perhaps been the most damaging aspect, especially when the same results are presented in the press. Of course, governments cannot take the whole responsibility for how ILSA results are presented and interpreted by the press, and we all know that the press needs to simplify when presenting complex realities. However, the OECD, governments, and sometimes even the researchers involved must take some responsibility for sensational newspaper headlines. For those of us who follow the OECD's work on ILSAs closely, the discrepancy between the OECD press department’s portrayal of test results and the more academic and cautious wording used in the official reports is obvious. What I think is even more serious is when researchers who are involved in and responsible for national reports stay passive when misinterpretations occur among politicians or in the press, as I think is the case in my own country.

ILSA results have gone up and down in different countries. In some, we have witnessed rather constant improvements, both in rank and in number of points, while in others we have seen declines from one test to the next. In Finland, which was the PISA star of the western world in the first PISA rounds, the drop in the last few PISA tests has been rather dramatic. Does this mean that Finnish education is in worse shape now than in the early 2000s, and that Finnish children learn less in school? The simple answer is that we do not know, because the PISA test items cover only a rather narrow band of knowledge and competence constructs. And despite methodological improvements to the test batteries and the surveys, we know that there are still many concerns linked to cultural bias, translation, sampling of participants, and other issues.

Perhaps the most problematic aspect of ILSAs is that they give the public the impression that all education outcomes, or all the important ones, are measurable, and that these outcomes can be isolated from the learning that takes place in all other arenas of life. In many education systems this notion has had a very damaging effect. It has led to a narrowing of the curriculum and a teaching-to-the-test pedagogy, and it has put much higher performance pressure on children. ILSA results have also spurred the development of more national and local tests in many countries. This datafication of education through the production of huge amounts of test and survey data has also paved the way for the uncritical introduction of digital tools and private economic interests in the education sector, blurring the borders between government, research and multi-national economic interests. The complicated psychometric and statistical processes that take place when a rather broad competence construct is developed into test items, and the results are represented as levels on a standardized scale, have probably also alienated the teaching profession from the whole measurement process. As a result, teachers’ assessment practices in their daily work have become totally disconnected from the practices used in ILSAs.

My main concern, therefore, is that we, the researchers and the teaching profession, need to re-connect. Many or most of the ILSAs in education have probably come to stay, at least for a while. It should be our main goal that the data they produce are not over-stated or over-interpreted. We should work to ensure that all the different uncertainties are clearly explained. And we need to stand together to make clear that more attention, effort and resources need to be directed towards addressing the weaknesses that ILSAs have uncovered. My hope is that more educational research will be directed towards the challenges of the classroom rather than the challenges of education policies and politics.


Roar Grøttvik is a political adviser with the Union of Education Norway. He trained to be a teacher at the University of Bergen and has served in many positions in the union since the early 1980s. He has been a member of the Trade Union Advisory Committee to the OECD working group on education and training since the late 1990s. He also chairs the board of the Education International Research Institute.


To reference this blog:  Grøttvik, Roar. 2019. International Large-Scale Assessments in Education – Do They Contribute to Better Results? Laboratory of International Assessment Studies blog series. Published on 11th June 2019. Accessed at 
