How did ILSAs become the legitimate international education benchmark?

Dr Camilla Addey, Teachers College, Columbia University

The development of International Large-Scale Assessments (ILSAs) over the last ten years illustrates a growing culture of trust in internationally comparative performance data – going from philosophical doubt and reluctance to compare learning outcomes as late as the 1980s to unconditional trust in the validity and objectivity of psychometric testing, universal metrics, and policy solutions and best practices drawn from comparative performance data. The latest ILSA developments appear to be taking comparisons a step further, although concerns about the validity of performance comparisons remain legitimate.

These historical themes of ILSA development are discussed in my recently published chapter in ‘Assessment Cultures: Historical Perspectives’, a collection edited by Alarcón and Lawn (Peter Lang Publisher, 2018). My chapter, ‘The assessment culture of international organizations: from philosophical doubt to statistical certainty through the appearance and growth of international large-scale assessments’, describes the shifting assessment culture of international organizations, understood as key to the emergence of the ILSA phenomenon. Tracing the emergence of ILSAs, the chapter seeks to understand how widespread philosophical debates questioning the validity of testing and comparing learning outcomes across countries were sidelined in favour of universal learning metrics.

Drawing on empirical research in Martens (2007), Pizmony-Levy (2013, 2014), Addey (2014) and Heyneman (2012), I describe how the Organisation for Economic Co-operation and Development (OECD), the International Association for the Evaluation of Educational Achievement (IEA), and the UNESCO Institute for Statistics (UIS) dealt with conceptual and methodological challenges and policy pressure from the United States to measure and compare educational progress. The OECD and IEA turned these pressures into an opportunity to increase their influence, changing not only their own culture of assessment but also approaches to assessment around the world.

I focus on the period between the late 1950s and 2016, and in particular on the 1990s – a key decade in this assessment culture shift. Before the 1990s, a wider range of subjects was tested, fewer countries participated in a smaller number of ILSAs, and most ILSAs were research-oriented (Pizmony-Levy 2013, 2014). Most importantly, macro education indicators were compared, but educational performance comparisons were treated with skepticism and even warned against by organizations like IEA (Kamens 2013, Pizmony-Levy 2013). Indeed, Martens’ OECD interviewees reported that the OECD ‘purposefully avoided anything which amounted to encouraging countries to compare themselves’ (2007: 48).

The IEA, now known for administering TIMSS and PIRLS, was the only institution testing learning across countries between 1958 and the late 1980s. Pizmony-Levy’s (2013, 2014) research shows that IEA was initially driven by intellectual curiosity and justified its international testing activities as international-scale education research. He shows how IEA went from being ‘a field organized/oriented toward educational research to a field organized/oriented toward educational audit and accountability’ (2013: 2). He argues this change is twofold: an ownership shift from research to government affiliates, and a purpose shift from research to policy. Pizmony-Levy’s research, like Martens’ (2007) empirical work on the OECD, also highlights how the US influenced the development of international comparative education indicators through financial support.

The OECD, which administers PISA, the most widely known and influential ILSA, also saw a sharp turn in its approach to measuring and comparing learning outcomes across countries. Martens’ (2007) research shows that until the early 1990s, the OECD did not welcome comparisons of educational systems, which it viewed as unique to the socio-historical contexts within which they had developed. Henry et al. state that the 1990s ‘saw some remarkable shifts in the development of education indicators within the OECD: from philosophical doubt to statistical certainty; from covering some countries to covering most of the world; from a focus on inputs to a focus on outputs’ (2001: 90). The origin of this shift is to be sought in the US educational crisis that followed the publication of A Nation at Risk in 1983 and the comparatively poor results of the US in IEA’s Second International Mathematics Study (SIMS), conducted between 1980 and 1982. Comparative educational indicators became a priority for the US and, through its National Center for Education Statistics (NCES), it ensured that important changes in international comparative data took place by acting directly on the organizations developing and administering international tests. The US pushed the OECD to gather and disseminate internationally comparative data on education and, when its suggestions were met with skepticism, threatened in 1987 to withdraw funding.

Martens’ OECD interviewees suggest the US was keen to export its educational debate ‘to avoid considering that the crisis of education was only an American issue’ (2007: 45). As internationally comparable data were being demanded for different reasons, Martens’ interviewees state that ‘the OECD had to modify its programme of work on education indicators’ (2007: 46). Discussions were initiated, a set of indicators was agreed on, and the International Indicators of Education Systems (INES) was established at the OECD in 1988 with an increasing focus on educational outputs and performance measurement.

At the same time, UNESCO’s statistical capacity to validly monitor educational progress was challenged by the US, which created a legitimate voice for itself by acting through the Board on International Comparative Studies in Education (BICSE) (Heyneman 2012). UNESCO statistics were criticized for being narrow, unreliable and inaccessible (Puryear 1995). In the early 1990s, UNESCO found itself without statistical legitimacy, whilst the IEA and the OECD were being pressured into developing comparative performance data. In 2003, seeking to claim a voice in ILSAs, the UIS set out to adapt OECD ILSAs to lower- and middle-income contexts by developing the Literacy Assessment and Monitoring Programme (LAMP). As a UIS interviewee recounted, ‘Nothing was very clear apart from the need to get into the assessment field’ (Addey 2014). The UIS programme experienced poor political and financial support, suffered many staff changes, and faced many methodological and conceptual challenges.

Although LAMP’s aims were timely, it did not gain the kind of global prestige enjoyed by the IEA and OECD ILSAs. What LAMP did do, however, was reveal the practical and epistemological difficulties ILSAs face as they grapple with diversity, standardization, comparative learning outcomes and rankings. Those practices sit uneasily with UNESCO’s respect for cultural diversity.

The history of ILSAs suggests that epistemological questions concerning performance comparisons have not yet been resolved, and reminds us that the production and use of comparisons and rankings are inherently political acts.


Dr Camilla Addey is a Lecturer in Comparative & International Education at Teachers College, Columbia University. Camilla is also a Director of the Laboratory of International Assessment Studies. Previously, she worked at Humboldt University in Berlin and UNESCO in Paris. She researches international large-scale assessments and global educational policy.




Addey, C. (2014). Why do countries join international literacy assessments? An Actor-Network Theory analysis with case studies from Lao PDR and Mongolia. School of Education and Lifelong Learning. Norwich, University of East Anglia. Ph.D thesis.

Heyneman, S. (2012). The Struggle to Improve Education Statistics in UNESCO: 1980-2000. World Bank. Washington DC.

Kamens, D. H. (2013). Globalization and the Emergence of an Audit Culture: PISA and the Search for “Best Practices” and Magic Bullets. PISA, Power, and Policy: the emergence of global educational governance. H. D. Meyer and A. Benavot. Wallingford/GB, Symposium Books.

Martens, K. (2007). How to become an influential actor – the ‘comparative turn’ in OECD education policy. New Arenas in Education Governance. K. Martens, A. Rusconi and K. Leuze. New York, Palgrave Macmillan.

Pizmony-Levy, O. (2013). Testing for all: the emergence and development of international assessments of student achievement, 1958-2012. Bloomington, IN, Indiana University. Ph.D thesis.

Pizmony-Levy, O., J. Harvey, et al. (2014). “On the merits of, and myths about, international assessments.” Quality Assurance in Education 22(4): 319-338.

Puryear, J. (1995). “International education statistics and research: Status and problems.” International Journal of Educational Development 15(1).