The effect of question type in an international comparison between countries

Large-scale international comparative studies are conducted regularly; think of PISA, TIMSS or PIRLS. Policymakers use the results of these studies to make decisions about education. Although the comparisons are valuable, they also draw criticism. For example, the tests consist of multiple-choice questions, even though this question type is uncommon in some countries. What does that do to the quality of the comparison? Can you make a fair comparison if the tests are more accessible to students in one country than in another? In this multi-year research project we examine this question from different perspectives.

Making sense out of DIF

International educational surveys aim to evaluate and compare educational achievement across countries. To do so, they design dozens of items, which are administered to thousands of students. Differential item functioning (DIF) analysis has usually been performed as a practical task: finding items that, for students at the same level of proficiency, may present extra difficulty to those belonging to a particular group.
This project has focused on promoting the idea that DIF is one of the most interesting outcomes of these surveys, because it signals substantive differences and similarities among countries in curriculum, educational practices, learning processes, and so on. A variety of statistical and psychometric techniques have been investigated to find those that can be applied easily in practice and can help investigate measurement non-invariance when there are many groups (e.g. countries).
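To make the notion of DIF concrete, here is a minimal sketch of one classical screening method for a single item and two groups, the Mantel-Haenszel procedure. It is illustrative only: the project description does not say which techniques were adopted, and the function names, the group labels "ref" and "focal", and the data layout are assumptions made for this example. Students are matched on their rest score (the total score on all other items) as a rough proxy for proficiency, and the odds of a correct answer are then compared between the groups within each matched stratum.

```python
import math
from collections import defaultdict


def mantel_haenszel_odds_ratio(responses, groups, item):
    """Common odds ratio for one item across rest-score strata.

    responses: list of per-student lists of 0/1 item scores (assumed layout)
    groups:    list of "ref" or "focal" labels, one per student (assumed labels)
    item:      index of the studied item
    """
    # Per stratum: [ref correct, ref wrong, focal correct, focal wrong]
    strata = defaultdict(lambda: [0, 0, 0, 0])
    for scores, group in zip(responses, groups):
        rest_score = sum(scores) - scores[item]  # matching variable
        offset = 0 if group == "ref" else 2
        wrong = 0 if scores[item] == 1 else 1
        strata[rest_score][offset + wrong] += 1

    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata.values():
        n = a + b + c + d
        numerator += a * d / n    # ref correct x focal wrong
        denominator += b * c / n  # ref wrong x focal correct
    # Undefined when no stratum contributes to the denominator
    return numerator / denominator if denominator > 0 else float("nan")


def mh_delta(odds_ratio):
    """ETS delta scale; negative values mean the item is harder for the focal group."""
    return -2.35 * math.log(odds_ratio)


if __name__ == "__main__":
    # Made-up data: three items, all students share the same rest score.
    data = [[1, 1, 0]] * 3 + [[0, 1, 0]] + [[1, 1, 0]] + [[0, 1, 0]] * 3
    labels = ["ref"] * 4 + ["focal"] * 4
    alpha = mantel_haenszel_odds_ratio(data, labels, item=0)
    print(alpha, mh_delta(alpha))
```

An odds ratio close to 1 (delta close to 0) suggests that matched students in both groups find the item about equally difficult. This sketch compares only two groups; the project is concerned with the case of many groups, such as countries.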


  • Researchers: Edwin Cuellar, Ivailo Partchev
  • Collaboration partners: University of Amsterdam, European Training Network OCCAM, Timo Bechger, Gunter Maris, Maarten Marsman
  • Target group: Educational practitioners and policy-makers
  • Duration: August 2018 – July 2021
"There is more to learn from the item-level performance than from a perfectly polished league table."
Edwin Cuellar
PhD Student

Substantive differences across countries

DIF has usually been treated as a nuisance and a threat to validity and comparability in educational assessments, yet it has been found to be ubiquitous in international large-scale assessments. Items are usually removed, changed, or modeled differently across groups to account for those differences. However, DIF does not occur as an isolated phenomenon of individual items; it reveals substantive differences across countries. The project promotes the idea that there is more to learn from item-level performance than from a perfectly polished league table.

Published papers

Cuellar, E., Partchev, I., Zwitser, R., & Bechger, T. (Accepted for publication). Making sense out of measurement non-invariance: How to explore differences among educational systems in international large-scale assessments. Educational Assessment, Evaluation and Accountability.