International Journal of Educational Methodology

IJEM is a leading, peer-reviewed, open access, research journal that provides an online forum for studies in education, by and for scholars and practitioners, worldwide.

Publisher (HQ)

RHAPSODE
Eurasian Society of Educational Research
College House, 2nd Floor 17 King Edwards Road, Ruislip, London, HA4 7AE, UK

'item response theory' Search Results



...

“Mathematical knowledge for teaching” refers to the specific kind of knowledge required to teach mathematics. It has a more complex structure than the knowledge needed simply to carry out mathematical tasks. The purpose of this study is to adapt the “Mathematical Knowledge for Teaching-Geometry (MKT-G)” test, originally developed in English, to Turkish (and to Turkish culture). During the adaptation process, after the items were translated, focus group interviews were held with a group of mathematics teacher educators and experienced mathematics teachers, and data from 243 elementary mathematics teachers were then analyzed using Item Response Theory (IRT). The psychometric values of the test items indicated that the items performed well in Turkey. Validity and reliability arguments were also tested. The results suggest that the Turkish version of the MKT-G test is highly reliable and valid for measuring teachers’ knowledge of teaching geometry.

10.12973/ijem.5.4.547
Pages: 547-565
Article Metrics
Views
720
Download
1850
Citations
Crossref
4

Scopus
4

...

The Pearson product–moment correlation coefficient between item g and test score X, known as the item–test or item–total correlation (Rit), and the item–rest correlation (Rir) are two of the most widely used classical estimators of item discrimination power (IDP). Both Rit and Rir underestimate IDP because of the mismatch between the scales of the item and the score. The underestimation may be drastic when the difficulty level of the item is extreme. Based on a simulation, a good alternative to Rit and Rir in a binary dataset could be Somers’ D: it reaches the ultimate values +1 and –1, it underestimates IDP remarkably less than Rit and Rir, and, being a robust statistic, it is more stable against changes in the data structure. Somers’ D has, however, one major disadvantage in the polytomous case: it tends to underestimate the magnitude of the association between item and score more than Rit does when the item scale has four or more categories.
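
As a rough illustration of the contrast described above, the following minimal sketch computes Rit, Rir, and Somers' D for a small made-up binary item of extreme difficulty. The function names and the data are hypothetical. The direction of Somers' D used here (pairs tied on the item are excluded from the denominator) is an assumption, since the abstract does not state the exact definition used in the article.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def item_total(item, score):
    """Rit: correlation between the item and the total score."""
    return pearson(item, score)

def item_rest(item, score):
    """Rir: correlation between the item and the score without that item."""
    rest = [s - g for g, s in zip(item, score)]
    return pearson(item, rest)

def sign(v):
    return (v > 0) - (v < 0)

def somers_d(item, score):
    """(concordant - discordant) / pairs not tied on the item.

    Assumed direction: the item is treated as the independent variable.
    """
    num = den = 0
    n = len(item)
    for i in range(n):
        for j in range(i + 1, n):
            dg = item[i] - item[j]
            if dg == 0:          # pair tied on the item: skipped
                continue
            den += 1
            num += sign(dg) * sign(score[i] - score[j])
    return num / den

# Hypothetical data: only the highest scorer answers the item correctly.
item = [0, 0, 0, 0, 0, 0, 0, 1]
score = [1, 2, 3, 4, 5, 6, 7, 8]

print(round(item_total(item, score), 3))  # 0.577
print(round(item_rest(item, score), 3))   # 0.469
print(somers_d(item, score))              # 1.0
```

In this toy case the item separates the top scorer perfectly, so D reaches 1 while both correlations stay well below it.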

10.12973/ijem.6.1.207
Pages: 207‒221
Article Metrics
Views
1370
Download
3395
Citations
Crossref
20

Scopus
24

...

Kelley’s Discrimination Index (DI) is a simple and robust classical non-parametric shortcut for estimating item discrimination power (IDP) in practical educational settings. Unlike the item–total correlation, DI can reach the ultimate values of +1 and ‒1, and it is stable against outliers. Because it is easy to compute, DI is particularly suitable for rough estimation where sophisticated tools for item analysis, such as IRT modelling, are not available, as is usual, for example, in classroom testing. Unlike most other traditional indices of IDP, DI uses only the extreme cases of the ordered dataset in the estimation. One deficiency of DI is that it is suitable only for dichotomous datasets. This article generalizes DI to allow polytomous datasets and flexible cut-offs for selecting the extreme cases. A new algorithm based on the concept of the characteristic vector of the item is introduced to compute the generalized DI (GDI). A new visual method for item analysis, the cut-off curve, is introduced based on a procedure called exhaustive splitting.
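
The classical, dichotomous DI can be sketched in a few lines. This is not the generalized GDI algorithm, which the abstract does not specify. The function name, the `cutoff` parameter, the tie handling (equal scores keep their original order), and the data are illustrative assumptions; the 27% default is the conventional cut-off for the upper and lower groups.

```python
def kelley_di(item, score, cutoff=0.27):
    """Proportion correct in the upper group minus that in the lower group.

    item   : 0/1 responses to one item
    score  : total test scores of the same test takers
    cutoff : share of test takers placed in each extreme group
    """
    n = len(item)
    k = max(1, round(n * cutoff))            # size of each extreme group
    order = sorted(range(n), key=lambda i: score[i])
    lower = [item[i] for i in order[:k]]     # lowest scorers
    upper = [item[i] for i in order[-k:]]    # highest scorers
    return sum(upper) / k - sum(lower) / k

# Hypothetical data: ten test takers, already listed by ascending score.
score = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
item = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]

print(kelley_di(item, score))                      # 1.0
print(kelley_di([1 - g for g in item], score))     # -1.0
```

Only the three lowest and three highest scorers enter the calculation here, which is why the mixed responses in the middle of the distribution do not affect the result.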

10.12973/ijem.6.2.237
Pages: 237 - 258
Article Metrics
Views
1109
Download
2503
Citations
Crossref
8

Scopus
10

...

A new index of item discrimination power (IDP), dimension-corrected Somers’ D (D2), is proposed. Somers’ D is one of the superior alternatives to the item–total (Rit) and item–rest (Rir) correlations in reflecting the real IDP for items with scales 0/1 and 0/1/2, that is, up to three categories. D also correctly reaches the extreme values +1 and ‒1, whereas Rit and Rir cannot reach the ultimate values in real-life testing settings. However, when the item has four or more categories, Somers’ D underestimates IDP more than the Pearson correlation does. A simple correction to Somers’ D in the polytomous case seems to be effective in item analysis settings. In the simulation with real-life items, D2 showed very few cases of obvious underestimation and practically no cases of obvious overestimation. With certain restrictions discussed in the article, D2 seems to be a good alternative to these classic estimators, not only with dichotomous items but also with polytomous ones. In general, the magnitudes of the estimates by D2 are higher than those by Rit, Rir, and the polychoric correlation, and they seem to be close to those of the bi- and polyserial correlation coefficients, without out-of-range values.

10.12973/ijem.6.2.297
Pages: 297-317
Article Metrics
Views
579
Download
1683
Citations
Crossref
11

Scopus
12

...

Although Goodman–Kruskal gamma (G) is used relatively rarely, it has promising potential as a coefficient of association in educational settings. Characteristics of G are studied in three sub-studies related to educational measurement settings. G appears to be unexpectedly appealing as an estimator of association between an item and a score because it strictly indicates the probability of getting a correct answer on the test item given the score, and it accurately produces perfect latent association irrespective of distributions, degrees of freedom, the number of tied pairs and tied values in the variables, or the difficulty levels of the items. However, it obviously underestimates the association when the item has more than four categories. To address this, a dimension-corrected G (G2) is proposed and its characteristics are studied. Both G and G2 appear to be promising alternatives in measurement modelling settings: G with binary items, and G2 with binary, polytomous, and mixed datasets.
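
For orientation, the uncorrected gamma is the difference between concordant and discordant pairs divided by their sum, with every tied pair ignored. The sketch below shows only this standard G, not the dimension-corrected G2, whose formula is not given in the abstract. The function name and the data are hypothetical.

```python
def goodman_kruskal_gamma(x, y):
    """G = (P - Q) / (P + Q), where P and Q count concordant and
    discordant pairs; pairs tied on either variable are ignored."""
    concordant = discordant = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            product = (x[i] - x[j]) * (y[i] - y[j])
            if product > 0:
                concordant += 1
            elif product < 0:
                discordant += 1
    return (concordant - discordant) / (concordant + discordant)

# Hypothetical binary item and total scores.
score = [1, 2, 3, 4]
perfect_item = [0, 0, 1, 1]   # every correct answer comes from a higher scorer
mixed_item = [0, 1, 0, 1]     # one low scorer succeeds, one higher scorer fails

print(goodman_kruskal_gamma(perfect_item, score))  # 1.0
print(goodman_kruskal_gamma(mixed_item, score))    # 0.5
```

Because tied pairs drop out of both numerator and denominator, the many ties that a binary item produces do not pull G away from 1 in the first case.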

10.12973/ijem.7.1.95
Pages: 95-118
Article Metrics
Views
1033
Download
1854
Citations
Crossref
11

Scopus
14

...

This article introduces the concept of the carrying capacity of data (CCD), defined as an integrated, evaluative judgment of the credibility of specific data-based inferences, informed by quantitative and qualitative analyses, leavened by experience. The sequential process of evaluating the CCD is represented schematically by a framework that can guide data analysis and statistical inference, as well as pedagogy. Aspects of each phase are illustrated with examples. A key initial activity in empirical work is data scrutiny, comprising consideration of data provenance and characteristics, as well as data limitations in light of the context and purpose of the study. Relevant auxiliary information can contribute to evaluating the CCD, as can sensitivity analyses conducted at the modeling stage. It is argued that early courses in statistical methods, and the textbooks they rely on, typically give little emphasis to, or omit entirely, discussion of the importance of data scrutiny in scientific research. This inattention and lack of guided, practical experience leave students unprepared for the real world of empirical studies. Instructors should both cultivate in their students a true respect for data and engage them in authentic empirical research involving real data, rather than the context-free data to which they are usually exposed.

10.12973/ijem.7.3.447
Pages: 447-463
Article Metrics
Views
511
Download
1333
Citations
Crossref
2

Scopus
1

...

Science literacy, which is included in the Programme for International Student Assessment (PISA) as an assessment area, is an important area of research and discussion in the science education literature in all its dimensions. In this study, students from 34 Organization for Economic Cooperation and Development (OECD) countries who participated in the PISA 2015 test, sampled by the systematic sampling method, were clustered by K-Means Clustering and Two-Step Cluster Analysis using factor scores and average PISA science literacy scores. The study is considered important because it divides individuals into clusters according to science instruction methods and the mean of plausible values, and gives an idea of how each cluster is defined. The K-means cluster analysis showed that the most important input variable in the formation of the first and third clusters, which included the students with the highest scores, was teacher-directed science instruction, followed by perceived feedback from science teachers. In the Two-Step Cluster Analysis, teacher-directed science instruction was the most important variable for separating the clusters, followed by adaptive instruction in science lessons.

10.12973/ijem.7.3.487
Pages: 487-500
Article Metrics
Views
349
Download
1087
Citations
Crossref
2

Scopus
1

...

Response times are an important source of information about the performance of individuals during a test. The main purpose of this study is to show that survival models can be used with educational data. Accordingly, datasets of items measuring the literacy, numeracy, and problem-solving skills of the countries participating in Round 3 of the Programme for the International Assessment of Adult Competencies were used. Accelerated failure time models were fitted for each country and domain. In these models, various covariates were included as independent variables and the response time for giving correct answers was the dependent variable. The associations between the covariates and the response time for correct answers were found to vary from one domain to another or from one country to another. The results of the present study provide educational stakeholders and practitioners with valuable information.

10.12973/ijem.7.4.571
Pages: 571-586
Article Metrics
Views
412
Download
1344
Citations
Crossref
2

Scopus
2

...

This study reviews 60 papers that used a Likert scale and were published between 2012 and 2021. The literature was screened using the PRISMA method. Data were analyzed through data extraction and then synthesized in a structured manner using the narrative method. To achieve credible results during data collection and analysis, a group discussion forum (FGD) was conducted. The findings show that only 10% of the studies use a measurement scale with an even number of answer categories (4, 6, 8, or 10 choices). Most studies (90%) use a measurement instrument involving a Likert scale with an odd number of response choices (5, 7, 9, or 11), and the most popular choice among researchers is a five-point Likert scale. A rating scale with an odd number of responses greater than five (especially a seven-point scale) is the most effective in terms of reliability and validity coefficients, but if the researcher wants to direct respondents to one side, a scale with an even number of responses (six points) is possibly more suitable. Response bias and central tendency bias can affect the validity and reliability of a Likert scale instrument.

10.12973/ijem.8.4.625
Pages: 625-637
Article Metrics
Views
2478
Download
11462
Citations
Crossref
59

Scopus
67

Graded Response Models on the Curiosity Measurement of Elementary School Students

curiosity measurement elementary school graded response models

Herwin Herwin, Riana Nurhayati, Aprilia Tina Lidyasari, Augusto da Costa


...

Curiosity is one of the most important character traits for elementary school students. However, evidence from the field shows that no standardized measurement model is yet available for teachers to identify students' curiosity. This study aims to develop a model for measuring the curiosity of elementary school students using the graded response model (GRM) approach. The research uses a descriptive quantitative method. The sample consisted of 236 randomly selected elementary school students. Data were collected using a questionnaire of 16 statement items on a Likert scale. The data were analyzed using the item response theory approach with the GRM. The results showed that the model for measuring student curiosity in elementary schools had good location parameters, a good discrimination index, and a fairly good information function with a small estimation error. The curiosity measurement model in this study can be used as an alternative for teachers to identify students' curiosity in elementary schools.
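
To make the location and discrimination parameters mentioned above concrete, here is a minimal sketch of the category probabilities in the logistic form of the graded response model. Each threshold gives the cumulative probability of responding in that category or higher, and a category's probability is the difference between adjacent cumulative probabilities. The function name and all parameter values are made up for illustration and are not estimates from the study.

```python
import math

def grm_category_probs(theta, a, thresholds):
    """Category probabilities for one Likert-type item under the GRM.

    theta      : latent trait level of the respondent
    a          : item discrimination parameter
    thresholds : increasing location parameters b_1 < ... < b_m
    Returns m + 1 probabilities, one per response category.
    """
    # Cumulative probabilities P(X >= k), padded with 1 and 0 at the ends.
    cumulative = [1.0]
    cumulative += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cumulative.append(0.0)
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]

# Hypothetical four-category item and a respondent at the trait mean.
probs = grm_category_probs(theta=0.0, a=1.0, thresholds=[-1.0, 0.0, 1.0])
print([round(p, 3) for p in probs])  # [0.269, 0.231, 0.231, 0.269]
```

A larger `a` makes the cumulative curves steeper, so the item distinguishes more sharply between respondents just below and just above each threshold.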

10.12973/ijem.9.1.53
Pages: 53-62
Article Metrics
Views
562
Download
1522
Citations
Crossref
2

Scopus
0

...

Developing efficient and reliable tools for assessing early mathematical skills remains a critical priority in educational research. This study aimed to develop and validate a brief version of the Prueba Uruguaya de Matemática (Uruguayan Mathematics Test, PUMa), a digital tool to assess mathematical abilities in children aged 5 to 6. The original test included 144 items covering both symbolic (66%) and non-symbolic (34%) tasks, such as approximate number system, counting, numerical ordering (forward and backward), math fluency, composition and decomposition of numbers, and transcoding auditory-verbal stimuli into Arabic-visual symbols. Unlike most existing tools that require individual administration by trained professionals and lack cultural adaptation for Latin American contexts, PUMa is self-administered, culturally grounded, and suitable for large-scale assessments using tablets. Using a sample of 443 participants and applying parametric and non-parametric models within the framework of Item Response Theory (IRT), along with correlations with TEMA-3, preliminary evidence was generated showing that the brief version retained precision and validity. The resulting shortened tests included 69 and 73 items for the parametric and non-parametric versions, yielding a balanced representation of symbolic (56%) and non-symbolic (44%) tasks. Despite item reduction, ability scores remained highly correlated between original and brief versions (r > .90), and both brief versions demonstrated strong internal consistency (α = .94). PUMa improves upon existing assessments by combining cultural relevance, group-based digital administration, and real-time data collection, offering a scalable solution for early identification and intervention. These features support personalized educational strategies that foster cognitive and academic development from the earliest stages.
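
The internal consistency coefficient reported above (α) is Cronbach's alpha. As a reminder of how it is computed, the following minimal sketch applies the standard formula to a tiny made-up response matrix. The function name and the data are hypothetical and unrelated to the PUMa dataset.

```python
from statistics import pvariance

def cronbach_alpha(responses):
    """Cronbach's alpha for a list of rows, one row of item scores per person.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals)
    """
    k = len(responses[0])                      # number of items
    items = list(zip(*responses))              # one tuple of scores per item
    item_variances = sum(pvariance(col) for col in items)
    total_variance = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_variances / total_variance)

# Hypothetical data: four people, three items that rank them identically.
responses = [
    [1, 1, 1],
    [2, 2, 2],
    [3, 3, 3],
    [4, 4, 4],
]
print(cronbach_alpha(responses))
```

With perfectly consistent items, as in this toy matrix, alpha reaches its upper bound of 1 (up to floating-point rounding). Real tests fall below that, since items never covary perfectly.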

10.12973/ijem.11.2.245
Pages: 245-266
Article Metrics
Views
41
Download
297
Citations
Crossref
0

...