Accurate Student Ability Estimation by Removing Teacher Evaluation Bias via Full Computer Based Testing

  • Hideo Hirose, Kurume University
Keywords: ability estimation, item response theory, computer based testing, irreducible probabilistic fluctuation, evaluation bias, description type testing, multiple choice type testing, academic growth, ability equation, learning analytics

Abstract

A test score does not represent the exact ability of an examinee; it shows only one aspect of the examinee, even when the coverage of the test is restricted. Because of this, we may not observe obvious relationships between entrance examination scores and academic records at universities, even in mathematics subjects. To clarify such a relationship in a statistical sense, we investigated three kinds of testing records: the placement test, the learning check test, and term examinations.
The investigation led to three main findings: 1) using the full computer-based testing results of the placement test, we became aware of the magnitude of irreducible probabilistic fluctuations (illustrated by the simulation sketch below); 2) when description-type (free-response) testing is used, biased evaluations by teachers are unavoidable; 3) by adopting full computer-based testing for the placement test, the learning check test, and term examinations, we can remove the teacher evaluation bias that arises in description-type testing and obtain a more accurate estimate of each student's ability.
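To make finding 1 concrete, here is a minimal simulation sketch, not code from the paper: it fixes a true ability and repeatedly administers a 50-item test under the two-parameter logistic (2PL) IRT model, showing that the number-correct score still fluctuates from one administration to the next. All item parameters, counts, and the random seed are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)

    def prob_correct(theta, a, b):
        # 2PL IRT: probability that an examinee of ability theta
        # answers an item with discrimination a and difficulty b correctly.
        return 1.0 / (1.0 + np.exp(-a * (theta - b)))

    n_items, n_trials = 50, 10_000
    theta = 0.0                             # true ability, held fixed
    a = rng.uniform(0.5, 2.0, n_items)      # illustrative discriminations
    b = rng.normal(0.0, 1.0, n_items)       # illustrative difficulties

    p = prob_correct(theta, a, b)           # per-item success probabilities
    scores = rng.binomial(1, p, size=(n_trials, n_items)).sum(axis=1)

    print(f"mean score = {scores.mean():.1f} / {n_items}")
    print(f"std dev    = {scores.std():.2f} (irreducible fluctuation)")

Even with the ability held perfectly constant, the printed standard deviation is roughly a few points out of 50; this floor, which no scoring procedure can remove for a test of finite length, is what "irreducible" means above.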
In addition, we have proposed a fundamental ability equation for student ability that incorporates irreducible probabilistic fluctuations; one plausible form is sketched after this paragraph.
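The paper's exact equation is not reproduced in this abstract, so the additive decomposition below is only one plausible reading consistent with the three findings; the symbols $\theta$, $\beta$, and $\varepsilon$ are introduced here purely for illustration:

    \hat{\theta} = \theta + \beta + \varepsilon,
    \qquad \mathrm{E}[\varepsilon] = 0,
    \quad \mathrm{Var}[\varepsilon] = \sigma_{\min}^{2} > 0,

where $\hat{\theta}$ is the ability estimated from a single test administration, $\theta$ the true ability, $\beta$ a teacher-evaluation bias term (with $\beta = 0$ under full computer-based testing), and $\varepsilon$ the irreducible probabilistic fluctuation, whose variance cannot fall below $\sigma_{\min}^{2}$ for a test of finite length.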


Published: 2021-10-31
Section: Technical Papers