Correction Method for Character Recognition of Handwritten Answers in Multiple Formats for Short Essay e-Learning Systems
Abstract
We aim to develop an e-learning system for improving Japanese language proficiency. The system not only scores learners' answers and gives advice instantly but also generates questions automatically, offering a wide range of problems across various fields. As a first step toward this goal, this paper addresses the accurate extraction of the content of handwritten essays. OCR software alone does not achieve sufficient accuracy on handwritten characters because of noise, distortion, and idiosyncratic handwriting. We therefore first train a neural network for character recognition on a handwritten character database, then segment each essay image into single characters and determine the most likely candidates for each character. Furthermore, we treat character identification as a fill-in-the-blank problem over sentences and use BERT's Masked Language Model task to select the characters that form natural sentences. Applying this method to handwritten short essays written in multiple formats yielded more accurate character extraction than before.
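The candidate-selection idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names `correct_characters`, `toy_mlm_prob`, `EXAMPLE`, and the `weight` parameter are hypothetical, and the combination rule (a weighted sum of log-probabilities) is an assumption, since the abstract does not state how recognizer and language-model scores are combined. A small table of character pairs stands in for BERT's Masked Language Model so that the example runs without downloading a model.

```python
import math
from typing import Callable, Dict, List

# One dict per character position: candidate character -> recognizer probability.
Candidates = Dict[str, float]


def toy_mlm_prob(left: str, ch: str, right: str) -> float:
    """Hypothetical stand-in for a masked language model.

    Returns a made-up plausibility for `ch` filling the blank between
    `left` and `right`, based on a tiny table of character pairs.
    """
    common_pairs = {("日", "本"), ("本", "語")}
    hits = 0
    if left and (left[-1], ch) in common_pairs:
        hits += 1
    if right and (ch, right[0]) in common_pairs:
        hits += 1
    return [0.05, 0.5, 0.9][hits]


def correct_characters(
    candidates: List[Candidates],
    mlm_prob: Callable[[str, str, str], float],
    weight: float = 0.5,
) -> str:
    """Pick one character per position by mixing recognizer and LM scores."""
    # Start from the recognizer's top-1 choice at every position.
    chars = [max(c, key=c.get) for c in candidates]
    for i, cands in enumerate(candidates):
        # Treat position i as the blank; the rest of the sentence is context.
        left = "".join(chars[:i])
        right = "".join(chars[i + 1:])

        def score(ch: str) -> float:
            # Assumed rule: weighted sum of log-probabilities.
            rec = math.log(max(cands[ch], 1e-12))
            lm = math.log(max(mlm_prob(left, ch, right), 1e-12))
            return (1.0 - weight) * rec + weight * lm

        chars[i] = max(cands, key=score)
    return "".join(chars)


# The recognizer slightly prefers the wrong character at the second position.
EXAMPLE: List[Candidates] = [
    {"日": 0.60, "目": 0.40},
    {"木": 0.55, "本": 0.45},
    {"語": 0.90, "話": 0.10},
]

print(correct_characters(EXAMPLE, toy_mlm_prob))  # 日本語
```

With `weight=0.0` the function falls back to the recognizer's top choices and returns 日木語, which shows the contribution of the context score. In a real system, `toy_mlm_prob` would be replaced by probabilities read from a pre-trained masked language model at the masked position.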