Question Generation for English Reading Comprehension Exercises using Transformers

Authors

  • Alexander Maas, Tohoku University
  • Kazunori D Yamada, Tohoku University
  • Toru Nagahama, Tohoku University
  • Taku Kawada, Tohoku University
  • Tatsuya Horita, Tohoku University

DOI:

https://doi.org/10.52731/liir.v005.183

Keywords:

Artificial Intelligence, English Language Education, Natural Language Processing, Reading Comprehension Exercises

Abstract

In secondary language education, reading comprehension exercises are one tool teachers use to test students' language ability. Constructing these exercises is time-consuming: the text must contain only vocabulary and grammar the students know, and the questions must test the reasoning skills the teacher wants to evaluate. To allow educators to use reading comprehension exercises more frequently, this research aims to reduce the time needed to create these questions by training a controllable transformer-based natural language processing model to generate questions of a user-specified type about a user-specified passage of text. After fine-tuning, the questions generated using the new controls suffered from either overfitting or a lack of diversity. However, the output of an existing question generation control was altered and became capable of generating questions suitable for use in reading comprehension exercises. Improving the output of the new controls would require either more training data or an alternative training scheme.

Published

2024-02-03