Contrastive Learning for Fine-Grained Reading Detection
Abstract
Reading is a cognitive activity that we perform for various purposes, such as gaining knowledge and entertaining ourselves, across different scripts and layouts. Automatic reading detection therefore provides useful information about users’ reading activities. Deep learning enables automatic feature extraction and model creation but requires large amounts of labeled data. Self-supervised learning, devised to overcome this limitation, comes in two forms: noncontrastive self-supervised learning (SSL) and contrastive self-supervised learning (contrastive learning). Although SSL has been well explored for reading analysis, contrastive learning has not. This paper explores contrastive learning, which can be realized in several ways. A Simple Framework for Contrastive Learning of Visual Representations (SimCLR) is one such method that has attracted much attention across many research domains because of its superior performance. We explore SimCLR for the cognitive activity recognition task of fine-grained reading detection using electrooculography datasets. These datasets describe eye movements recorded in in-the-wild conditions. The obtained results are compared against SSL and supervised baselines. The results show that, for an equal number of training samples, SimCLR achieves a maximum performance gain of 3.02 and 3.96 percentage points over the two baselines, respectively. Moreover, SimCLR shows the best performance on large datasets, with a data efficiency of about 80%, whereas SSL performs best on small datasets. The analysis conducted in this paper points researchers and system designers toward employing self-supervised learning for automatic reading detection.
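At the core of SimCLR is the NT-Xent (normalized temperature-scaled cross-entropy) loss: two augmented views of the same input window are pulled together in embedding space while all other windows in the batch act as negatives. The following is a minimal NumPy sketch of that loss, not the authors' implementation; the embedding dimension, temperature, and the toy "augmentation" (adding small noise) are illustrative assumptions.

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent loss as used in SimCLR.

    z1, z2: (N, d) arrays of embeddings for two augmented views of
    the same N input windows (e.g., EOG signal segments).
    """
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)              # (2N, d)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # unit-normalize
    sim = (z @ z.T) / temperature                     # cosine similarities
    np.fill_diagonal(sim, -np.inf)                    # exclude self-pairs
    # the positive of sample i is its other view, at index i +/- N
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    log_denom = np.log(np.exp(sim).sum(axis=1))
    loss = -(sim[np.arange(2 * n), pos] - log_denom)  # cross-entropy per row
    return loss.mean()

# toy usage: two nearly identical views should yield a low loss
rng = np.random.default_rng(0)
z1 = rng.normal(size=(4, 8))
z2 = z1 + 0.1 * rng.normal(size=(4, 8))  # hypothetical augmented views
print(nt_xent_loss(z1, z2))
```

Matched views produce a much smaller loss than unrelated embeddings, which is the training signal that lets SimCLR learn representations without labels.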