The Influence of Linguistic Attribute Differences in Multilingual Datasets on Sarcasm Detection
Abstract
Social media has become a crucial data source for machine-learning-based sentiment analysis. However, sarcasm poses a challenge to this task because it conceals the true intent of a text. Consequently, there has been a surge in research on automatic sarcasm detection. Furthermore, studying sarcasm detection with multilingual datasets is becoming indispensable, because it can address the data scarcity problem for low-resource languages and reduce the cost of training separate models for different languages. Past research has largely overlooked the influence of language diversity within training datasets on model performance. This study assumes that linguistic differences may influence sarcastic expressions and employs two datasets: an English-Arabic dataset, whose languages belong to the same category of morphological typology, and an English-Chinese dataset, whose languages belong to different categories. Models were trained on each dataset with the BERT, BERT-BiLSTM, and BERT-RCNN architectures, and the results were compared using two English test datasets with different patterns. The English-Arabic training data yielded better results than the English-Chinese training data, indicating an influence of morphological typology. In addition, the BiLSTM and RCNN architectures can enhance the performance of multilingual sarcasm detection models. Moreover, the RCNN structure appears to be beneficial for detecting sarcasm across different patterns.
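The experimental design summarized above can be read as a small grid of runs. The following is a minimal sketch that only enumerates that grid; it does not train anything. The test-set names `english_test_A` and `english_test_B` are hypothetical placeholders (the abstract does not name them), and the sketch assumes that every trained model is evaluated on both English test sets.

```python
from itertools import product

# Bilingual training sets and architectures named in the abstract.
TRAINING_SETS = ["English-Arabic", "English-Chinese"]
ARCHITECTURES = ["BERT", "BERT-BiLSTM", "BERT-RCNN"]

# Hypothetical placeholder names: the abstract only says that two English
# test datasets with different patterns were used.
TEST_SETS = ["english_test_A", "english_test_B"]


def experiment_grid():
    """Return one dict per (training set, architecture, test set) combination.

    Assumption: each model trained on a given dataset is evaluated on
    both English test sets.
    """
    return [
        {"train": train, "model": model, "test": test}
        for train, model, test in product(TRAINING_SETS, ARCHITECTURES, TEST_SETS)
    ]


if __name__ == "__main__":
    for run in experiment_grid():
        print(f"train={run['train']:<16} model={run['model']:<12} test={run['test']}")
```

Under these assumptions the grid contains 2 x 3 = 6 trained models and 12 evaluations, which is the comparison the reported findings rest on.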