Unsupervised Detection of Domain Switching in Thai Multidisciplinary Online News
DOI:
https://doi.org/10.52731/liir.v003.077
Keywords:
Latent Dirichlet Allocation, Network Text Analysis, Natural Language Processing, Multidisciplinary Knowledge
Abstract
Electronic news has become a popular way to keep up with current information: it makes news tracking more accessible and reaches a broader range of audiences. However, ambiguous contexts can be an obstacle to news consumption and can lead to online disputes, cyberbullying, and political radicalization. This paper demonstrates network text analysis combined with Latent Dirichlet Allocation (LDA), a generative statistical model, to extract terms and generate a co-occurrence network that spans multidisciplinary knowledge. The network shows that a given term can correspond to different domains, which helps in recognizing how news readers may interpret it.
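The co-occurrence step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the article list, the domain labels, and the function names below are hypothetical, and the terms are supplied by hand rather than extracted with LDA as in the paper. The sketch only shows how a term that appears under several domains could be flagged once a co-occurrence network has been built.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical toy input: terms already extracted per news article, each
# article tagged with an assumed domain label. In the paper the terms come
# from LDA; here they are hand-written for illustration only.
articles = [
    ("health",   ["vaccine", "virus", "hospital", "budget"]),
    ("health",   ["virus", "hospital", "mask"]),
    ("economy",  ["budget", "tax", "market"]),
    ("economy",  ["market", "budget", "export"]),
    ("politics", ["budget", "election", "tax"]),
]

def build_cooccurrence(docs):
    """Count how often each unordered term pair appears in the same article."""
    edges = Counter()
    for _, terms in docs:
        # Sort so each pair has one canonical (a, b) key with a < b.
        for a, b in combinations(sorted(set(terms)), 2):
            edges[(a, b)] += 1
    return edges

def domains_per_term(docs):
    """Map each term to the set of domain labels it appears under."""
    seen = defaultdict(set)
    for domain, terms in docs:
        for term in terms:
            seen[term].add(domain)
    return seen

edges = build_cooccurrence(articles)
term_domains = domains_per_term(articles)

# Terms seen under more than one domain are candidates for domain switching.
switching = sorted(t for t, d in term_domains.items() if len(d) > 1)
print(switching)
```

In this toy data, terms such as "budget" occur in articles from more than one domain, so they are the ones a reader from a different background might interpret differently. A fuller pipeline would replace the hand-written term lists with topic-model output and could weight the edges by co-occurrence count.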