Adaptable Expression Search Framework with Customizable Pattern Matching for Language Studies

  • Tatsuya Katsura (Okayama University Graduate School of Environmental, Life and Natural Sciences)
  • Koichi Takeuchi (Okayama University, Okayama, Japan)
Keywords: pattern matching, concordancer, browser-based pattern matcher, Prolog

Abstract

This study introduces a novel design for a pattern matching system that extracts selected words or phrases from texts. Learners of a foreign language often need to search texts for usage examples or grammatical structures. Although prior research has proposed numerous systems, particularly concordancers, many of them lack flexibility and make it difficult to combine specific search patterns. To address this limitation, we developed a new phrase search system that allows users to craft their own search patterns by merging basic search templates. The system uses Prolog predicates as its fundamental data structure and SWI-Prolog for processing, and it can perform searches that integrate regular expressions with other combined patterns. Our performance test demonstrates that the system can process 10,000 sentences without errors. A user evaluation employing the System Usability Scale indicates that the current usability of the system requires enhancement; the feedback gathered from this evaluation nevertheless confirms the system’s robustness and provides valuable insights for future improvements.

Published
2026-02-11
Section
Technical Papers (Learning Technologies and Learning Environments)