Detecting Transition of Research Themes using Time-oriented Attributes in Governmental Funding

  • Michiko Yasukawa, Gunma University
  • Koichi Yamazaki, Tokyo Denki University
Keywords: text mining, regression analysis, database, grant-in-aid for scientific research


We investigate a method for detecting year-to-year differences between new and old scientific research themes in grant applications. Although open data for such analysis is available, there has not yet been sufficient study to bridge the gap between the theory and the practice of quantitatively analyzing actual data. In our approach, binary document classification and regression analysis are combined to examine a large corpus of grant applications. From a theoretical viewpoint, we analyzed artificial corpora that emulate the heterogeneity of the target text data. We then experimented on real data on research themes in governmental funding in Japan to confirm the effectiveness of our approach. Our contributions are the following findings. (1) As research themes in competitive grants changed somewhat each year, newer themes gradually became dissimilar to older ones. (2) Although differences over shorter spans are generally smaller, and different research areas show different tendencies over longer spans, the time-oriented tendency in research themes over 20 years was detectable, and the differences between the baseline and our methods were statistically significant.
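The abstract does not specify how classification and regression are combined, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes that a binary classifier is trained to separate documents from a base year from those of a later year, that its held-out accuracy serves as a dissimilarity score (about 0.5 when the two years are indistinguishable), and that this score is regressed on the gap in years, where a positive slope would suggest gradual thematic drift. The corpus is synthetic, loosely in the spirit of the artificial corpora mentioned above. All names (`synthetic_year`, `separability`, `drift_trend`), the drift rate, the vocabulary size, and the document lengths are hypothetical. A hand-written multinomial naive Bayes classifier stands in for whatever classifiers were actually used.

```python
import math
import random

VOCAB_SIZE = 20  # hypothetical vocabulary of word ids 0..19


def word_weights(center, width=4.0):
    """Unnormalized word distribution peaked at `center` (a synthetic theme)."""
    return [math.exp(-((i - center) / width) ** 2) for i in range(VOCAB_SIZE)]


def synthetic_year(year, n_docs, doc_len, rng, drift=0.5):
    """Documents for one year; the theme centre shifts by `drift` per year (assumed)."""
    weights = word_weights(5.0 + drift * year)
    return [rng.choices(range(VOCAB_SIZE), weights=weights, k=doc_len)
            for _ in range(n_docs)]


def train_nb(docs_by_class):
    """Multinomial naive Bayes with add-one smoothing; returns log P(word | class)."""
    model = {}
    for label, docs in docs_by_class.items():
        counts = [1] * VOCAB_SIZE
        for doc in docs:
            for w in doc:
                counts[w] += 1
        total = sum(counts)
        model[label] = [math.log(c / total) for c in counts]
    return model


def predict(model, doc):
    # Classes are balanced, so the prior term is omitted.
    return max(model, key=lambda c: sum(model[c][w] for w in doc))


def separability(old_docs, new_docs):
    """Held-out accuracy of an old-vs-new classifier (0.5 = indistinguishable)."""
    half_o, half_n = len(old_docs) // 2, len(new_docs) // 2
    model = train_nb({"old": old_docs[:half_o], "new": new_docs[:half_n]})
    test = ([(d, "old") for d in old_docs[half_o:]]
            + [(d, "new") for d in new_docs[half_n:]])
    correct = sum(predict(model, d) == label for d, label in test)
    return correct / len(test)


def fit_line(xs, ys):
    """Ordinary least squares fit y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b


def drift_trend(n_years=11, n_docs=100, doc_len=8, seed=0):
    """Compare year 0 with each later year, then regress accuracy on the gap."""
    rng = random.Random(seed)
    corpus = [synthetic_year(y, n_docs, doc_len, rng) for y in range(n_years)]
    gaps = list(range(1, n_years))
    accs = [separability(corpus[0], corpus[g]) for g in gaps]
    _, slope = fit_line(gaps, accs)
    return gaps, accs, slope


if __name__ == "__main__":
    gaps, accs, slope = drift_trend()
    for g, a in zip(gaps, accs):
        print(f"gap {g:2d} years: accuracy {a:.2f}")
    print(f"regression slope: {slope:+.3f} per year")
```

Under these assumptions, accuracy should rise from near chance for a one-year gap toward 1.0 for the largest gaps, giving a positive slope. Applying the idea to real grant-application text would mean replacing the synthetic generator with tokenized documents, and one might substitute stronger classifiers or add a significance test on the slope.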
