Visual Explanation of Eigenvalues and Math Process in Latent Semantic Analysis
Abstract
Latent Semantic Analysis (LSA) is a widely used text mining method for extracting the underlying concepts in text documents. The mathematical technique behind LSA is Singular Value Decomposition (SVD), in which the key concept is the eigenvalue. The underlying mathematics is difficult to understand for readers who are not proficient in mathematics. One reason may be that the available linear algebra textbooks are not written for non-mathematics majors such as economics students. We believe that there is a better way to teach eigenvalues and eigenvectors to our students, and this paper illustrates that method. In the main part of the paper, we propose a visualization of the mathematical process behind LSA that makes it easier to understand for readers who are new to mathematics. In addition, to deepen understanding of the SVD process, we present a second example: a time series data analysis using SVD.
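As a rough illustration of why eigenvalues are central to SVD, the following minimal sketch computes the singular values of a small term-document matrix A as the square roots of the eigenvalues of A^T A. The matrix, the term labels, and the helper names are hypothetical and chosen only for illustration; they are not taken from the paper. The matrix is kept to two documents so that the eigenvalues follow from the 2x2 characteristic polynomial without any external library.

```python
import math

# Hypothetical toy term-document matrix (an assumption for illustration):
# rows are terms, columns are documents, entries are term counts.
A = [
    [1.0, 0.0],  # term "price"
    [1.0, 1.0],  # term "market"
    [0.0, 1.0],  # term "rain"
]


def gram(matrix):
    """Return A^T A, the symmetric document-by-document matrix."""
    cols = list(zip(*matrix))
    return [[sum(x * y for x, y in zip(ci, cj)) for cj in cols] for ci in cols]


def eigen_symmetric_2x2(m):
    """Eigenvalues (largest first) of a symmetric 2x2 matrix.

    Solves the characteristic polynomial
    lambda^2 - trace * lambda + det = 0.
    """
    trace = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    disc = math.sqrt(trace * trace - 4.0 * det)
    return [(trace + disc) / 2.0, (trace - disc) / 2.0]


G = gram(A)
eigenvalues = eigen_symmetric_2x2(G)

# The singular values of A are the square roots of the eigenvalues of A^T A.
singular_values = [math.sqrt(ev) for ev in eigenvalues]

print("A^T A           =", G)
print("eigenvalues     =", eigenvalues)
print("singular values =", singular_values)
```

In this toy case the larger singular value corresponds to the direction shared by both documents (the common term "market"), which is the kind of underlying concept that LSA is meant to surface. For realistically sized matrices a numerical linear algebra library would normally be used instead of the closed-form 2x2 solution.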