Autoencoder-Based Multi-Step Information Augmentation for Improving Multi-Layered Neural Networks
Abstract
The present paper proposes a new type of learning method, information augmentation, which increases the number of inputs, or input dimensionality, over multiple steps to improve supervised learning. One of the major problems of neural networks is that multi-layered networks, viewed as information channels, inherently tend to lose information content, for example, about input patterns or error gradients. To overcome this loss of information, unsupervised pretraining was proposed to provide initial weights for subsequent supervised learning. However, unsupervised pretraining of multi-layered neural networks turned out to be less effective than expected, because connection weights obtained by unsupervised learning tend to lose their original characteristics immediately once supervised training begins. To preserve the information acquired by unsupervised learning, we instead try to increase the information contained in the input patterns as much as possible, counteracting the vanishing-information problem. In particular, to acquire detailed information more appropriately, we increase it gradually over multiple steps. We applied the method to a real eye-tracking data set, in which the number of inputs was strictly restricted and the majority of inputs were highly correlated. When the present method of information augmentation was applied, generalization performance was confirmed to improve. Moreover, by treating all connection weights collectively, the importance of input variables could be interpreted more easily, and this interpretation of the collective weights conformed to findings from conventional eye-tracking experiments.
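To make the procedure concrete, the following is a minimal sketch, not the authors' implementation: it assumes that each augmentation step trains an overcomplete autoencoder (more hidden units than inputs) and that the hidden activations then serve as the enlarged input to the next step, before a supervised model is trained on the final augmented representation. All function names and the dimension schedule are hypothetical.

import torch
import torch.nn as nn

def train_autoencoder(x, hidden_dim, epochs=300, lr=1e-2):
    # Train one overcomplete autoencoder (hidden_dim > x.shape[1])
    # by minimizing reconstruction error; return only the encoder.
    encoder = nn.Sequential(nn.Linear(x.shape[1], hidden_dim), nn.Sigmoid())
    decoder = nn.Linear(hidden_dim, x.shape[1])
    params = list(encoder.parameters()) + list(decoder.parameters())
    optimizer = torch.optim.Adam(params, lr=lr)
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = nn.functional.mse_loss(decoder(encoder(x)), x)
        loss.backward()
        optimizer.step()
    return encoder

def multi_step_augment(x, dim_schedule):
    # Increase input dimensionality gradually: each step's hidden
    # activations become the (larger) input to the next step.
    for hidden_dim in dim_schedule:
        encoder = train_autoencoder(x, hidden_dim)
        with torch.no_grad():
            x = encoder(x)
    return x

# Example: expand a small number of highly correlated inputs in three
# steps, then train any supervised classifier on x_aug (placeholder data).
x = torch.rand(100, 7)
x_aug = multi_step_augment(x, [12, 20, 30])

Similarly, one plausible reading of "treating all connection weights collectively", assumed here rather than taken from the paper, is to chain the weight matrices of the trained supervised network so that each input's net influence on the outputs can be summed into an importance score:

def collective_weights(weight_matrices):
    # Chain layer weight matrices (each of shape in_dim x out_dim) so that
    # row i of the result traces input i's net influence on each output.
    w = weight_matrices[0]
    for w_next in weight_matrices[1:]:
        w = w @ w_next
    return w

# Hypothetical usage, given a list ws of layer weight matrices:
# importance = collective_weights(ws).abs().sum(dim=1)

Such a chained score ignores the nonlinearities between layers, so it should be read only as a first-order summary of input importance; the paper's actual definition of collective weights may differ.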