Automatic Dictionary Extraction and Content Analysis Associated with Human Values
Abstract
This paper studies a method for identifying word unigrams and bigrams that are associated with one or more human values, such as freedom or innovation. The key idea is to associate values with word choices deterministically, so that the values reflected by a sentence can be assigned by dictionary lookup. On average, this approach works nearly as well as the most accurate existing methods, but its principal contribution is that the basis for the system's classification decisions is more easily interpreted by social scientists. The method uses a Monte Carlo algorithm with simulated annealing to search the space of possible assignments of human values to unigrams and bigrams efficiently for an optimal assignment. Results are reported on an annotated test collection of prepared statements from witnesses at public hearings on the topic of net neutrality. The results include accuracy comparisons with a previously reported approach, as well as the use of emergent human coding to explain the classification process in a way that social scientists find useful for characterizing how word pairs are used to express human values in this context.
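The two ideas in the abstract, dictionary lookup and an annealed search over value assignments, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The label set, the toy sentences, the micro-averaged F1 objective, the single-toggle move, and the geometric cooling schedule are all assumptions chosen for illustration, and the names `ngrams`, `tag`, `score`, and `anneal` are hypothetical.

```python
import math
import random

VALUES = ["freedom", "innovation"]  # hypothetical label set

def ngrams(sentence):
    """Return the set of word unigrams and bigrams in a sentence."""
    tokens = sentence.lower().split()
    return set(tokens) | {" ".join(pair) for pair in zip(tokens, tokens[1:])}

def tag(sentence, dictionary):
    """Deterministic lookup: union of the values of every n-gram present."""
    found = set()
    for gram in ngrams(sentence):
        found |= dictionary.get(gram, set())
    return found

def score(dictionary, data):
    """Micro-averaged F1 over (sentence, gold values) pairs (assumed objective)."""
    tp = fp = fn = 0
    for sentence, gold in data:
        pred = tag(sentence, dictionary)
        tp += len(pred & gold)
        fp += len(pred - gold)
        fn += len(gold - pred)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def anneal(data, steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Search n-gram -> value-set assignments with simulated annealing."""
    rng = random.Random(seed)
    vocab = sorted(set().union(*(ngrams(s) for s, _ in data)))
    current, cur_score = {}, 0.0
    best, best_score = current, cur_score
    t = t0
    for _ in range(steps):
        # Assumed move: toggle one value for one randomly chosen n-gram.
        gram, value = rng.choice(vocab), rng.choice(VALUES)
        candidate = {k: set(v) for k, v in current.items()}
        candidate.setdefault(gram, set()).symmetric_difference_update({value})
        if not candidate[gram]:
            del candidate[gram]
        cand_score = score(candidate, data)
        delta = cand_score - cur_score
        # Metropolis criterion: always accept improvements, and accept
        # worse candidates with probability exp(delta / t).
        if delta >= 0 or rng.random() < math.exp(delta / t):
            current, cur_score = candidate, cand_score
            if cur_score > best_score:
                best, best_score = current, cur_score
        t *= cooling  # assumed geometric cooling schedule
    return best, best_score

# Toy annotated sentences, invented for this sketch.
DATA = [
    ("net neutrality protects free speech online", {"freedom"}),
    ("open networks encourage new products", {"innovation"}),
    ("free speech encourages new products", {"freedom", "innovation"}),
    ("the hearing began at noon", set()),
]

if __name__ == "__main__":
    dictionary, f1 = anneal(DATA)
    print(f1)
    for gram in sorted(dictionary):
        print(gram, sorted(dictionary[gram]))
```

Because the learned dictionary is an explicit mapping from n-grams to values, every classification decision can be traced to the entries that fired, which is the kind of interpretability the abstract emphasizes.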