Detection of Unnatural Parts of Statistical Data
Abstract
Statistical data inform many kinds of decision-making, so ensuring their authenticity is important. In practice, however, several types of data alteration have been reported, which makes it necessary to validate statistical data. Benford’s law is a well-known tool for detecting unnatural numerical data: it states that the first significant digit d of naturally occurring numbers follows a particular distribution, with probability log10(1 + 1/d). Applied to a dataset as a whole, however, Benford’s law cannot accurately identify which parts of the data are unnatural. In this study, we attempted to identify the unnatural parts of statistical data published in tabular format. Subsets of the target data were specified using the row and column names that define each cell in a table, or the words appearing in the table title. By measuring the divergence of each subset, we identified the unnatural subsets. This paper presents the results of identifying unnatural subsets in agricultural data from the China Statistical Yearbook.
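The subset-based idea described above can be illustrated with a minimal sketch. The abstract does not state which divergence measure was used or how subsets were filtered, so the code below makes several assumptions: divergence is measured against the Benford distribution, the Kullback-Leibler divergence stands in for the measure, subsets are formed from row and column names only (title words are omitted), and the names `first_digit`, `benford_divergence`, `rank_subsets`, and the `min_size` threshold are hypothetical.

```python
import math
from collections import Counter, defaultdict
from decimal import Decimal

# Benford's law: P(d) = log10(1 + 1/d) for first significant digit d = 1..9.
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(x):
    """Return the first significant digit of x, or None for 0, NaN, or inf."""
    # Decimal(str(x)) exposes the decimal digits without leading zeros.
    digits = Decimal(str(x)).as_tuple().digits
    return next((d for d in digits if d != 0), None)


def benford_divergence(values):
    """KL divergence (an assumed measure) of observed first digits from Benford."""
    digits = [d for d in map(first_digit, values) if d is not None]
    if not digits:
        return None
    counts = Counter(digits)
    n = len(digits)
    kl = 0.0
    for d, p in BENFORD.items():
        q = counts.get(d, 0) / n
        if q > 0:  # terms with q == 0 contribute nothing
            kl += q * math.log(q / p)
    return kl


def rank_subsets(cells, min_size=10):
    """Group (row_name, column_name, value) cells by row and by column name,
    then rank the groups from most to least divergent.

    min_size is a hypothetical threshold: very small subsets give noisy
    digit frequencies, so they are skipped here.
    """
    groups = defaultdict(list)
    for row, col, value in cells:
        groups[("row", row)].append(value)
        groups[("column", col)].append(value)
    scored = []
    for key, vals in groups.items():
        if len(vals) >= min_size:
            score = benford_divergence(vals)
            if score is not None:
                scored.append((key, score))
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Calling `rank_subsets` on a list of `(row_name, column_name, value)` tuples returns the row and column subsets ordered by divergence, so the subsets at the top of the list are the candidates for closer inspection. A high score only flags a subset as unusual with respect to Benford’s law; it does not by itself show that the data were altered.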