Zero-Shot Text Classification Using Large Language Models for Key Audit Matters in Japanese Audit Reports

Nobushige Doi; Yusuke Nobuta; Takeshi Mizuno

doi:10.52731/ijscai.v9.i1.865

Nobushige Doi, Japan Exchange Group, Inc.
Yusuke Nobuta, Tokyo Stock Exchange, Inc.
Takeshi Mizuno, Japan Exchange Group, Inc.

Keywords: Auditing, Financial disclosure, Key Audit Matters, Text classification, Large Language Models, ChatGPT

Abstract

Companies listed in Japan are required to submit audit reports to the Prime Minister of Japan. In principle, these reports must include “Key Audit Matters” (KAMs), the matters that the auditors, as professional experts, judged to be particularly important when auditing the financial statements. A previous study proposed classifying KAMs automatically by zero-shot text classification. We examine whether zero-shot text classification with large language models (LLMs) such as ChatGPT can automatically classify KAMs. We also examine how three approaches contribute to the accuracy of zero-shot text classification by LLMs: definition refinement, majority decision-making based on multiple LLM outputs, and the use of state-of-the-art models. The experimental results confirm that definition refinement and majority decision-making based on more than three results are useful to some extent. Furthermore, gpt-4-1106-preview, the latest ChatGPT model at the time of the study and a version of the Generative Pre-trained Transformer 4 (GPT-4) model, achieved a classification accuracy of up to 87.2%.
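Two ideas from the abstract can be made concrete: zero-shot classification (the model receives category definitions but no labeled examples) and majority decision-making over repeated LLM outputs. The sketch below illustrates both under stated assumptions. `ask_llm` is a hypothetical stand-in for a call to a model such as gpt-4-1106-preview, `DEFINITIONS` holds hypothetical categories rather than the paper's actual KAM taxonomy, and leaving ties undecided is a choice made for this sketch, not necessarily how the authors resolved them.

```python
from collections import Counter
from typing import Callable, Optional, Sequence

# Hypothetical category definitions; the paper's actual KAM categories
# and their refined wording are not reproduced here.
DEFINITIONS = {
    "Revenue recognition": "Matters about when and how revenue is recorded.",
    "Impairment": "Matters about write-downs of goodwill or fixed assets.",
    "Other": "Matters that fit none of the categories above.",
}


def build_prompt(kam_text: str, definitions: dict[str, str]) -> str:
    """Build a zero-shot prompt: category definitions only, no labeled examples."""
    lines = ["Classify the Key Audit Matter below into exactly one category."]
    for label, definition in definitions.items():
        lines.append(f"- {label}: {definition}")
    lines.append(f"Text: {kam_text}")
    lines.append("Answer with the category name only.")
    return "\n".join(lines)


def majority_vote(outputs: Sequence[str], labels: Sequence[str]) -> Optional[str]:
    """Return the most frequent valid label, or None on a tie or if no output is valid."""
    # Ignore surrounding whitespace and discard answers outside the label set.
    counts = Counter(o.strip() for o in outputs if o.strip() in labels)
    ranked = counts.most_common(2)
    if not ranked:
        return None
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # tie between the top two labels: leave undecided
    return ranked[0][0]


def classify(kam_text: str, ask_llm: Callable[[str], str], n_runs: int = 3) -> Optional[str]:
    """Query the model n_runs times with the same prompt and vote over the answers."""
    prompt = build_prompt(kam_text, DEFINITIONS)
    outputs = [ask_llm(prompt) for _ in range(n_runs)]
    return majority_vote(outputs, list(DEFINITIONS))


if __name__ == "__main__":
    # A canned sequence of answers stands in for a real (hypothetical) LLM call.
    answers = iter(["Impairment", "Impairment ", "Revenue recognition"])
    print(classify("Goodwill recorded on acquisition ...", lambda p: next(answers)))
```

With the canned answers above, two of the three runs agree, so the vote returns `Impairment`. In practice `ask_llm` would wrap an API request; because sampling makes LLM outputs vary between calls, repeating the query and voting is what gives the majority step something to aggregate. Note that an odd number of runs avoids ties only in two-way splits; with several categories a tie can still occur.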