How Does the Persona Given to Large Language Models Affect the Idea Evaluations?

Authors

  • Hiroaki FURUKAWA, The University of Kitakyushu

DOI:

https://doi.org/10.52731/liir.v006.342

Keywords:

Creativity, Idea evaluation, Large Language Models, Persona, Prompt engineering

Abstract

This paper investigates the effect of personas given to Large Language Models (LLMs) on idea evaluation. The language comprehension ability of LLMs has recently reached a level comparable to that of humans, and LLMs are consequently being explored for use in idea evaluation. However, LLM outputs suffer from several problems, including hallucinations and biases. To address these issues, prompt engineering is used to guide LLMs toward producing the desired results. This study focuses on the persona as a factor in prompt engineering, since personas enable specific personalities to be reproduced and controlled within an LLM. The objective of this study is to examine the relationship between personas and idea evaluation using GPT-4. The results suggest that variations in personas influence the evaluation of ideas. Furthermore, a relationship was observed between the evaluation scores and the evaluation criteria that the LLM deemed important.
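The OpenAI Chat Completions API cited in the references allows a persona to be supplied through the system message. The following minimal Python sketch illustrates that setup; the persona text, the example idea, and the 1–5 rating instructions are illustrative assumptions, not the prompts actually used in the study.

    # Minimal sketch of persona-conditioned idea evaluation with GPT-4.
    # The persona, idea, and rating instructions below are illustrative
    # assumptions, not the study's actual prompts.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    persona = "You are a veteran product designer who values practicality."
    idea = "A refrigerator that suggests recipes based on its contents."

    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            # The persona is injected via the system message.
            {"role": "system", "content": persona},
            {"role": "user", "content": (
                "Evaluate the following idea for novelty and usefulness, "
                "each on a 1-5 scale, and state which criterion you "
                "weighted most heavily.\nIdea: " + idea
            )},
        ],
        temperature=0,  # reduce run-to-run variance in the scores
    )

    print(response.choices[0].message.content)

Changing only the persona string while holding the idea and instructions fixed isolates the persona's effect on the resulting scores.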

References

S. Farquhar, J. Kossen, L. Kuhn, and Y. Gal, “Detecting hallucinations in large language models using semantic entropy,” Nature, vol. 630, no. 8017, pp. 625–630, 2024.

J. Shin, H. Song, H. Lee, S. Jeong, and J. C. Park, “Ask LLMs directly, ‘What shapes your bias?’: Measuring social bias in large language models,” arXiv preprint arXiv:2406.04064, 2024.

G. Serapio-Garcia, M. Safdari, C. Crepy, L. Sun, S. Fitz, P. Romero, M. Abdulhai, A. Faust, and M. Mataric, “Personality traits in large language models,” arXiv preprint arXiv:2307.00184, 2023.

A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention is all you need,” in Advances in Neural Information Processing Systems, vol. 30, 2017.

OpenAI, “GPT-4.” https://openai.com/index/gpt-4/ (Accessed on 10/10/2024).

Google, “Gemini - chat to supercharge your ideas.” https://gemini.google.com/ (Accessed on 10/10/2024).

Meta, “Llama 3.1.” https://www.llama.com/ (Accessed on 10/10/2024).

Y. Chang, X. Wang, J. Wang, Y. Wu, L. Yang, K. Zhu, H. Chen, X. Yi, C. Wang, Y. Wang, et al., “A survey on evaluation of large language models,” ACM Transactions on Intelligent Systems and Technology, vol. 15, no. 3, pp. 1–45, 2024.

D. E. O’Leary, “A comparison of numeric assessments of ideas from two large language models: With implications for validating and choosing LLMs,” IEEE Intelligent Systems, vol. 39, no. 3, pp. 73–76, 2024.

OpenAI, “Hello GPT-4o.” https://openai.com/index/hello-gpt-4o/ (Accessed on 10/10/2024).

J. Diedrich, M. Benedek, E. Jauk, and A. C. Neubauer, “Are creative ideas novel and useful?,” Psychology of Aesthetics, Creativity, and the Arts, vol. 9, no. 1, p. 35, 2015.

DeepL, “Deepl translate: The world’s most accurate translator.” https://www.deepl.com/en/translator (Accessed on 10/10/2024).

OpenAI, “API reference: Create chat completion.” https://platform.openai.com/docs/api-reference/chat/create (Accessed on 10/10/2024).

J. O. Wobbrock, L. Findlater, D. Gergle, and J. J. Higgins, “The aligned rank transform for nonparametric factorial analyses using only ANOVA procedures,” in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 143–146, 2011.

L. A. Elkin, M. Kay, J. J. Higgins, and J. O. Wobbrock, “An aligned rank transform procedure for multifactor contrast tests,” in The 34th Annual ACM Symposium on User Interface Software and Technology, pp. 754–768, 2021.

Published

2025-02-22