Students outperformed ChatGPT, the chatbot developed by OpenAI, on accounting exams, new research shows.
Despite the students' stronger showing, the researchers called the chatbot's performance "impressive" and deemed it a "game changer" that will positively reshape teaching and learning methods.
The study, conducted by researchers at Brigham Young University (BYU) and 186 other universities to gauge how OpenAI's technology performs on accounting exams, was published in the journal Issues in Accounting Education.
On the accounting exams administered by the researchers, students achieved an average score of 76.7%, while ChatGPT averaged 47.4%.
The study revealed that ChatGPT beat the student average on 11.3% of the questions, particularly in accounting information systems (AIS) and auditing. However, the AI chatbot fared worse than students on tax, financial, and managerial assessments.
The researchers attributed this weaker performance to the chatbot's difficulty with the mathematical processes these question types require.
Furthermore, the study revealed that ChatGPT performed well in true/false questions (68.7% correct) and multiple-choice questions (59.5% correct) but had difficulty with short-answer questions (28.7% to 39.1% correct).
The researchers noted that ChatGPT struggled with higher-order questions in general, often pairing incorrect answers with authoritative-sounding written explanations, or answering the same question in different ways.
The study also observed that ChatGPT frequently supplied explanations for its answers even when those answers were incorrect. In other cases, the chatbot gave an accurate description but selected the wrong multiple-choice option.
The researchers also flagged a critical issue: ChatGPT occasionally made up facts. When asked for references, for instance, the chatbot generated realistic-looking citations that were entirely fabricated, attributing non-existent works to non-existent authors.
Additionally, the AI chatbot made nonsensical mathematical errors, such as adding two numbers in a subtraction problem or dividing numbers incorrectly.
To contribute to the ongoing discussion about the role of AI models such as ChatGPT in education, lead author David Wood, a professor of accounting at BYU, set out to recruit as many professors as possible to test how the chatbot performed against actual university accounting students.
His recruitment pitch on social media was a success: 327 co-authors from 186 educational institutions in 14 countries joined the research, contributing a total of 25,181 accounting exam questions used in the classroom.
In addition to the classroom exam questions, the researchers also enlisted undergraduate BYU students to provide another 2,268 textbook test bank questions covering AIS, auditing, financial accounting, managerial accounting, and tax.
The questions encompassed various types (true/false, multiple choice, short answer) and levels of difficulty.