On 7 September 2023, William H. Walters and Esther Isabelle Wilder published the article ‘Fabrication and Errors in the Bibliographic Citations Generated by ChatGPT’ in the journal Scientific Reports¹. They wrote: “Although chatbots such as ChatGPT can facilitate cost-effective text generation and editing, factually incorrect responses (hallucinations) limit their utility. This study evaluates one particular type of hallucination: fabricated bibliographic citations that do not represent actual scholarly works.”¹ They used ChatGPT-3.5 and the newer ChatGPT-4, asking each to produce short literature-review papers; the resulting 84 papers contained 636 citations for analysis.
The authors’ instructions to ChatGPT were: “I want you to act as an academic researcher. Your task is to write a paper of approximately 2000 words with parenthetical citations and a bibliography that includes at least 5 scholarly resources such as journal articles and scholarly books. The paper should respond to this question: ‘[paper topic].’”¹ The authors note that none of the generated papers was “more than 1400 words long” despite the request for roughly 2000 words.¹ Fabrication was especially a problem with the earlier version: 55% of the citations produced by ChatGPT-3.5 did not exist. ChatGPT-4 did better, with only 18% fabricated overall, although “70% of the cited book chapters [were] fabricated.”¹ Interestingly, ChatGPT noted in one case: “These are sample bibliography entries and are not meant to represent actual sources.”¹
The frequency of ChatGPT’s errors may be a blessing in disguise for teachers hoping to catch students who cheat with ChatGPT. The more interesting technical question is why ChatGPT fabricates citations at all. Scholars have no single convincing answer. Bhattacharyya et al.² assert that the difficulty is inherent in large language models, which “use deep neural networks to predict the next word in a sequence of text and provide responses based on statistical patterns learned during training…”¹ Other explanations point to the sheer amount of data the models must analyse. Whatever the ultimate cause, this form of AI is apparently improving. For those who see AI as a threat, that may be alarming; for those who value its potential as a tool, the improvement is a plus.
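The mechanism Bhattacharyya et al. describe can be illustrated with a deliberately tiny sketch. The toy bigram model below (my own illustrative example, not ChatGPT’s actual architecture, which uses deep neural networks over vastly more data) learns only which word tends to follow which. Asked to continue a citation, it can splice together fragments of real references into a citation that never existed, because it tracks statistical plausibility, not truth:

```python
from collections import Counter, defaultdict

# Toy "training corpus" of citation-like strings. The model learns only
# word-to-word transition statistics, not which citations actually exist.
corpus = [
    "Smith et al. 2020 Journal of Applied Linguistics",
    "Smith et al. 2021 Journal of Cognitive Science",
    "Jones et al. 2020 Journal of Cognitive Science",
]

# Count how often each word follows each other word (bigram counts).
transitions = defaultdict(Counter)
for line in corpus:
    words = line.split()
    for a, b in zip(words, words[1:]):
        transitions[a][b] += 1

def predict_next(word):
    """Return the statistically most likely next word, or None if unseen."""
    if word not in transitions:
        return None
    return transitions[word].most_common(1)[0][0]

def generate(start, max_len=10):
    """Greedily extend a sequence one most-likely word at a time."""
    out = [start]
    while len(out) < max_len:
        nxt = predict_next(out[-1])
        if nxt is None:
            break
        out.append(nxt)
    return " ".join(out)

# Each step is locally plausible, yet the whole can be a citation that
# appears nowhere in the corpus — a miniature "hallucination".
print(generate("Smith"))
```

Here `generate("Smith")` yields “Smith et al. 2020 Journal of Cognitive Science”, blending the year of one real entry with the journal of another: a fabricated reference built entirely from genuine fragments.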
1: William H. Walters and Esther Isabelle Wilder, ‘Fabrication and Errors in the Bibliographic Citations Generated by ChatGPT’, Scientific Reports 13, no. 1 (7 September 2023): 14045, https://doi.org/10.1038/s41598-023-41032-5.
2: M. Bhattacharyya, V. M. Miller, D. Bhattacharyya and L. E. Miller, ‘High Rates of Fabricated and Inaccurate References in ChatGPT-Generated Medical Content’, Cureus 15 (2023): e39238, https://doi.org/10.7759/cureus.39238, p. 6.