How Much Does ChatGPT Know?

In short, ChatGPT knows a lot, but not everything. It was trained using hundreds of gigabytes of writing from the Internet, but the Internet does not contain the whole world of knowledge. While it can provide extremely helpful explanations (imagine being able to ask Wikipedia virtually any question you want), it sometimes fabricates information as well, based on its understanding of what words usually follow others. As the official ChatGPT FAQ puts it, "ChatGPT will occasionally make up facts or 'hallucinate' outputs." These "hallucinations" can include fabricated citations.

As another example, we asked it for information about one of our librarians. It provided a response that sounded plausible but was largely false (incorrect information underlined in red):

ChatGPT response to the question "Who is Lynne Stahl? She has worked as a librarian at West Virginia University." Response says: "Lynne Stahl is a librarian who has worked at West Virginia University (WVU) in Morgantown, West Virginia. She has held the position of Coordinator of Library Instruction and Information Literacy at WV Libraries since 2014. In this role, she is responsible for developing and delivering library instruction to students and faculty, as well as collaborating with other librarians and faculty across the university to integrate information literacy into the curriculum. Prior to joining WVU, Lynne Stahl worked at other academic libraries, including the University of Tennessee at Chattanooga and the University of California, Irvine. She holds a Master of Library Science degree from the University of North Carolina at Chapel Hill and a Bachelor of Arts degree from Davidson College. Her professional interests include information literacy, assessment, and teaching and learning in higher education." Much of this information is false.

Subscription-based databases

When asked, ChatGPT currently claims that it does not have direct access to subscription-based databases such as JSTOR. The tool was trained on publicly available texts.