Data mining refers to a broad range of techniques for finding and analyzing patterns in large data sets. Techniques vary widely based on the type of data being analyzed and the research objectives. The process of data mining can include both the extraction of the data itself from some source and the extraction of patterns from the data once it has been obtained and processed. Data mining techniques are grounded in computer science and statistics and usually rely on a programming language (e.g., Python, R, or SAS), but there are also alternative interfaces that avoid the need to write code.
Text mining (or text data mining/TDM) is a subdomain of data mining that deals specifically with textual data. Textual data is any data that consists of (unstructured) text: books, articles, social media posts, etc. A text data set might focus on just one category of items, or it might be very broad, collecting anything available that contains text.
TDM provides the ability to quickly obtain and process huge amounts of information. It allows researchers to be thorough and comprehensive in a way that might otherwise be impossible. You should consider TDM techniques if you have access to a large amount of text which you want to compare, analyze or ask questions of.
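As a rough illustration of what "asking questions of" a body of text can look like, the minimal Python sketch below counts the most frequent terms across a small collection of documents. The sample documents, the stopword list, and the function names are all hypothetical and chosen only for illustration; a real project would load its own corpus and would likely use a more careful tokenizer.

```python
import re
from collections import Counter

# Hypothetical mini-corpus; a real project would load text from files or an API.
documents = [
    "Text mining finds patterns in text.",
    "Data mining finds patterns in large data sets.",
    "Mining text at scale needs automation.",
]

# Small illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"in", "at", "the", "a", "of"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_frequencies(docs, stopwords=STOPWORDS):
    """Count how often each non-stopword term appears across all documents."""
    counts = Counter()
    for doc in docs:
        counts.update(t for t in tokenize(doc) if t not in stopwords)
    return counts


frequencies = term_frequencies(documents)
# Print the three most frequent terms with their counts.
print(frequencies.most_common(3))
```

Term counts like these are only a starting point, but the same pattern (tokenize, filter, count) underlies many of the more advanced TDM techniques.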
Some fields where research methods might include TDM: statistics, sociology, psychology, economics, environmental science, and many more.
A few examples of studies utilizing TDM:
Text Mining Oral Histories in Historical Archaeology
Topic-based content and sentiment analysis of Ebola virus on Twitter and in the news
The first step in any TDM project is finding and acquiring data. The data available to you will shape the rest of your project: it informs the research questions you can ask as well as the techniques and tools you choose for your analysis. This guide introduces some of the data sources available to you, both openly on the internet and through the library's subscriptions. You can also always ask a librarian for help getting started with your project. We are available to help you find, understand, and use data for your TDM projects.
Note that automated web scraping of library databases is typically prohibited by our licenses, so you will need to use the official channels provided to us. See more here.
Questions about finding TDM data sources? Contact James Guerrera-Sapone, Research Data Librarian
Questions about library database TDM permissions? Contact Denis Shannon, Electronic and Continuing Resources Librarian