As we continue to digitize the unique collections at Schaffer Library, we generate new datasets of machine-readable text and metadata available for computational analysis and other digital scholarship methods. We now have a Schaffer Library Collections as Data GitHub page where you can download and analyze our collections data.
Data from our collections include text files extracted with optical character recognition (OCR), structured metadata files (e.g., in CSV or TSV format), and XML files. Collections currently available include the Concordiensis, the OD Putnum Photographs, and the Jonathan Pearson Diary.
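As a starting point for working with the OCR text files, the minimal Python sketch below counts word and n-gram frequencies using only the standard library. It is an illustration, not an official workflow: the function names, the tokenization rule, and the folder layout (a directory of `.txt` files) are assumptions, so check the actual structure of the dataset you download before adapting it.

```python
import re
from collections import Counter
from pathlib import Path

# Assumed tokenization rule: runs of letters, with an optional
# internal apostrophe (e.g. "don't"). OCR text may need more cleanup.
TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def tokenize(text):
    """Lowercase the text and return its word tokens."""
    return TOKEN_RE.findall(text.lower())


def ngram_counts(tokens, n=1):
    """Count every run of n consecutive tokens.

    Runs are not split at sentence boundaries, because punctuation
    is discarded during tokenization.
    """
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams)


def corpus_counts(folder, n=1):
    """Sum n-gram counts over every .txt file under `folder`.

    `folder` is a hypothetical local path to downloaded OCR text.
    """
    total = Counter()
    for path in sorted(Path(folder).rglob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="replace")
        total.update(ngram_counts(tokenize(text), n))
    return total


# Small in-memory demonstration; no files are required.
sample = "The college library opened. The library holds the diary."
print(ngram_counts(tokenize(sample)).most_common(2))
```

To run this against downloaded files, call `corpus_counts` with the local folder that holds the text files, for example `corpus_counts("path/to/ocr_text", n=2)` for two-word phrases; the path here is a placeholder. OCR output often contains recognition errors, so treat the resulting counts as approximate.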
Using Data from Library Materials
Many of our vendors provide downloads, visualization tools, and tools to support analyses within their interfaces. These include:
- Gale Literature Research Center -- includes a topic finder and a term frequency tool for analyzing trends in a given literary corpus.
- TDM Studio -- text and data mining for millions of historic newspaper articles.
- Readex AllSearch -- n-gram and other text analyses.