Michigan State University

Datasets for Digital Research

The datasets and data-finding tools listed below are not meant to be used as reading material, but rather as data for text mining or other "non-consumptive" research, that is, research conducted by computational methods that does not reproduce significant portions of text for personal or public display.

In addition to the materials below, prepared by the MSU Libraries, be aware that other corpora are available for linguistic research.

Text and metadata for analysis can also often be obtained from publishers or content vendors, through direct negotiation, an API, or a web interface. JSTOR, for example, offers n-gram word counts through its Data for Research portal.
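To illustrate what n-gram word counts are, here is a minimal Python sketch that derives them from a plain-text string. It is not JSTOR's method or file format; the function name `ngram_counts` and the simple tokenization rule (lowercase, letters and apostrophes only) are assumptions made for illustration. Counts like these summarize a text rather than display it.

```python
import re
from collections import Counter


def ngram_counts(text: str, n: int = 1) -> Counter:
    """Count word n-grams in a text.

    Hypothetical helper for illustration: tokens are lowercased runs of
    letters and apostrophes; all other characters are treated as separators.
    """
    words = re.findall(r"[a-z']+", text.lower())
    # Zipping n staggered copies of the word list yields each run of
    # n consecutive words as a tuple.
    grams = zip(*(words[i:] for i in range(n)))
    return Counter(" ".join(g) for g in grams)
```

For example, `ngram_counts("The farm and the grange")` counts "the" twice, and with `n=2` it yields four bigrams, each counted once.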

Recommendations for acquisition of new datasets, requests for assistance gathering and preparing data, and questions about how to use data may be directed to the MSU Libraries Digital Scholarship Lab.


Fannie Lou Hamer papers, 1966-1978
(open to MSU users)
Number of Works: 640 documents
Years covered: 1966-1978
Size: 14 MB
Sunday School Books in Nineteenth-Century America
(open to non-MSU users)
Number of Works: 166 works
Years covered: 1809-1887
Size: 11.6 MB
The Grange Visitor 
(open to non-MSU users)
Number of Works: 429 issues
Years covered: 1875-1896
Size: 8.53 GB
Michigan Farming Journals 
(open to non-MSU users)
Number of Works: 1,954 issues
Years covered: 1878-1938
Size: ~60 GB
Feeding America 
(open to non-MSU users)
Number of Works: 76 books
Years covered: late 18th to early 20th century
Size: 78+ MB
M.A.C./M.S.C. Record Dataset
(open to non-MSU users)
Number of Works: 2,694 works
Years covered: 1896-1955
Size: 24+ GB
U.S. Congressional Collection 
(open to on-campus users)
Number of Works: 17,000+ daily records
Years covered: 1789-2006
Google Books Dataset
(open to on-campus users)
Number of Works: approx. 3,000,000
Years covered: 1500-2012
Size: 2.9 TB
MSU Libraries Catalog (in progress)
Number of Works:
Years covered:

Image Credits: Schoolhouse by Chris Cole, Newspaper by John Caserta, Book by Derrick Snider, Library designed by libberry, Congress by Martha Ormiston, Cooking by Rafael Farias Leao, UX Personas by Matt Wasser, Vote by Re Jean Soo, Newspaper by Trishul; all via the Noun Project