Michigan State University

Google Books Data Set

The Google Books Dataset subsetting page.

The Google Books Dataset (GDS) is a collection of scanned books, totaling approximately 3 million volumes of text, or 2.9 terabytes (2,970 gigabytes) of data in its zipped form. The books included in the dataset are public domain works digitized by Google and made available by the HathiTrust Digital Library.

The subset generator provides a means of accessing these texts. On-campus users are permitted to search, compile collections, and download full-text and metadata files. (Users are not permitted to reproduce the downloaded data in any way.) It is possible to access the entire collection directly; however, the data is organized in a way that is not well suited to browsing (paths to texts are based on unique identifiers, not author names or titles), and search is not available. The subset generator was created (using the Python web framework Django, together with a MySQL database and a Solr index) to allow users to build their own sets of materials based on their particular research interests.
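
As a rough illustration of what building a subset involves, the sketch below filters a small in-memory list of metadata records and collects the unique identifiers of the matching volumes. It is not the subset generator's actual code: the `Volume` fields, the `build_subset` function, and the sample identifiers are all hypothetical, and the real tool performs its searches through Solr and MySQL rather than a Python loop.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Volume:
    """One metadata record; all field names here are hypothetical."""
    identifier: str  # unique identifier that the storage path is based on
    author: str
    title: str
    year: int


def build_subset(volumes, author=None, title_contains=None, year_range=None):
    """Return the volumes that match every filter that was supplied."""
    selected = []
    for vol in volumes:
        if author is not None and author.lower() not in vol.author.lower():
            continue
        if title_contains is not None and title_contains.lower() not in vol.title.lower():
            continue
        if year_range is not None and not (year_range[0] <= vol.year <= year_range[1]):
            continue
        selected.append(vol)
    return selected


# Made-up records for illustration only; these are not real dataset identifiers.
catalog = [
    Volume("vol-0001", "Austen, Jane", "Emma", 1816),
    Volume("vol-0002", "Dickens, Charles", "Bleak House", 1853),
    Volume("vol-0003", "Austen, Jane", "Persuasion", 1818),
]

# Select by author and date range, then keep only the identifiers,
# since those (not author or title) are what locate each text.
subset = build_subset(catalog, author="austen", year_range=(1810, 1817))
identifiers = [vol.identifier for vol in subset]
print(identifiers)
```

The point of the sketch is the mapping from research-oriented criteria (author, title, date) to identifiers, which is the step that direct browsing of the collection does not support.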

Access Text