John Foley - Datasets

Dissertation Poetry Data

See CIIR/Downloads/Poetry for the datasets created as part of my dissertation. These include the largest publicly available collection of poetry in the world as of May 2019: half a million pages with poetry on them, drawn from 50,000 scanned books. I plan to make larger datasets available (it is just a matter of CPU time now; send me an email if interested).

Wikipedia Year Facts

Entity Judgments for Robust and Clue12 Queries

