250K.txt
In the Jupyter notebook 3-abstracts-export.ipynb, the per_year datasets are unpacked and merged, and two sets of files are created: 1) abstracts only and 2) titles only, with one title or abstract per line. The notebook writes zipped files containing all items (too large to upload to GitHub) and zipped files containing a random sample of 250k items. The sample files can be found in processed_data/DUMP_DATE/arxiv-abstracts-250k.txt.zip and processed_data/DUMP_DATE/arxiv-titles-250k.txt.zip.
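The export step described above could look roughly like the following Python sketch. It is not the notebook's actual code. The record layout (dicts with "title" and "abstract" keys), the function names, the seed, and the choice to draw titles and abstracts from the same sampled records are all assumptions made for illustration.

```python
import random
import zipfile
from pathlib import Path


def one_line(text):
    # Collapse newlines and repeated whitespace so each item fits on one line.
    return " ".join(text.split())


def write_zipped_lines(lines, zip_path, inner_name):
    # Write the lines into a single text file stored inside a zip archive.
    zip_path = Path(zip_path)
    zip_path.parent.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(inner_name, "".join(line + "\n" for line in lines))


def export_sample(records, out_dir, sample_size=250_000, seed=0, label="250k"):
    # `records` is assumed to be a list of dicts with "title" and "abstract"
    # keys (hypothetical field names). A fixed seed keeps the sample repeatable.
    rng = random.Random(seed)
    sample = rng.sample(records, min(sample_size, len(records)))
    for plural, key in (("abstracts", "abstract"), ("titles", "title")):
        name = f"arxiv-{plural}-{label}.txt"
        write_zipped_lines(
            [one_line(r[key]) for r in sample],
            Path(out_dir) / f"{name}.zip",
            name,
        )
    return sample
```

Calling export_sample(records, "processed_data/DUMP_DATE") with the merged records would produce files named like the two sample archives mentioned above. Collapsing whitespace in one_line is what keeps the "one item per line" property when an abstract itself contains line breaks.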