-booruScraper/README.md

Minimalistic, high-volume, multi-threaded archive tool for image gallery sites.

Currently Supports:

Written because I needed an extremely large image database with tags for some experiments in training neural nets.

Requires:

  • SQLAlchemy
  • A database of some sort (currently only works with PostgreSQL)
  • Beautiful Soup 4
  • Python 3
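
As a quick sanity check before running the scraper, the Python-side requirements above can be probed with a few lines of standard-library code. This is only a sketch and not part of the project: the function name `missing_requirements` and the `REQUIRED_MODULES` mapping are hypothetical, and it assumes the usual import names `sqlalchemy` and `bs4`. It does not check for a database server or a database driver.

```python
import importlib.util
import sys

# Hypothetical mapping: display name -> import name of each listed library.
REQUIRED_MODULES = {
    "SQLAlchemy": "sqlalchemy",
    "Beautiful Soup 4": "bs4",
}


def missing_requirements(modules=REQUIRED_MODULES):
    """Return the display names of requirements that are not available."""
    missing = []
    # The README lists Python 3 as a requirement.
    if sys.version_info[0] < 3:
        missing.append("Python 3")
    for display_name, import_name in modules.items():
        # find_spec returns None when a top-level module cannot be located.
        if importlib.util.find_spec(import_name) is None:
            missing.append(display_name)
    return missing


if __name__ == "__main__":
    missing = missing_requirements()
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("All listed Python requirements are importable.")
```

Anything reported as missing can typically be installed with pip; the PostgreSQL server itself has to be set up separately.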

Potential ideas:

Note: If you're just looking for an easily available dataset, a pre-built Danbooru-specific archive is available in various formats here: https://www.gwern.net/Danbooru2019