Tool for scraping ***booru image gallery sites.

A minimalistic, high-volume, multi-threaded archive tool for image-gallery sites.

Currently Supports:

Written because I needed an extremely large image database with tags to use for experiments in training neural nets.
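The "image database with tags" mentioned above could be modeled with a minimal SQLAlchemy schema along these lines. This is an illustrative sketch only — the table and column names here (`releases`, `post_id`, `file_url`, `tags`) are hypothetical and do not necessarily match the project's actual schema, and it uses in-memory SQLite rather than the PostgreSQL backend the tool targets:

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class Release(Base):
    """Hypothetical table: one row per scraped image."""
    __tablename__ = "releases"
    id = Column(Integer, primary_key=True)
    post_id = Column(Integer, unique=True)  # site-side post number
    file_url = Column(String)               # URL the image was fetched from
    tags = Column(String)                   # space-separated tag string

# In-memory SQLite for illustration; the tool itself uses PostgreSQL.
engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)
```

A per-image tag string like this is the simplest layout; a normalized many-to-many tag table would be the more scalable design for very large datasets.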

Requires:

  • SQLAlchemy
  • A database of some sort (currently only PostgreSQL is supported)
  • Beautiful Soup 4
  • Python 3
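The Beautiful Soup 4 dependency is what does the page parsing. As a rough sketch of the kind of extraction involved, the following parses an image URL and tag list out of a post page — the HTML structure and element IDs here are invented for illustration; every booru site has its own markup:

```python
from bs4 import BeautifulSoup

# Hypothetical post-page HTML; real booru markup differs per site.
SAMPLE_HTML = """
<div id="post">
  <img id="image" src="https://example.com/images/abc123.jpg">
  <ul id="tag-list">
    <li class="tag">landscape</li>
    <li class="tag">sunset</li>
  </ul>
</div>
"""

def parse_post(html):
    """Extract the image URL and tag list from one post page."""
    soup = BeautifulSoup(html, "html.parser")
    image_url = soup.find("img", id="image")["src"]
    tags = [li.get_text(strip=True) for li in soup.select("#tag-list li.tag")]
    return image_url, tags

url, tags = parse_post(SAMPLE_HTML)
```

A scraper would run many such parses concurrently across worker threads, writing the results into the database.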

Potential ideas:

Note: a pre-packaged, Danbooru-specific archive is available in various formats here: https://www.gwern.net/Danbooru2019. This may be relevant if you're just looking for an easily available dataset.