A tool for scraping files from imageboards’ threads.
Alexander Andreev 2043fc277f No right to fuck up! Shit... Forgot third part of a version. 1 month ago

This is a tool for scraping files from imageboards’ threads.

It extracts the list of files from the JSON version of a thread and downloads them into the specified output directory. If no output directory is specified, it creates the following directory hierarchy in the working directory:

<imageboard name>
|-<board name>
  |-<thread>
    |-[!op.txt]
    |-...
  |-...
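
The layout above can be sketched in a few lines of Python. This is only an illustration of the rule described here, not the tool's actual code: the function name `resolve_output_dir` and its `base` parameter are hypothetical and exist only for this example.

```python
from pathlib import Path
from typing import Optional


def resolve_output_dir(
    imageboard: str,
    board: str,
    thread: str,
    output_dir: Optional[str] = None,
    base: Optional[Path] = None,
) -> Path:
    """Return the directory the files would be saved to.

    Hypothetical helper: an explicit output directory wins; otherwise
    the <imageboard>/<board>/<thread> hierarchy is built under `base`,
    which defaults to the current working directory.
    """
    if output_dir is not None:
        return Path(output_dir)
    root = base if base is not None else Path.cwd()
    return root / imageboard / board / thread


if __name__ == "__main__":
    target = resolve_output_dir("4chan", "b", "1100500")
    # Create the whole hierarchy in one call; an existing directory is fine.
    target.mkdir(parents=True, exist_ok=True)
    print(target)
```

Under these assumptions the OP's post would simply be written to `target / "!op.txt"`.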

Usage

scrapthechan [OPTIONS] (<url> | <imageboard> <board> <thread>)

<url> -- URL of a thread.

<imageboard> <board> <thread> -- imageboard name, board name and thread ID separately. E.g. 4chan b 1100500.

-o, --output-dir -- output directory into which all files will be dumped.

--no-op -- by default, the OP’s post is saved in a !op.txt file. This flag disables this behaviour. I decided to put an ! in the name so that this file appears at the top of a directory listing.

-v, --version -- prints the version of the program.

-h, --help -- prints the help message.
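
For readers who want to see how such a command line could be parsed, here is a minimal sketch using Python's standard `argparse` module. It is an assumption about how the interface might be wired up, not the program's real parser: the names `build_parser`, `parse_args` and `target` are made up for this example, and the version string is a placeholder.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the options listed above (illustrative only)."""
    parser = argparse.ArgumentParser(prog="scrapthechan")
    # Either a single <url> or the <imageboard> <board> <thread> triple.
    parser.add_argument(
        "target",
        nargs="+",
        metavar="TARGET",
        help="<url> or <imageboard> <board> <thread>",
    )
    parser.add_argument("-o", "--output-dir", default=None,
                        help="output directory for all files")
    parser.add_argument("--no-op", action="store_true",
                        help="do not save the OP's post to !op.txt")
    # Placeholder version string; the real one is not known here.
    parser.add_argument("-v", "--version", action="version",
                        version="%(prog)s 0.0.0")
    # -h/--help is added by argparse automatically.
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # Accept exactly one value (a URL) or exactly three (the triple).
    if len(args.target) not in (1, 3):
        parser.error("expected <url> or <imageboard> <board> <thread>")
    return args


if __name__ == "__main__":
    print(parse_args())
```

Collecting the positional values into one list and checking its length afterwards keeps the two invocation forms in a single parser; `argparse` has no built-in way to say "one or three positionals".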

Supported imageboards