(Sub)Reddit Image Crawling Script

Closed Posted Oct 5, 2014 Paid on delivery

I am looking for a Node.js script that parses a configurable number of posts from any one of the listing categories (hot, new, rising, controversial, top, gilded, promoted) of a given subreddit. Once it has retrieved the list of posts, it should follow the links in each post and its comments and download the images from the linked pages while ignoring ads. For example, if a post links to an Imgur album, the script should download every image in the album; if a post links to a single picture, it should download that picture. The directory that images are downloaded to should also be configurable so that the user can store all images in a desired location.
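As a rough illustration of the first two steps described above, the sketch below pulls post links out of a parsed subreddit listing and decides how each link should be handled. It is only a sketch under assumptions: the function names `extractPosts` and `classifyLink` are hypothetical, the listing shape is assumed to follow reddit's JSON listing format, the extension list is an arbitrary starting point, and only Imgur `/a/` album URLs are recognized.

```javascript
// File extensions treated as direct image links (an assumption; extend as needed).
var IMAGE_EXTENSIONS = ['.jpg', '.jpeg', '.png', '.gif'];

// Hypothetical helper: pull the fields later stages need out of a parsed listing.
// Assumes the shape { data: { children: [ { kind: 't3', data: {...} } ] } }.
function extractPosts(listing) {
  return listing.data.children
    .filter(function (child) { return child.kind === 't3'; }) // t3 = post
    .map(function (child) {
      return { id: child.data.id, author: child.data.author, url: child.data.url };
    });
}

// Hypothetical helper: returns 'image', 'imgur-album', 'page', or 'unsupported'.
function classifyLink(link) {
  // Capture the host and the path (without query string or fragment).
  var match = /^https?:\/\/([^\/?#]+)([^?#]*)/i.exec(link);
  if (!match) { return 'unsupported'; } // not an http(s) link

  var host = match[1].toLowerCase();
  var pathname = match[2].toLowerCase();

  // A path ending in a known image extension is downloaded directly.
  var isImage = IMAGE_EXTENSIONS.some(function (ext) {
    return pathname.slice(-ext.length) === ext;
  });
  if (isImage) { return 'image'; }

  // imgur.com/a/<id> is treated as an album that needs expanding.
  var isImgur = /(^|\.)imgur\.com$/.test(host);
  if (isImgur && pathname.indexOf('/a/') === 0) { return 'imgur-album'; }

  // Anything else is a page that would have to be fetched and scanned for images.
  return 'page';
}
```

Each post returned by `extractPosts` can be passed through `classifyLink(post.url)` to choose between a direct download, an album expansion, and a page scrape. Ad filtering is not covered here.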

Acceptance Criteria:

1. The script will utilize a JSON file to allow for simple configuration of the subreddit, number of posts, category to be parsed, and image destination directory path.

2. The script will be modular. For example, parsing the subreddit, parsing a post and its comments, and parsing the target page will each be handled by a separate function.

3. The script must be able to download all images on the target page as well as on any pages linked in comments posted by the post's original author.

4. The script will utilize popular, well-established node modules such as request and async in order to ensure reliability and correctness. If you have questions regarding a node module, please ask.

5. The script can parse 10 posts and download all associated images in under 15 seconds on a connection with a sustained download bandwidth of at least 15 MB/s.

6. The script runs on Node.js 0.10.x and uses the npm registry for modules.

7. The script runs correctly and downloads all desired images when tested on any number of subreddits.
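
To make criterion 1 more concrete, here is a minimal sketch of how the JSON configuration might be parsed and checked. The key names (`subreddit`, `postCount`, `category`, `imageDir`) and the function name `parseConfig` are assumptions made for illustration; the posting does not prescribe a file layout.

```javascript
// Listing categories named in the project description.
var CATEGORIES = ['hot', 'new', 'rising', 'controversial', 'top', 'gilded', 'promoted'];

// Hypothetical loader: parses the JSON text and validates the four settings.
// Throws an Error describing the first problem it finds.
function parseConfig(jsonText) {
  var config = JSON.parse(jsonText);

  if (typeof config.subreddit !== 'string' || config.subreddit === '') {
    throw new Error('subreddit must be a non-empty string');
  }
  if (typeof config.postCount !== 'number' ||
      config.postCount < 1 || config.postCount % 1 !== 0) {
    throw new Error('postCount must be a positive integer');
  }
  if (CATEGORIES.indexOf(config.category) === -1) {
    throw new Error('category must be one of: ' + CATEGORIES.join(', '));
  }
  if (typeof config.imageDir !== 'string' || config.imageDir === '') {
    throw new Error('imageDir must be a non-empty string');
  }
  return config;
}
```

In the script itself this could be called as `parseConfig(require('fs').readFileSync('config.json', 'utf8'))`, where `config.json` is a placeholder file name containing something like `{"subreddit": "pics", "postCount": 10, "category": "hot", "imageDir": "./images"}`.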

Please note that the Acceptance Criteria may change: requirements may be added or removed, and some constraints may be relaxed.
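
For criterion 3 as currently worded, the links posted by the original author could be gathered along the following lines. This is a sketch, not a definitive implementation: it assumes the comment tree follows reddit's JSON format (comments are `t1` items whose `replies` field is either an empty string or a nested listing), and the function name `collectAuthorLinks` and the URL pattern are hypothetical choices.

```javascript
// Matches http(s) URLs in comment text; stops at whitespace and at
// characters that usually close a markdown link or quotation.
var URL_PATTERN = /https?:\/\/[^\s)\]>"]+/g;

// Walk a comment listing, including nested replies, and collect the links
// found in comments written by the post's original author.
function collectAuthorLinks(commentListing, postAuthor) {
  var links = [];

  function walk(listing) {
    // `replies` is an empty string when a comment has no replies.
    if (!listing || !listing.data || !listing.data.children) { return; }

    listing.data.children.forEach(function (child) {
      if (child.kind !== 't1') { return; } // t1 = comment; skips "more" stubs
      if (child.data.author === postAuthor) {
        var found = (child.data.body || '').match(URL_PATTERN) || [];
        links = links.concat(found);
      }
      walk(child.data.replies);
    });
  }

  walk(commentListing);
  return links;
}
```

Replies under other users' comments are still traversed, so a link the original author posts deep in a thread is picked up while links from other commenters are ignored.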

JavaScript Node.js PHP Python Web Scraping

Project ID: #6548884

About the project

3 proposals Remote project Active Nov 11, 2014

3 freelancers are bidding on average $95 for this job

robindang

1. Run nodejs script built on request and async modules
2. Read in json with configuration
3. Download and output an array with paths to downloaded images

$111 USD in 1 day
(0 Reviews)
0.0
architm

A proposal has not yet been provided

$20 USD in 1 day
(0 Reviews)
0.0