
Automated server-based web scraping application to develop

$250-750 USD

In Progress
Posted over 11 years ago

Paid on delivery
Developer needed to build an "intelligent", server-based, automated web scraping application that can distinguish business websites from non-business websites within a large list of website URLs (over 200k). A business website is a website that belongs to a business providing services.

The proposed way to do this is:

1) Develop a server-based application that will:
a) verify whether the URL corresponds to an active website;
b) browse the website and identify "intra-site" links (internal links);
c) determine whether the text of each link includes a particular keyword from a pre-determined set of keywords (such as "about us", "services", "company", "clients"...). For example, www. website .com/[login to view URL] gives a "positive" result, since the word "services" appears in the link (the word "services" having been pre-determined by the user).

2) Provide a web interface from which the user must be able to:
- upload a list of URLs to scrape (up to 200k, or more if possible);
- add and remove keywords;
- start the "mining" process, pause it, stop it, and resume it;
- see a real-time count of URLs processed, along with counts of active websites, positive results, and negative results;
- download the URL lists of active websites, positive-identified websites, and negative ones.

IMPORTANT NOTES:
The application needs to be multi-threaded and efficient for maximum processing speed.
PLEASE ONLY BID IF YOU ARE THE DEVELOPER (NO AGENCIES, PLEASE).
PLEASE INDICATE IN PMB WHAT DEVELOPMENT LANGUAGE YOU INTEND TO USE.
Thanks for your bid.
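The three server-side steps a) to c) can be sketched with only the Python standard library. This is a minimal illustration under stated assumptions, not the approach any bidder used: "active" is taken to mean a HEAD request returns a status below 400, "internal" means the same host as the page, and a link counts as positive if a keyword appears case-insensitively in either its anchor text or its URL (the posting's example matches on the URL). The names `is_active`, `LinkCollector`, `internal_links`, `is_positive` and `DEFAULT_KEYWORDS` are hypothetical.

```python
from __future__ import annotations

import http.client
import urllib.error
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Hypothetical default keyword set, taken from the examples in the posting.
DEFAULT_KEYWORDS = ("about us", "services", "company", "clients")


def is_active(url: str, timeout: float = 10.0) -> bool:
    """Step a): assume a URL is active if a HEAD request succeeds with status < 400.

    Assumption: some servers reject HEAD, so a real crawler may need a GET fallback.
    """
    if not urlparse(url).scheme:
        url = "http://" + url
    try:
        req = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status < 400
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # HTTPError (status >= 400), DNS failures, timeouts and malformed URLs
        return False


class LinkCollector(HTMLParser):
    """Step b): collect (href, anchor text) pairs from <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[tuple[str, str]] = []
        self._href: str | None = None
        self._text: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")  # None if the tag has no href
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._text).split())  # collapse whitespace
            self.links.append((self._href, text))
            self._href = None


def internal_links(page_url: str, html: str) -> list[tuple[str, str]]:
    """Return absolute same-host links with their anchor text.

    Assumption: www.example.com and example.com count as different hosts here.
    """
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    host = urlparse(page_url).netloc.lower()
    result = []
    for href, text in parser.links:
        absolute = urljoin(page_url, href)  # resolve relative links
        parsed = urlparse(absolute)
        if parsed.scheme in ("http", "https") and parsed.netloc.lower() == host:
            result.append((absolute, text))
    return result


def is_positive(links: list[tuple[str, str]], keywords=DEFAULT_KEYWORDS) -> bool:
    """Step c): positive if any keyword appears in a link's text or URL."""
    for url, text in links:
        haystack = (text + " " + url).lower()
        if any(k.lower() in haystack for k in keywords):
            return True
    return False
```

For example, on a page at `http://example.com/`, a link `<a href="/services">Our Services</a>` resolves to `http://example.com/services` and makes the site positive, while links to other hosts are ignored.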
Project ID: 2505024

About the project

4 proposals
Remote project
Active 12 yrs ago

Awarded to:
I would like to work on this project. I plan to use Ruby on Rails and MySQL for the web server, and Nokogiri (a very popular Ruby gem for web scraping). I would use background jobs so the application remains usable while scraping is in progress. I would keep records of previous runs, so users can download the files with the site lists any time they want. I will save data to the database in batches to provide stop/pause/resume functionality.
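The batching idea in this bid (commit results in batches so a run can be stopped, paused and resumed) combined with the posting's multi-threading and real-time-count requirements can be sketched as follows. This is a hedged Python illustration, not the bidder's Rails/background-job design: `BatchRunner`, `Counters`, `check` and `save_batch` are hypothetical names, `save_batch` stands in for a database write, and "negative" is assumed to mean an active site with no keyword match.

```python
from __future__ import annotations

import threading
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, replace
from typing import Callable


@dataclass
class Counters:
    """Real-time totals a web interface could poll."""
    processed: int = 0
    active: int = 0
    positive: int = 0
    negative: int = 0


class BatchRunner:
    """Hypothetical runner: checks URLs on a thread pool, one batch at a time.

    `check(url)` returns (active, positive); `save_batch(rows)` stands in for a
    database write. Because results are committed per batch, `next_index` is a
    checkpoint from which a stopped run can be resumed.
    """

    def __init__(self, check: Callable[[str], tuple[bool, bool]],
                 save_batch: Callable[[list[tuple[str, bool, bool]]], None],
                 batch_size: int = 500, workers: int = 32) -> None:
        self.check = check
        self.save_batch = save_batch
        self.batch_size = batch_size
        self.workers = workers
        self.counters = Counters()
        self.next_index = 0
        self._lock = threading.Lock()
        self._running = threading.Event()  # cleared while paused
        self._running.set()
        self._stopped = threading.Event()

    def pause(self) -> None:
        self._running.clear()  # takes effect at the next batch boundary

    def resume(self) -> None:
        self._running.set()

    def stop(self) -> None:
        self._stopped.set()
        self._running.set()  # wake a paused run so it can exit

    def snapshot(self) -> Counters:
        with self._lock:
            return replace(self.counters)  # consistent copy for the UI

    def run(self, urls: list[str]) -> None:
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            while self.next_index < len(urls):
                self._running.wait()  # blocks while paused
                if self._stopped.is_set():
                    break
                batch = urls[self.next_index:self.next_index + self.batch_size]
                results = list(pool.map(self.check, batch))  # checked concurrently
                rows = [(u, a, p) for u, (a, p) in zip(batch, results)]
                self.save_batch(rows)  # commit before advancing the checkpoint
                with self._lock:
                    c = self.counters
                    c.processed += len(rows)
                    c.active += sum(1 for _, a, _ in rows if a)
                    # Assumption: "negative" means active but no keyword match.
                    c.positive += sum(1 for _, a, p in rows if a and p)
                    c.negative += sum(1 for _, a, p in rows if a and not p)
                    self.next_index += len(rows)
```

In a web application, `run` would execute in a background worker while request handlers call `pause`, `resume`, `stop` and `snapshot`. Pausing only at batch boundaries is a deliberate trade-off: it keeps the checkpoint simple, at the cost of finishing the current batch before the pause takes effect.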
$720 USD in 7 days
4.6 (2 reviews)
1.6
4 freelancers are bidding on average $693 USD for this job
Hi sir, please check PM, thx Kimi.
$750 USD in 6 days
5.0 (408 reviews)
7.7
I have worked on many similar projects and have extensive experience in data mining projects. I can finish this task in a short time, with the best quality.
$750 USD in 15 days
4.9 (86 reviews)
7.3
Let us get this done for you
$550 USD in 10 days
4.9 (18 reviews)
3.3

About the client

London, United Kingdom
4.9
120
Payment method verified
Member since Mar 4, 2009