Multithreaded External Website Source Parser (RegEx Bot)

Title: Multithreaded External Website Source Parser (Regular Expressions)

Project Type: Custom software, Windows (x64) application (source code included)

Language: C#.NET, C, or VB.NET (Visual Basic 2010), or another programming language that produces an executable application

Interface: Screenshots of the interface design are attached to the project resources and should be followed closely.

Budget: $150 total, with one proposed milestone payment of $75 for a completed single-threaded version. That version may be limited to saving only the first 100 results of each type of information collected and to using a single regular expression input file. The first milestone payment will be made after the demonstration application is reviewed. Even the first milestone demonstration must meet certain speed standards, and the programmer must know how speed and accuracy will increase or decrease compared to the full multithreaded version, given a ~25 Mbps internet connection and a PC with a PassMark score of ~5000 running Windows x64 with 8 GB of RAM.

Project Summary: This project is a program that parses the source code of external websites. The user will import a list of regular expressions, one per line, from a file named [login to view URL] in the following format:

beginning text##!##ending text

Whatever text in the page source lies between the 'beginning text' and the 'ending text' (that is, the text that takes the place of '##!##' in the input line) will be appended to the file [login to view URL].
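A minimal sketch of reading one line of the pattern file, assuming the first "##!##" on a line is the separator, that a trailing "\n" or "\r\n" is not part of the ending text, and that a line with an empty beginning or ending text is rejected (the posting does not say how such lines should be treated). The names `Pattern`, `parse_pattern_line` and `free_pattern` are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Separator taken from the input-file format: beginning text##!##ending text */
#define PATTERN_SEPARATOR "##!##"

/* Hypothetical holder for one parsed line of the pattern file. */
typedef struct {
    char *begin; /* text that must appear immediately before a match */
    char *end;   /* text that must appear immediately after a match  */
} Pattern;

/* Returns a heap-allocated, NUL-terminated copy of the first len bytes of s. */
static char *copy_span(const char *s, size_t len) {
    char *out = malloc(len + 1);
    if (out != NULL) {
        memcpy(out, s, len);
        out[len] = '\0';
    }
    return out;
}

/* Splits one pattern line on the first "##!##", ignoring a trailing line ending.
   Returns 0 on success; -1 if the separator is missing, either side is empty
   (an assumption of this sketch), or allocation fails. */
int parse_pattern_line(const char *line, Pattern *out) {
    assert(line != NULL && out != NULL);
    size_t len = strcspn(line, "\r\n");            /* line length without "\r\n" */
    const char *sep = strstr(line, PATTERN_SEPARATOR);
    if (sep == NULL || (size_t)(sep - line) >= len) return -1;
    size_t begin_len = (size_t)(sep - line);
    size_t sep_len = strlen(PATTERN_SEPARATOR);
    size_t end_len = len - begin_len - sep_len;    /* separator has no CR/LF, so this fits */
    if (begin_len == 0 || end_len == 0) return -1;
    out->begin = copy_span(line, begin_len);
    out->end = copy_span(sep + sep_len, end_len);
    if (out->begin == NULL || out->end == NULL) {
        free(out->begin);
        free(out->end);
        out->begin = out->end = NULL;
        return -1;
    }
    return 0;
}

/* Releases the strings owned by a parsed pattern. */
void free_pattern(Pattern *p) {
    free(p->begin);
    free(p->end);
    p->begin = p->end = NULL;
}
```

For example, the line `<title>##!##</title>` yields a beginning text of `<title>` and an ending text of `</title>`.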

The application will also read a second file, named [login to view URL], containing a list of URLs (one per line).

If the same regular expression matches more than once within one page's source code, each match will be appended to the output file ([login to view URL]).
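One way to append every match from a page is sketched below. It assumes matches do not overlap (scanning resumes after each ending text), that a beginning text with no ending text after it is ignored, and that each match is written on its own line; a match that itself contains line breaks would span several output lines, which the posting does not address. `append_matches` is a hypothetical name, and its `max_results` parameter is one way to implement the milestone version's limit of the first 100 results.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes every substring of src lying between begin and end to out, one match
   per line, and returns the number of matches written (-1 on a write error).
   Scanning resumes after each ending text, so matches never overlap.
   max_results caps the output (e.g. 100 for the milestone); 0 means no cap. */
long append_matches(const char *src, const char *begin, const char *end,
                    FILE *out, long max_results) {
    assert(src != NULL && begin != NULL && end != NULL && out != NULL);
    /* This sketch requires non-empty delimiters so the cursor always advances. */
    assert(*begin != '\0' && *end != '\0');
    size_t begin_len = strlen(begin);
    size_t end_len = strlen(end);
    long count = 0;
    const char *cursor = src;
    while (max_results == 0 || count < max_results) {
        const char *start = strstr(cursor, begin);
        if (start == NULL) break;                 /* no more beginning texts */
        start += begin_len;                       /* match starts after the beginning text */
        const char *stop = strstr(start, end);
        if (stop == NULL) break;                  /* unterminated: ignore the tail */
        size_t n = (size_t)(stop - start);
        if (fwrite(start, 1, n, out) != n || fputc('\n', out) == EOF) return -1;
        count++;
        cursor = stop + end_len;                  /* continue after the ending text */
    }
    return count;
}
```

Opening the output file in append mode (`fopen(path, "a")`) keeps earlier results when the program is restarted on the same file.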

Getting Started: This project is best suited for someone who has already developed this application in part or wholly, though it is quite straightforward for anyone familiar with website scraping, data mining, crawling, etc.

Speed, accuracy, and scalability: This software will run on a ~25 Mbps internet connection (megabits, not megabytes) and a CPU with a PassMark score of ~5000, running Windows x64 with 8 GB of RAM. The minimum acceptable accuracy is 95%: for a list of 100 URLs whose sources contain 100 regular expression matches in total, at least 95 must be found and appended as 95 lines in the [login to view URL] file. The software will work with large flat files holding several million entries in [login to view URL], so it must handle both reading a large [login to view URL] and appending to a rapidly growing [login to view URL] file without issues.
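Because the URL list can hold several million entries, it can be streamed one line at a time instead of being loaded into memory. The sketch below uses standard `fgets` with a fixed buffer; the 2048-character limit (`MAX_URL_LEN`), skipping blank and over-long lines, and the names `for_each_url` and `echo_url` are all assumptions for illustration. A real crawler would pass a handler that pushes each URL onto a work queue for the download threads.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_URL_LEN 2048  /* assumption: longer lines are skipped as malformed */

typedef void (*url_handler)(const char *url, void *ctx);

/* Streams the URL list one line at a time, so memory use stays constant
   however many entries the file holds. Strips "\n" / "\r\n", skips blank
   and over-long lines, and returns the number of URLs passed to handle. */
long for_each_url(FILE *in, url_handler handle, void *ctx) {
    assert(in != NULL && handle != NULL);
    char line[MAX_URL_LEN + 2];            /* room for the newline and the NUL */
    long handled = 0;
    while (fgets(line, sizeof line, in) != NULL) {
        size_t len = strcspn(line, "\r\n");
        if (line[len] == '\0' && !feof(in)) {
            /* Buffer filled without a line ending: discard the rest of this line. */
            int c;
            while ((c = fgetc(in)) != EOF && c != '\n') {
            }
            continue;
        }
        line[len] = '\0';                  /* drop the line ending */
        if (len == 0) continue;            /* skip blank lines */
        handle(line, ctx);
        handled++;
    }
    return handled;
}

/* Hypothetical example handler: echoes each URL to the FILE* given as ctx. */
void echo_url(const char *url, void *ctx) {
    fprintf((FILE *)ctx, "%s\n", url);
}
```

Calling `for_each_url(in, echo_url, stdout)` prints each usable URL once, which is a quick way to check a large input file before crawling it.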

Taking into account the 95% accuracy requirement and the machine's internet and hardware specifications, the desired speed is approximately 1800 URLs/minute (30 URLs/second) under typical web server conditions. The main difficulty in developing this software should be handling slowly responding websites and unreachable URLs; how they are handled is at your discretion.
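A rough back-of-envelope check of what the 1800 URLs/minute target implies on a 25 Mbit/s link, using only the posting's figures plus one assumption: an average response time of W = 2 s per page (purely illustrative). The second line gives the largest average page size the link can sustain at that rate (ignoring protocol overhead and compression); the third applies Little's law (L = λW) to estimate how many requests must be in flight at once.

```latex
\lambda = \frac{1800\ \text{URLs}}{60\ \text{s}} = 30\ \text{URLs/s}

\text{bytes per URL} \approx \frac{25 \times 10^{6}\ \text{bit/s}}{8\ \text{bit/byte} \times 30\ \text{URLs/s}} \approx 1.04 \times 10^{5}\ \text{bytes} \approx 104\ \text{kB}

L = \lambda W = 30\ \text{URLs/s} \times 2\ \text{s} = 60\ \text{requests in flight}
```

Under these assumptions a single-threaded version fetching one page at a time would manage only about 0.5 URLs/s, which is why the full version needs many concurrent requests. A per-request timeout caps W for slow sites, so a handful of unresponsive servers cannot hold worker slots indefinitely.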

Please take a moment to review the attached project resources, which contain screenshots of the recommended GUI (user interface) for the software. For any questions regarding the project, feel free to PM me at any time (I will check messages often), and I can provide additional contact information or simply answer your inquiries there. Thank you and good luck.

Skills: PHP, Software Architecture, Visual Basic

About the Employer:
(29 reviews) Niles, United States

Project ID: #1488942

9 freelancers are bidding on average $152 for this job


I am ready to deliver the project

$180 USD in 5 days
(174 Reviews)

I can do this for you. See PM for details.

$150 USD in 3 days
(366 Reviews)

Expert is here, let's do it.

$150 USD in 5 days
(66 Reviews)

Hello, thanks for the invitation! Regards, softwarevamp [[login to view URL]]

$150 USD in 3 days
(26 Reviews)

Please check your PM.

$150 USD in 3 days
(20 Reviews)

Hi, I can do this. Please check PMB. Regards

$150 USD in 5 days
(6 Reviews)

Hello sir, I have already created similar software. Please reply if interested. Thank you.

$150 USD in 3 days
(3 Reviews)

Dear hiring manager! I have experience with web scraping and regex (automated applications). I have completed a number of projects. My past samples are attached in PM. I will provide you with an efficient solution. Please refer t…

$140 USD in 3 days
(1 Review)

Hi! I can complete your project in 1 day, because I have a similar application...

$150 USD in 1 day
(0 Reviews)