Cancelled

Multithreaded External Website Source Parser (RegEx Bot)

Title: Multithreaded External Website Source Parser (Regular Expressions)

Project Type: Programmed Software, Windows (x64) Application, (Including Source Code)

Language: C#.NET, C, or VB.NET (Visual Basic 2010), or another programming language that produces an executable application

Interface: Screenshots of the interface design are attached to the Project Resources, and the application's interface should resemble them.

Budget: $150 total, with one proposed milestone payment of $75 for a completed single-threaded version that saves only the first 100 results of each type of information collected and uses only a single regular expression input file. The first milestone payment will be made after the demonstration application is reviewed. Even this first-milestone demonstration application must meet certain speed standards, and the programmer must be able to explain how speed and accuracy will increase or decrease in comparison to the full multithreaded version, given a ~25 Mbps internet connection and a PC with a PassMark score of ~5000 running Windows x64 with 8 GB of RAM.

Project Summary: This project is a program that parses the source code of external websites. The user will import a list of regular expressions, one per line, from a file named [url removed, login to view], in the following format:

beginning text##!##ending text

Whatever text in the source code lies between the 'beginning text' and the 'ending text' (the two halves that '##!##' separates in the input file) will be appended to the file [url removed, login to view]

The application will also read a second file with a list of URLs (one per line) named [url removed, login to view]

If multiple matches of the regular expression are found within the same page's source code, each one will be appended to the output file ([url removed, login to view])
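The rule format above can be sketched in a few lines. This is only an illustration in Python, not the required implementation: the function names `compile_rule` and `extract_all` are hypothetical, and it assumes that both halves of a rule are literal text (so they are escaped) rather than full regular expressions.

```python
import re

DELIMITER = "##!##"  # separator between the two halves of a rule, per the description


def compile_rule(line):
    """Turn one 'beginning text##!##ending text' line into a compiled regex."""
    begin, sep, end = line.rstrip("\r\n").partition(DELIMITER)
    if not sep:
        raise ValueError(f"missing {DELIMITER!r} in rule: {line!r}")
    # Assumption: both halves are literal text, so they are escaped.
    # The lazy group captures the shortest text between the two markers,
    # and DOTALL lets a match span line breaks in the page source.
    return re.compile(re.escape(begin) + r"(.*?)" + re.escape(end), re.DOTALL)


def extract_all(rule, page_source):
    """Return every captured text between the markers, in page order."""
    return rule.findall(page_source)
```

For example, the rule `<title>##!##</title>` applied to a page containing two `<title>` elements yields two results, each of which would become its own line in the output file.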

Getting Started: This project is best suited for someone who has already developed this application in part or in whole, though it is quite straightforward for anyone familiar with scraping, data mining, or crawling websites.

Speed, accuracy, and scalability: This software will be run on an approximately 25 Mbps (megabit) internet connection and a CPU with a PassMark score of about 5000, running Windows x64 with 8 GB of RAM. The acceptable accuracy requirement is 95%, meaning that for a list of 100 URLs whose source code contains 100 available matches, at least 95 should be found and appended as 95 lines in the [url removed, login to view] file. The software will work with large flat files containing several million entries in [url removed, login to view], so it should have no issues either reading a large [url removed, login to view] or appending to a steadily growing [url removed, login to view] file.
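One way to cope with multi-million-line files is to stream them instead of loading them into memory. The sketch below is an assumption about how that might look in Python; `iter_urls` and `append_results` are hypothetical names, the file paths are passed in because the real file names are not given here, and flattening line breaks inside a match (so that each result stays on one line) is a design choice of this sketch, not a stated requirement.

```python
def iter_urls(path):
    """Yield URLs one at a time so the whole list never sits in memory."""
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            url = line.strip()
            if url:  # skip blank lines
                yield url


def append_results(path, results):
    """Append one result per line; the file is opened in append mode."""
    with open(path, "a", encoding="utf-8") as handle:
        for result in results:
            # Assumption: line breaks inside a match are replaced by spaces
            # so that one match always occupies exactly one output line.
            handle.write(" ".join(result.splitlines()) + "\n")
```

Because the output is only ever appended to, its size does not affect the cost of each write.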

The desired speed of the software, taking into account the 95% accuracy requirement and the internet and hardware specifications of its machine, is approximately 1800 URLs/minute under typical web server speed conditions. The only difficulty in developing this software should be the treatment of slowly responding websites and URLs that cannot be found; how these are handled is left to your discretion.
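To put the target in perspective: 1800 URLs/minute is 30 URLs/second, and a 25 Mbps line carries about 3.1 MB/second, which leaves roughly 100 KB of download budget per page. The number of requests that must be in flight at once is approximately the rate multiplied by the average response time. The Python sketch below illustrates that estimate together with one possible treatment of slow or missing pages (a per-request timeout, with failures recorded rather than retried). The names `workers_needed`, `fetch`, and `crawl`, the 2-second average response time, and the 5-second timeout are all assumptions made for illustration.

```python
import math
import urllib.request
from concurrent.futures import ThreadPoolExecutor, as_completed


def workers_needed(urls_per_minute, avg_response_seconds):
    """Estimate concurrent requests needed: rate (per second) x latency."""
    return math.ceil(urls_per_minute / 60 * avg_response_seconds)


def fetch(url, timeout=5.0):
    """Download one page; the timeout (an assumed value) bounds slow servers."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(urls, fetcher=fetch, workers=60):
    """Fetch URLs concurrently; return (pages, failures) keyed by URL."""
    pages, failures = {}, {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(fetcher, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                pages[url] = future.result()
            except Exception as error:  # timeouts, DNS errors, HTTP 404, ...
                failures[url] = repr(error)
    return pages, failures
```

Under the assumed 2-second average response time, `workers_needed(1800, 2.0)` gives 60, which is why 60 is used as the default pool size here. For a list of several million URLs, `crawl` would be called on successive batches rather than on the whole list, since this sketch submits every URL it is given up front.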

Please take a moment to review the attached project resources, which contain screenshots of the recommended GUI (user interface) of the software. For any questions regarding the project, feel free to PM me any time (I will check messages often), and I can provide additional contact information or simply answer your questions there. Thank you and good luck.

Skills: PHP, Software Architecture, Visual Basic

About the Employer:
( 28 reviews ) niles, United States

Project ID: #1488942

10 freelancers are bidding on average $150 for this job

srinichal

I am ready to deliver the project

$180 USD in 5 days
(110 Reviews)
7.1
gangabass

I can do this for you. See PM for details.

$150 USD in 3 days
(120 Reviews)
5.9
softwarevamp

Hello, thanks for the invitation! Regards, softwarevamp

$150 USD in 3 days
(25 Reviews)
4.8
crypted

please check your pm.

$150 USD in 3 days
(19 Reviews)
4.8
Armref

Expert is here. Let's do it.

$150 USD in 5 days
(8 Reviews)
4.8
aoefmpes

pl check PM

$130 USD in 5 days
(18 Reviews)
4.3
rodelacidera

Hi, I can do this. Plz check PMB. regards

$150 USD in 5 days
(6 Reviews)
4.1
byteSector

Hello sir, I have already created similar software. Please reply if interested. Thank you.

$150 USD in 3 days
(3 Reviews)
3.8
samiullah67

Dear hiring manager! I have experience with web scraping and regex (automated applications). I have completed a number of projects. My past samples are attached in PM. I will provide you an efficient solution. Please refer t...

$140 USD in 3 days
(1 Review)
2.4
rbtinf

Hi. I can port any of the available open-source regex libraries, such as PCRE, TRE, or POSIX grep, for faster extraction, or implement the fastest possible DFA parser.

$120 USD in 3 days
(2 Reviews)
1.9
tanvl

Hi! I can make your project in 1 day, because I have a similar application...

$150 USD in 1 day
(0 Reviews)
0.0